Vision4Blind

Highlights

  • Vision-Language Models
  • 2D/3D Scene Understanding
  • Object and Material Recognition
  • Walkable Path Suggestion and Obstacle Avoidance
  • Social Interaction Support
  • Wearable Smart Devices (Vision Glasses, AR/VR Headsets, Smart Belts)
  • Acoustic and Haptic Interfaces
Selected Publications
  • Xin Jiang, Junwei Zheng, Ruiping Liu, Jiahang Li, Jiaming Zhang, Sven Matthiesen, Rainer Stiefelhagen. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025. (pdf)
  • M. Martinez, K. Yang, A. Constantinescu, R. Stiefelhagen. Sensors, September 2020. (pdf)
  • H. Chen, Y. Zhang, K. Yang, M. Martinez, K. Müller, R. Stiefelhagen. International Conference on Computers Helping People with Special Needs (ICCHP), Online, September 2020. (pdf)
  • W. Hu, K. Wang, K. Yang, R. Cheng, Y. Ye, L. Sun, Z. Xu. Sensors, June 2020. (pdf)
  • M. Martinez, A. Roitberg, D. Koester, B. Schauerte, R. Stiefelhagen. ICCV Workshop on Assistive Computer Vision and Robotics (ACVR), Venice, Italy, October 2017. (pdf)
  • T. Wörtwein, B. Schauerte, K. Mueller, R. Stiefelhagen. International Conference on Multimodal Interaction (ICMI), Seattle, Washington, USA, November 2015.
  • B. Schauerte, T. Wörtwein, R. Stiefelhagen. Accessible Interaction for Visually Impaired People (AI4VIP), Stuttgart, Germany, September 2015.
  • B. Schauerte, D. Koester, M. Martinez, R. Stiefelhagen. ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), Zurich, Switzerland, September 2014.
  • D. Koester, B. Schauerte, R. Stiefelhagen. IEEE Workshop on Multimodal and Alternative Perception for Visually Impaired People (MAP4VIP), in conjunction with ICME 2013. (pdf | bib)
  • B. Schauerte, M. Martinez, A. Constantinescu, R. Stiefelhagen. International Conference on Computers Helping People with Special Needs (ICCHP), Linz, Austria, July 2012.
  • J. Zheng, J. Zhang, K. Yang, K. Peng, R. Stiefelhagen. IEEE International Conference on Robotics and Automation (ICRA), 2024. (DOI)
  • R. Liu, J. Zhang, K. Peng, J. Zheng, K. Cao, Y. Chen, K. Yang, R. Stiefelhagen. International Workshop on Assistive Computer Vision and Robotics (ACVR) with IEEE/CVF International Conference on Computer Vision (ICCV), 2023. (DOI)

Vision-Language Models for Assistive Technology

ATBench

ATBench is a framework designed to benchmark Vision-Language Models (VLMs) specifically for applications in human-centered assistive technology. The work discusses how these models can significantly enhance the capabilities and usability of assistive technologies, particularly for persons with visual impairments (PVIs). By integrating multi-modal information, ATBench aims to provide comprehensive assistance tailored to the needs of users.

  • ATBench emphasizes the importance of combining visual and language data to enhance the performance of assistive technologies. This integration is essential for offering users a more intuitive and effective interaction with their environments, allowing models to understand context better and generate more relevant responses.
  • The framework introduces a series of new metrics specifically designed to evaluate VLMs in the context of assistive technology. These metrics assess not only the accuracy of responses but also usability and user satisfaction, which are crucial for real-world assistive applications (a minimal sketch of such an evaluation loop follows this list).
  • ATBench aims to bridge the gap between technical performance and practical usability by testing models on tasks that closely mimic the everyday challenges faced by PVIs. This focus ensures that the developed VLMs are not only theoretically sound but are also practically applicable in assisting real users, thereby enhancing their daily lives.
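
To make the evaluation idea concrete, the minimal sketch below loops over a set of assistive tasks and aggregates per-task scores. All names in it (Sample, exact_match, the model callable, the task dictionary) are illustrative assumptions, not the actual ATBench interface or metrics.

    # Minimal sketch of a benchmark loop in the spirit of ATBench.
    # The Sample structure, exact_match metric, and model callable are
    # hypothetical placeholders, not the real ATBench API.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Sample:
        image_path: str   # image from an assistive scenario
        question: str     # e.g. "Is the pedestrian signal green?"
        reference: str    # ground-truth answer

    def exact_match(prediction: str, reference: str) -> float:
        """Toy accuracy metric; a full benchmark would also weigh usability."""
        return float(prediction.strip().lower() == reference.strip().lower())

    def evaluate(model: Callable[[str, str], str],
                 tasks: Dict[str, List[Sample]]) -> Dict[str, float]:
        """Run a vision-language model over several assistive tasks and
        report the mean score per task."""
        scores = {}
        for task_name, samples in tasks.items():
            per_sample = [exact_match(model(s.image_path, s.question), s.reference)
                          for s in samples]
            scores[task_name] = sum(per_sample) / max(len(per_sample), 1)
        return scores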

Scene Understanding

People with Visual Impairments (PVI) often have difficulty interpreting their surroundings correctly due to the lack of visual cues. Understanding a scene through tactile exploration alone is insufficient and potentially hazardous: one often needs to grasp the whole scene at first glance, then attend to each object, work out their relationships, and react to the scene.

We explore scene understanding for helping PVI. In this work, we design an Open Scene Understanding (OpenSU) system which consists of Grounded Situation Recognition (GSR) and Segment Anything (SAM), allowing PVI to perceive the entirety of the scene and retrieve object information in a specified direction.

Grounded Situation Recognition captures the activity (e.g., sitting), the nouns (e.g., woman, chair, office) associated with the roles (Agent, Item, Place), and the bounding boxes of the objects. The caption template is “An Agent sits on an Item at a Place”, so the image caption for the verb sitting becomes “A woman sits on a chair at an office”. SAM then uses the bounding boxes as prompts to generate segmentation masks. Depending on the interaction design, region indication methods (e.g., fingertip pointing, head pose, or a laser pointer) can be used to specify a region of interest, and the resulting information is reported to the user via the bone-conduction earphones of a wearable system.

Open scene understanding
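
As a rough illustration of the pipeline above, the sketch below fills the caption template from a grounded-situation-recognition result and prompts SAM with the predicted boxes. The GSR result dictionary and the checkpoint path are made-up placeholders; only the SAM calls follow the public segment-anything package.

    # Sketch of the OpenSU idea: a GSR output fills the caption template and
    # its bounding boxes prompt SAM for masks. The gsr_result dict below is a
    # hypothetical stand-in for the recognizer's output.
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    gsr_result = {
        "verb": "sitting",
        "frames": {                    # role -> (noun, box [x0, y0, x1, y1])
            "Agent": ("woman", [120, 40, 380, 520]),
            "Item": ("chair", [100, 300, 400, 560]),
            "Place": ("office", None), # scene-level role, no box
        },
    }

    # Fill the caption template "An Agent sits on an Item at a Place".
    agent = gsr_result["frames"]["Agent"][0]
    item = gsr_result["frames"]["Item"][0]
    place = gsr_result["frames"]["Place"][0]
    caption = f"A {agent} sits on a {item} at an {place}."

    # Prompt SAM with the predicted boxes to obtain segmentation masks.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    image = np.zeros((600, 800, 3), dtype=np.uint8)  # stand-in for the RGB frame
    predictor.set_image(image)

    masks = {}
    for role, (noun, box) in gsr_result["frames"].items():
        if box is None:
            continue
        mask, _, _ = predictor.predict(box=np.array(box), multimask_output=False)
        masks[noun] = mask[0]          # boolean mask for the indicated object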

Object and Material Recognition

MateRobot

MateRobot is a wearable robotic system aimed at assisting people with visual impairments (PVI) by providing them with the ability to recognize various materials through visual cues. This technology represents a significant advancement in improving the autonomy and mobility of individuals with visual impairments, enabling them to interact more effectively with their environment.

The research demonstrated that it is feasible for this system to accurately identify material properties using advanced machine learning techniques. The ultimate goal is to enhance the functionality of wearable robotics by integrating material recognition capabilities, thereby enriching the sensory experiences of users.

  • MateRobot utilizes a vision-based approach to identify different materials, which is essential for helping users understand their surroundings and make informed decisions.
  • The system is designed to be user-friendly for PVI individuals, focusing on improving their daily life by providing real-time feedback about the materials they encounter.
  • The research employs state-of-the-art image segmentation algorithms to increase the accuracy of material recognition (a minimal inference sketch follows this list).
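
The sketch below illustrates the per-pixel idea: a segmentation network assigns a material class to every pixel, and the dominant class in a small window around a user-indicated point is reported. The material list and the segment_materials stub are placeholders, not the actual MateRobot network or label set.

    # Minimal sketch of material recognition via per-pixel segmentation,
    # in the spirit of MateRobot. Labels and model are illustrative stubs.
    import numpy as np

    MATERIALS = ["fabric", "glass", "metal", "paper", "plastic", "wood"]  # assumed labels

    def segment_materials(image: np.ndarray) -> np.ndarray:
        """Placeholder for a semantic-segmentation network that assigns one
        material class index to every pixel of an RGB image (H, W, 3)."""
        h, w, _ = image.shape
        return np.zeros((h, w), dtype=np.int64)  # stub: everything "fabric"

    def material_at(image: np.ndarray, point: tuple, radius: int = 25) -> str:
        """Report the dominant material in a small window around the point the
        user indicates (e.g., via fingertip detection on the wearable camera)."""
        labels = segment_materials(image)
        y, x = point
        window = labels[max(0, y - radius): y + radius,
                        max(0, x - radius): x + radius]
        dominant = np.bincount(window.ravel(), minlength=len(MATERIALS)).argmax()
        return MATERIALS[dominant]

    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
    print(material_at(frame, point=(240, 320)))      # -> "fabric" with the stub model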

Navigation and Obstacle Avoidance

eSash

We are developing a mobility and navigation aid system for visually impaired persons, i.e., blind persons or persons with low vision. The system should enable visually impaired persons to navigate safely in unknown terrain. It uses computer vision methods to detect landmarks, obstacles and the free ground surface in front of the user. Acoustic as well as haptic interfaces (e.g., vibrating elements) are used to convey this information to the user. We have already built a prototype to this end and are developing the system in a user-centered way, made possible by our close cooperation with the KIT Center for Digital Accessibility and Assistive Technology (ACCESS@KIT).

In the long term, we wish to allow visually impaired people to move around and find their way safely on their own in a new environment, such as an unknown city or any other urban or rural area. In the short term, our goal is to build a mobile system that helps visually impaired students to safely explore our university campus.

Our goal is to build a mobile assistive system to support the mobility of visually impaired people in various ways:

  • provide orientation and navigation information (like GPS navigation systems)
  • warn the user of obstacles, both on the ground and high- or low-hanging ones (a sketch of mapping such warnings to haptic feedback follows this list)
  • plan the path ahead, i.e., before the user can reach it with the white cane, taking the current situation into account, e.g., detecting accessible sections and obstacles along the path
  • provide additional information about "the scene", e.g., the type of intersection or walkway, approaching people or cyclists, or oncoming traffic
  • create situation-specific "modules", e.g., crossing an inaccessible road intersection (locating the signal button, traffic-light detection, anti-veering), help at zebra crossings, locating building entrances, and many others.
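
As an illustration of how such obstacle warnings could be conveyed, the sketch below maps detected obstacles (direction and distance) to vibration intensities on a belt with three haptic actuators. The motor layout, distance range, and Obstacle structure are assumptions for this page, not the actual interface of our prototype.

    # Sketch of mapping detected obstacles to haptic feedback on a vibration
    # belt. Motor layout, distance range, and Obstacle fields are assumptions.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Obstacle:
        bearing_deg: float   # direction relative to walking direction (-90 .. +90)
        distance_m: float    # distance estimated by the vision pipeline

    def vibration_pattern(obstacles: List[Obstacle],
                          max_range_m: float = 4.0) -> List[float]:
        """Return intensities in [0, 1] for three motors: left, center, right.
        Closer obstacles vibrate more strongly; obstacles beyond range are ignored."""
        motors = [0.0, 0.0, 0.0]
        for obs in obstacles:
            if obs.distance_m > max_range_m:
                continue
            intensity = 1.0 - obs.distance_m / max_range_m   # nearer -> stronger
            idx = 0 if obs.bearing_deg < -20 else 2 if obs.bearing_deg > 20 else 1
            motors[idx] = max(motors[idx], intensity)
        return motors

    # Example: a low-hanging branch slightly to the right, 1.5 m ahead.
    print(vibration_pattern([Obstacle(bearing_deg=30.0, distance_m=1.5)]))
    # -> [0.0, 0.0, 0.625]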