
Lifelogging is an emerging ICT technology that uses wearable sensors (e.g., cameras and trackers) to capture, store, process, and retrieve various situations, states, and contexts of an individual in daily life. A wearable camera that captures actions from an egocentric perspective, in the form of video or a stream of images, can automatically provide detailed insights into the activities performed by the person wearing it, such as how they eat, what places they visit, with whom they interact, and what events they attend. The goal of this thesis was to create personalised tools and services to monitor, store, and process behavioural patterns, nutrition habits, social environments, contexts, and physical activities, while bringing the technology closer to the end user to support and improve their lifestyle and health.
Start date: April 2021
Expected end date: Summer 2025
Under this project, state-of-the-art research on egocentric vision and lifelogging from a computer vision perspective was initially conducted, focusing on segmentation, action recognition, food-related scenes, and social interaction tracking. A comprehensive database of available datasets in the field was also created. This initial research identified several open challenges that have received limited attention from the scientific community, leading to research exploring the potential of egocentric video-based lifelogging systems for tracking actions and behaviours that impact health and well-being. Additionally, the feasibility of processing egocentric images to assist with health-related tasks, such as rehabilitation and detecting difficulties in order to provide timely assistance, was investigated.

Building on these questions, significant contributions have been made through the development of novel approaches for 2D hand pose estimation, achieving superior accuracy on public benchmarks. A 3D hand pose estimation method was also introduced, improving results using pseudo-depth data derived from RGB images. Further improvements were made through a camera-agnostic approach for zero-shot 3D hand pose estimation, significantly reducing mean pose error across unseen domains. These advancements help bridge the gap between laboratory conditions and real-world scenarios, improving the generalisability and reliability of experimental results. Further investigations analysed the usability of 2D hand and object poses for egocentric action recognition and evaluated how performance varies with different types of pose input.

These theoretical advancements have been successfully translated into practical and novel active and assisted living (AAL) applications, including an intelligent reading assistant for visually impaired users that integrates smart glasses with LLMs, and a study on rehabilitation for stroke patients that establishes benchmarks for exercise recognition, form evaluation, and repetition counting. This research demonstrates how egocentric vision can meaningfully support individuals in need by bridging the gap between advanced computer vision techniques and real-world assistive technologies.
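To illustrate, at a high level, how pseudo-depth derived from RGB can isolate the camera wearer's hands and arms by range, the following minimal Python sketch runs an off-the-shelf monocular depth estimator (MiDaS, loaded via torch.hub) on a single egocentric frame and keeps only the nearest pixels. The model choice, the file name frame.jpg, and the percentile threshold are illustrative assumptions for this sketch and do not reproduce the actual pipeline of the SHARP paper listed below.

import cv2
import numpy as np
import torch

# Load a small off-the-shelf monocular depth model (MiDaS) and its input transforms.
# (Loading MiDaS via torch.hub requires the timm package.)
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

# Read one egocentric frame (the file name is only an example).
frame = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(frame)

with torch.no_grad():
    pred = midas(batch)
    # Resize the prediction back to the original frame resolution.
    pseudo_depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=frame.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# MiDaS predicts inverse relative depth: larger values mean closer to the camera.
# Keeping only the nearest pixels gives a rough by-range mask of hands and arms;
# the 85th-percentile cut-off is an arbitrary illustrative choice.
near_mask = pseudo_depth > np.percentile(pseudo_depth, 85)
hands_and_arms = frame * near_mask[..., None]

Masking background clutter by range in this way is one plausible use of pseudo-depth before hand pose estimation or action recognition; the SHARP publication below describes the actual segmentation and pose pipeline.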
SHARP: Segmentation of Hands and Arms by Range Using Pseudo-depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition
In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham.
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
In 12th International Workshop on Assistive Computer Vision and Robotics (ACVR2024), 2024
TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model
In: Miesenberger, K., Peňáz, P., Kobayashi, M. (eds) Computers Helping People with Special Needs. ICCHP 2024. Lecture Notes in Computer Science, vol 14751. Springer, Cham, 2024.
Understanding Human Behaviour With Wearable Cameras Based on Information From the Human Hand
Proceedings of the Joint VisuAAL-GoodBrother Conference on Trustworthy Video- and Audio-based Assistive Technologies, pp. 10-13, 2024
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition
In Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Istanbul, Turkiye, pp. 1-9, IEEE, 2024
Hands, Objects, Action! Egocentric 2D Hand-Based Action Recognition
In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham.
Ego2DHands: Generating 2D Hand Skeleton in Egocentric Vision
26th Computer Vision Winter Workshop (CVWW) 2023, Krems an der Donau, Austria, 2023
Beyond Privacy of Depth Sensors in Active and Assisted Living Devices
In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 425-429, 2022.
Addressing Privacy Concerns in Depth Sensors
In International Conference on Computers Helping People with Special Needs, pp. 526-533, Springer, Cham, 2022.
Depth and Thermal Images in Face Detection - A Detailed Comparison Between Image Modalities
In Proceedings of the 5th International Conference on Machine Vision and Applications (ICMVA), pp. 16-21, 2022.
State of the Art of Audio- and Video-Based Solutions for AAL
GoodBrother COST Action, Technical Report, 2022
Wiktor received his BSc in Automatic Control and Robotics in 2018 and his MSc in Robotics at the end of 2019, both from the AGH University of Science and Technology in Krakow, Poland. During his master's studies, he spent one year as an exchange student at the University of Aveiro in Portugal.
Before his position in visuAAL, he gained experience in software engineering, working in the automotive industry on embedded solutions for autonomous driving.
Wiktor Mucha
Vienna University of Technology
Computer Vision Lab
Favoritenstr. 9/193-1
A-1040 Vienna, Austria
Email address: wmucha@cvl.tuwien.ac.at