
Lifelogging is an emerging ICT technology that uses wearable sensors (e.g., cameras and trackers) to capture, store, process, and retrieve various situations, states, and contexts of an individual in daily life. A wearable camera that captures actions from an egocentric perspective, in the form of video or a stream of images, can automatically provide detailed insights into the activities performed by the person wearing it, such as how they eat, what places they visit, with whom they interact, and what events they attend. The goal of this thesis was to create personalised tools and services to monitor, store, and process behavioural patterns, nutrition habits, social environments, contexts, and physical activities, while bringing the technology closer to the end user to support and improve their lifestyle and health.
Start date: April 2021
Expected end date: Summer 2025
Under this project, state-of-the-art research on egocentric vision and lifelogging from a computer vision perspective was initially conducted, focusing on segmentation, action recognition, food-related scenes, and social interaction tracking. A comprehensive database of available datasets in the field was also created. This initial research identified several open challenges that have received limited attention from the scientific community, leading to research exploring the potential of egocentric video-based lifelogging systems for tracking actions and behaviours that impact health and well-being. Additionally, the feasibility of processing egocentric images to assist with health-related tasks, such as rehabilitation and detecting difficulties in order to provide timely assistance, was investigated.

Building on these questions, significant contributions have been made through the development of novel approaches for 2D hand pose estimation, achieving superior accuracy on public benchmarks. A 3D hand pose estimation method was also introduced, improving results using pseudo-depth data derived from RGB images. Further improvements were made through a camera-agnostic approach for zero-shot 3D hand pose estimation, significantly reducing mean pose error across unseen domains. These advancements help bridge the gap between laboratory conditions and real-world scenarios, improving the generalisability and reliability of experimental results. Further investigations analysed the usability of 2D hand and object poses for egocentric action recognition and evaluated how performance varies with different types of pose input.

These theoretical advancements have been successfully translated into practical and novel active and assisted living (AAL) applications, including an intelligent reading assistant for visually impaired users that integrates smart glasses with LLMs, and a study on rehabilitation for stroke patients that establishes benchmarks for exercise recognition, form evaluation, and repetition counting. This research demonstrates how egocentric vision can meaningfully support individuals in need by bridging the gap between advanced computer vision techniques and real-world assistive technologies.
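To illustrate, at a high level, how pseudo-depth derived from RGB can isolate the camera wearer's hands and arms by range, the following minimal Python sketch runs an off-the-shelf monocular depth estimator (MiDaS, loaded via torch.hub) on a single egocentric frame and keeps only the nearest pixels. The model choice, the file name frame.jpg, and the percentile threshold are illustrative assumptions for this sketch and do not reproduce the actual pipeline of the SHARP paper listed below.

import cv2
import numpy as np
import torch

# Load a small off-the-shelf monocular depth model (MiDaS) and its input transforms.
# (Loading MiDaS via torch.hub requires the timm package.)
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

# Read one egocentric frame (the file name is only an example).
frame = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(frame)

with torch.no_grad():
    pred = midas(batch)
    # Resize the prediction back to the original frame resolution.
    pseudo_depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=frame.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# MiDaS predicts inverse relative depth: larger values mean closer to the camera.
# Keeping only the nearest pixels gives a rough by-range mask of hands and arms;
# the 85th-percentile cut-off is an arbitrary illustrative choice.
near_mask = pseudo_depth > np.percentile(pseudo_depth, 85)
hands_and_arms = frame * near_mask[..., None]

Masking background clutter by range in this way is one plausible use of pseudo-depth before hand pose estimation or action recognition; the SHARP publication below describes the actual segmentation and pose pipeline.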
SHARP: Segmentation of Hands and Arms by Range Using Pseudo-depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition
In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham.
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
In 12th International Workshop on Assistive Computer Vision and Robotics (ACVR2024), 2024
TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model
In: Miesenberger, K., Peňáz, P., Kobayashi, M. (eds) Computers Helping People with Special Needs. ICCHP 2024. Lecture Notes in Computer Science, vol 14751. Springer, Cham, 2024.
Understanding Human Behaviour With Wearable Cameras Based on Information From the Human Hand
Proceedings of the Joint VisuAAL-GoodBrother Conference on Trustworthy Video- and Audio-based Assistive Technologies, pp. 10-13, 2024
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition
In Proceedings of the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Istanbul, Turkiye, pp. 1-9, IEEE, 2024
Hands, Objects, Action! Egocentric 2D Hand-Based Action Recognition
In: Christensen, H.I., Corke, P., Detry, R., Weibel, JB., Vincze, M. (eds) Computer Vision Systems. ICVS 2023. Lecture Notes in Computer Science, vol 14253. Springer, Cham.
Ego2DHands: Generating 2D Hand Skeleton in Egocentric Vision
26th Computer Vision Winter Workshop (CVWW) 2023, Krems an der Donau, Austria, 2023
Beyond Privacy of Depth Sensors in Active and Assisted Living Devices
In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 425-429, 2022.
Addressing Privacy Concerns in Depth Sensors
In International Conference on Computers Helping People with Special Needs, pp. 526-533, Springer, Cham, 2022.
Depth and Thermal Images in Face Detection - A Detailed Comparison Between Image Modalities
In Proceedings of the 5th International Conference on Machine Vision and Applications (ICMVA), pp. 16-21, 2022.
State of the Art of Audio- and Video-Based Solutions for AAL
GoodBrother COST Action, Technical Report, 2022
Wiktor received his BSc in Automatic Control and Robotics in 2018 and his MSc in Robotics at the end of 2019, both from the AGH University of Science and Technology in Krakow, Poland. During his master's studies, he spent one year as an exchange student at the University of Aveiro in Portugal.
Before his position in visuAAL, he gained experience in software engineering, working in the automotive industry on embedded solutions for autonomous driving.
Wiktor Mucha
Vienna University of Technology
Computer Vision Lab
Favoritenstr. 9/193-1
A-1040 Vienna, Austria
Email address: wmucha@cvl.tuwien.ac.at