ESR10 - Wiktor Mucha

Research project
Behaviour modelling and lifelogging
About the project

Lifelogging is an emerging ICT technology that uses wearable sensors (e.g., cameras and activity trackers) to capture, store, process, and retrieve various situations, states, and contexts of an individual's daily life. A wearable camera that captures actions from an egocentric perspective, as video or as a stream of images, can automatically provide detailed insights into the activities of the person wearing it: how they eat, what places they visit, with whom they interact, what events they attend, and more. The goal of this thesis is to create personalised tools and services to monitor, store, and process behavioural patterns, nutrition habits, social environments, contexts, and physical activities, while bringing the technology closer to end users to support and improve their lifestyle and health.

Start date: April 2021

Expected end date: Summer 2025

Progress of the project

The project began with a review of the state of the art in egocentric vision and lifelogging from a computer vision perspective, focusing on segmentation, action recognition, food-related scenes, and social interaction tracking, together with a comprehensive database of the datasets available in the field. This survey identified several open challenges that had received limited attention from the scientific community, motivating research into the potential of egocentric video-based lifelogging systems for tracking actions and behaviours that affect health and well-being, as well as into the feasibility of processing egocentric images for health-related tasks, such as rehabilitation and detecting moments of struggle in order to provide timely assistance.

Building on these questions, significant contributions were made through novel approaches for 2D hand pose estimation, achieving superior accuracy on public benchmarks. A 3D hand pose estimation method was also introduced that improves results using pseudo-depth data derived from RGB images (a simplified sketch of this idea follows below), and a camera-agnostic approach for zero-shot 3D hand pose estimation further reduced mean pose error across unseen domains. These advances help bridge the gap between laboratory conditions and real-world scenarios, improving the generalisability and reliability of experimental results. Further investigations analysed the usability of 2D hand and object poses for egocentric action recognition and evaluated how performance varies with different types of pose input.

These theoretical advances have been translated into practical and novel AAL applications, including an intelligent reading assistant for visually impaired users that integrates smart glasses with large language models (LLMs), and a study on stroke patient rehabilitation that establishes benchmarks for exercise recognition, form evaluation, and repetition counting. This research demonstrates how egocentric vision can meaningfully support individuals in need by bridging the gap between advanced computer vision techniques and real-world assistive technologies.
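To illustrate the pseudo-depth idea in the SHARP work below, the following minimal sketch (not the published implementation) uses an off-the-shelf monocular depth estimator, assumed here to be MiDaS loaded via torch.hub, to derive relative depth from a single RGB frame and keep only the nearest pixels, where the wearer's hands and arms typically appear in egocentric footage. The keep_ratio parameter and the quantile threshold are illustrative choices, not values from the paper.

```python
# Minimal sketch of pseudo-depth-based near-range masking (illustrative only,
# not the published SHARP implementation). Assumes the MiDaS monocular depth
# estimator is available via torch.hub.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform


def near_range_mask(frame_bgr, keep_ratio=0.3):
    """Mask the closest `keep_ratio` fraction of pixels in one camera frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        # MiDaS predicts relative inverse depth: larger values mean closer.
        depth = midas(transform(rgb))
        depth = torch.nn.functional.interpolate(
            depth.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    # Keep the nearest pixels, where the wearer's hands and arms usually
    # appear in egocentric video; the quantile threshold is a heuristic.
    threshold = torch.quantile(depth, 1.0 - keep_ratio)
    return (depth >= threshold).cpu().numpy()
```

A mask of this kind could, for instance, restrict a hand pose estimator's input to the near range, filtering out background clutter and other people's hands before pose estimation or action recognition.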

Scientific publications

SHARP: Segmentation of Hands and Arms by Range Using Pseudo-depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition

Wiktor Mucha, Michael Wray, Martin Kampel

In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham.

State of the Art of Audio- and Video-Based Solutions for AAL

Slavisa Aleksic, Michael Atanasov, Jean Calleja Agius, Kenneth Camilleri, Anto Čartolovni, Pau Climent-Pérez, Sara Colantonio, Stefania Cristina, Vladimir Despotovic, Hazım Kemal Ekenel, Ekrem Erakin, Francisco Florez-Revuelta, Danila Germanese, Nicole Grech, Steinunn Gróa Sigurðardóttir, Murat Emirzeoğlu, Ivo Iliev, Mladjan Jovanovic, Martin Kampel, William Kearns, Andrzej Klimczuk, Lambros Lambrinos, Jennifer Lumetzberger, Wiktor Mucha, Sophie Noiret, Zada Pajalic, Rodrigo Rodriguez Pérez, Galidiya Petrova, Sintija Petrovica, Peter Pocta, Angelica Poli, Mara Pudane, Susanna Spinsante, Albert Ali Salah, Maria Jose Santofimia, Anna Sigríður Islind, Lacramioara Stoicu-Tivadar, Hilda Tellioğlu, Andrej Zgank

GoodBrother COST Action, Technical Report, 2022

About the ESR

Wiktor received his BSc in Automatic Control and Robotics in 2018 and his MSc in Robotics at the end of 2019, both from the AGH University of Science and Technology in Krakow, Poland. During his master's studies, he spent one year as an exchange student at the University of Aveiro in Portugal.

Before joining visuAAL, he gained experience in software engineering, working in the automotive industry on embedded solutions for autonomous driving.

Contact information

Wiktor Mucha

Vienna University of Technology

Computer Vision Lab
Favoritenstr. 9/193-1
A-1040 Vienna, Austria

Email address: wmucha@cvl.tuwien.ac.at