Evaluation of AI-powered
Sensing Systems

During my PhD in Human-Computer Interaction, I worked on building several AI-powered sensing systems and evaluating their effectiveness and usability.


Eyes on the Road: Detecting Phone Usage by
Drivers Using On-Device Cameras


Company: Carnegie Mellon University

Abstract:

In this work, we evaluated a lightweight, software-only solution that uses the phone's camera to observe the car's interior geometry and infer the phone's position and orientation, which is then used to distinguish between driver and passenger phone use. We collected data in 16 different cars with 33 different users and achieved an overall accuracy of 94% when the phone is held in hand and 92.2% when the phone is docked (<=1 sec. delay). With just a software upgrade, this work can enable smartphones to proactively adapt to the user's context in the car and substantially reduce distracted driving incidents.


Output:

  1. Published article at ACM CHI

Figure 1. Lines detected in the photo captured by the phone when docked on the windshield at (a) passenger's right; (b) driver's left; and (c) driver's right side. The lines capture the perspective of the geometry of objects inside a car from different viewpoints.
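The line features referenced in Figure 1 can be extracted with standard computer-vision primitives. As an illustrative sketch only (the paper's actual feature extraction and classifier are not reproduced here), edge detection followed by a probabilistic Hough transform in OpenCV yields line segments whose orientations summarize the viewpoint:

    import cv2
    import numpy as np

    def extract_interior_lines(frame_bgr):
        """Detect straight lines (window edges, dashboard, pillars) in a frame.

        Returns an array of line segments (x1, y1, x2, y2). The angles and
        positions of these segments hint at the phone's viewpoint inside the
        car (driver side vs. passenger side, windshield vs. vent)."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                                threshold=60, minLineLength=40, maxLineGap=10)
        return lines if lines is not None else np.empty((0, 1, 4), dtype=np.int32)

    def line_angle_histogram(lines, bins=18):
        """Summarize detected lines as a histogram of orientations, one simple
        viewpoint descriptor a downstream classifier could consume."""
        if len(lines) == 0:
            return np.zeros(bins)
        segs = lines.reshape(-1, 4).astype(np.float32)
        angles = np.degrees(np.arctan2(segs[:, 3] - segs[:, 1],
                                       segs[:, 2] - segs[:, 0])) % 180.0
        hist, _ = np.histogram(angles, bins=bins, range=(0, 180))
        return hist / max(hist.sum(), 1)

A descriptor of this kind is only a sketch of the idea: the perspective of the car's interior lines looks different from each mounting position, so their orientation statistics can help separate, e.g., a driver-left mount from a passenger-right mount.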

methods
DATA COLLECTION

In our data collection procedure, we have two variables:

  1. Placement of the Phone: docked-on-shield, docked-on-vent, held-in-hand

  2. Camera Used: back-camera, front-camera

In all conditions, the video was recorded at 30 frames per second with a resolution of 720p. The field of view of the camera is approximately 75 degrees.

DOCKED PHONE

For the two docked conditions, we collected the data in 10 different cars. We placed the phone in 6 different positions in the car, 3 each on the shield [Phone 1-3] and the vent [Phone 4-6] as shown in Figure 2.

When the phones were docked, the users did not need to interact with the phones. Thus, we did not recruit external participants for this part of the study. The members of the research team drove the cars in an urban area to collect the data. We chose this approach primarily because of the safety concerns around recording videos in a moving car. We recorded videos (avg. length=3.5mins.) from both the front and the back camera.

PHONE IN HAND

For the held-in-hand condition, apart from measuring performance in different cars, we also wanted to cover different user behaviors, postures, and ways of holding the phone while driving. Thus, we recruited 33 participants (16 male, 17 female, mean age = 26.04) and recorded data in 16 different cars. To ensure the safety of our participants, we conducted the study in a stationary car and simulated the in-hand conditions as shown in Figure 2 [Phone 7-8]. We chose to conduct the study in a stationary car instead of a driving simulator to capture signals in a real setting and to capture visuals of real cars.

When in the driver's seat, participants were asked to pretend that they were driving and using the phone at the same time. They were encouraged to behave as they usually would while driving (eyes on the road, hands on the wheel, etc.). Similarly, when the participants performed the task as a passenger, they were encouraged to behave and type as they would as passengers in a moving car. We did not control their phone usage behavior: participants were allowed to move the phone or place it anywhere they desired, and some did place it in their lap or on the center console. This freedom allowed us to capture more realistic phone-usage data in the car, instead of relying on predetermined positions chosen by us. Phone orientation was also not controlled, but all participants used the device in portrait mode while driving.

The participants completed two everyday tasks on their phone: (1) responding to text messages; and (2) changing music. These are among the most common in-car tasks that require continuous interaction, so we used them as our study tasks to capture realistic scenarios. Both tasks were performed once as the driver and once as the passenger by the same person in their own car. For the duration of the study, we recorded videos (avg. length = 2.5 mins) from both the front and the back camera. These videos were recorded using an off-the-shelf app that captures video while running in the background, which allowed participants to focus on their task and not get distracted by the recording.


We evaluated the efficacy of the solution across users as well as cars. This ensured that the solution was usable across a wide array of user behaviors and generalizable across different car makes/models.
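The write-up above does not spell out the exact split protocol; a common way to test this kind of generalization is to hold out entire users or cars during cross-validation. A minimal sketch using scikit-learn's GroupKFold, assuming hypothetical per-sample features X, labels y, and a group identifier (user or car ID) per sample:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold
    from sklearn.metrics import accuracy_score

    def cross_validate_by_group(X, y, groups, n_splits=5):
        """Hold out whole groups (e.g. all samples from one car or one user)
        in each fold so the score reflects generalization to unseen cars/users."""
        scores = []
        for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
            clf = RandomForestClassifier(n_estimators=100, random_state=0)
            clf.fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        return np.mean(scores), np.std(scores)

Grouping by car answers "does it work in a car we never trained on?", while grouping by user answers the analogous question for unseen users; the published accuracies correspond to this kind of held-out evaluation rather than a random per-frame split.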

GymCam: Detecting, Recognizing and Tracking
Simultaneous Exercises in Unconstrained Scenes


Company: Carnegie Mellon University

Abstract:

GymCam is a camera-based system for automatically detecting, recognizing, and tracking multiple people and exercises simultaneously in unconstrained environments without any user intervention. On data collected in a varsity gym, GymCam correctly segmented exercises from other activities with an accuracy of 84.6%, recognized the type of exercise at 93.6% accuracy, and counted the number of repetitions to within ±1.7 on average. GymCam advances the field of real-time exercise tracking by filling some crucial gaps, such as tracking whole-body motion, handling occlusion, and enabling single-point sensing for a multitude of users.


Output:

  1. Published article at ACM IMWUT

Figure 1. GymCam uses a camera to track exercises. (Top) Optical flow tracks motion trajectories of various points in the gym; green marks points classified as exercise motion and red marks non-exercise points. (Bottom Left) Exercise points are clustered based on similarity to combine points belonging to the same exercise. (Bottom Right) For each exercise (cluster), GymCam infers the type of exercise and calculates the repetition count.
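Figure 1 summarizes the pipeline: track the motion trajectories of many points, keep the periodic (exercise-like) ones, cluster co-moving points, then classify and count. The sketch below illustrates only the tracking and repetition-counting ideas, using OpenCV's Lucas-Kanade optical flow and a dominant-frequency estimate; the features, thresholds, and models in the actual GymCam system differ and are not reproduced here:

    import cv2
    import numpy as np

    def track_trajectories(video_path, max_corners=400, window=300):
        """Track sparse feature points across frames with Lucas-Kanade optical
        flow and return one motion trajectory (list of (x, y)) per point."""
        cap = cv2.VideoCapture(video_path)
        ok, frame = cap.read()
        prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=7)
        trajectories = [[tuple(p.ravel())] for p in pts]
        for _ in range(window):
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            for traj, p, s in zip(trajectories, new_pts, status):
                if s[0]:
                    traj.append(tuple(p.ravel()))
            prev_gray, pts = gray, new_pts
        cap.release()
        return trajectories

    def repetition_count(traj, fps=30.0):
        """Estimate repetitions from the dominant frequency of the vertical
        motion of a trajectory (exercise motion tends to be strongly periodic)."""
        ys = np.array([p[1] for p in traj], dtype=np.float32)
        ys -= ys.mean()
        spectrum = np.abs(np.fft.rfft(ys))
        freqs = np.fft.rfftfreq(len(ys), d=1.0 / fps)
        dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
        return dominant * len(ys) / fps  # cycles over the tracked window

In the full system, trajectories with similar periodic motion would additionally be clustered so that all points on one person's barbell and arms count as a single exercise, which is what the bottom-left panel of the figure depicts.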

methods
DATA COLLECTION

We collected data in Carnegie Mellon University's varsity gym over a five-day period. To ensure a wide, unobstructed view, we placed one camera on a wall at a height of approximately 4 meters. This placement was also inconspicuous, aiming to minimize observer effects (e.g., users altering their warm-up or stretching routine, or changing the weights they usually lift). The university's Institutional Review Board and Department of Athletics officials agreed that as long as videos were immediately anonymized, we did not need signed consent from participants. Nonetheless, gym users were informed that a research team was recording anonymized videos and that any questions, comments, or objections should be raised with the gym staff (none were). Thus, gym users were given no instructions regarding exercises, repetitions, breaks, etc., making this as close to unconstrained data collection as practically possible.

FitByte: Automatic Diet Monitoring in Unconstrained Situations
Using Multimodal Sensing on Eyeglasses


Company: Carnegie Mellon University

Abstract:

FitByte is a multi-modal sensing approach on a pair of eyeglasses that tracks all phases of food intake. FitByte contains a set of inertial and optical sensors that allow it to reliably detect food intake events in noisy environments. It also has an on-board camera that opportunistically captures visuals of the food as the user consumes it. We evaluated the system with 23 participants in two studies with decreasing environmental constraints. On average, FitByte achieved an 89% F1-score in detecting eating and drinking episodes.
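The 89% figure above is an episode-level F1-score. As a hedged illustration of how such a score can be computed (the exact matching criterion used in the paper is not restated here), predicted eating/drinking episodes can be matched to ground-truth episodes by temporal overlap and the usual precision/recall harmonic mean taken:

    def overlaps(a, b):
        """True if two (start, end) intervals, in seconds, overlap at all."""
        return a[0] < b[1] and b[0] < a[1]

    def episode_f1(predicted, ground_truth):
        """Episode-level F1: a predicted episode counts as a true positive if it
        overlaps an unmatched ground-truth episode; unmatched predictions are
        false positives, unmatched ground-truth episodes are false negatives."""
        matched_gt = set()
        tp = 0
        for p in predicted:
            for i, g in enumerate(ground_truth):
                if i not in matched_gt and overlaps(p, g):
                    matched_gt.add(i)
                    tp += 1
                    break
        fp = len(predicted) - tp
        fn = len(ground_truth) - len(matched_gt)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Hypothetical example: three detected episodes against three annotated ones.
    pred = [(60, 95), (300, 360), (1000, 1040)]
    gt = [(55, 100), (310, 350), (1200, 1260)]
    print(episode_f1(pred, gt))  # 2 of 3 matched -> P = R = F1 ~= 0.667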


Output:

  1. Published article at ACM CHI

methods
DATA COLLECTION

We conducted the data collection and evaluation of FitByte in two separate studies. In the first study, we assessed the ecological validity of FitByte by testing the developed models on a completely unseen 91-hour dataset collected in an unconstrained free-living environment. We also conducted a preliminary investigation into the perceived privacy and social acceptability of the system.

STUDY PROCEDURE

In this study, we aimed to evaluate the performance of FitByte over an extended period of time in the real world, without any constraints on the participants' behavior. We asked participants to wear FitByte continuously for 12 hours a day for as many days as they could. Due to the small battery, the onboard camera could only record video for a limited duration. Thus, for ground truth, we used an external camera similar to the onboard one, attached to the participant's shirt. At the end of the study, we asked the participants about their perception of the social acceptability and privacy implications of the device in a semi-structured interview. We recruited 5 participants (1 female), aged 21-30 years, all university students. Three participants wore the device for two days and two wore it for one day.

Participants started the study at different times in the morning (between 8 am and 11 am) and took the device off 8 or 12 hours later. The dataset contains a very diverse set of activities across different participants, including cooking, driving, working in a chemical lab, working in an office, lying down, taking public transport, grocery shopping, exercising in a gym, and many more.