Evaluation of AI-powered
Sensing Systems

During my PhD in Human-Computer Interaction, I worked on building several AI-powered sensing systems and evaluating their effectiveness and usability.


Eyes on the Road: Detecting Phone Usage by
Drivers Using On-Device Cameras


Company: Carnegie Mellon University

Abstract:

In this work, we evaluated a lightweight, software-only solution that uses the phone's camera to observe the car's interior geometry and infer the phone's position and orientation, which is then used to distinguish between driver and passenger phone use. We collected data in 16 different cars with 33 different users and achieved an overall accuracy of 94% when the phone is held in hand and 92.2% when the phone is docked (<=1 sec. delay). With just a software upgrade, this work can enable smartphones to proactively adapt to the user's context in the car and substantially reduce distracted driving incidents.


Output:

  1. Published article at ACM CHI

Figure 1. Lines detected in the photo captured by the phone when docked on the windshield at (a) passenger's right; (b) driver's left; and (c) driver's right side. The lines capture the perspective of the geometry of objects inside a car from different viewpoints.
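The line features referenced in Figure 1 can be extracted with standard computer-vision primitives. As an illustrative sketch only (the paper's actual feature extraction and classifier are not reproduced here), edge detection followed by a probabilistic Hough transform in OpenCV yields line segments whose orientations summarize the viewpoint:

    import cv2
    import numpy as np

    def extract_interior_lines(frame_bgr):
        """Detect straight lines (window edges, dashboard, pillars) in a frame.

        Returns an array of line segments (x1, y1, x2, y2). The angles and
        positions of these segments hint at the phone's viewpoint inside the
        car (driver side vs. passenger side, windshield vs. vent)."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                                threshold=60, minLineLength=40, maxLineGap=10)
        return lines if lines is not None else np.empty((0, 1, 4), dtype=np.int32)

    def line_angle_histogram(lines, bins=18):
        """Summarize detected lines as a histogram of orientations, one simple
        viewpoint descriptor a downstream classifier could consume."""
        if len(lines) == 0:
            return np.zeros(bins)
        segs = lines.reshape(-1, 4).astype(np.float32)
        angles = np.degrees(np.arctan2(segs[:, 3] - segs[:, 1],
                                       segs[:, 2] - segs[:, 0])) % 180.0
        hist, _ = np.histogram(angles, bins=bins, range=(0, 180))
        return hist / max(hist.sum(), 1)

A descriptor of this kind is only a sketch of the idea: the perspective of the car's interior lines looks different from each mounting position, so their orientation statistics can help separate, e.g., a driver-left mount from a passenger-right mount.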

methods
DATA COLLECTION

In our data collection procedure, we have two variables:

  1. Placement of the Phone: docked-on-shield, docked-on-vent, held-in-hand

  2. Camera Used: back-camera, front-camera

In all conditions, the video was recorded at 30 frames per second with a resolution of 720p. The field of view of the camera is approximately 75 degrees.

DOCKED PHONE

For the two docked conditions, we collected the data in 10 different cars. We placed the phone in 6 different positions in the car, 3 each on the shield [Phone 1-3] and the vent [Phone 4-6] as shown in Figure 2.

When the phones were docked, the users did not need to interact with the phones. Thus, we did not recruit external participants for this part of the study. The members of the research team drove the cars in an urban area to collect the data. We chose this approach primarily because of the safety concerns around recording videos in a moving car. We recorded videos (avg. length=3.5mins.) from both the front and the back camera.

PHONE IN HAND

For the held-in-hand condition, apart from measuring performance in different cars, we also wanted to cover different user behaviors, postures, and ways of holding the phone while driving. Thus, we recruited 33 participants (16 male, 17 female, mean age = 26.04) and recorded data in 16 different cars. To ensure the safety of our participants, we conducted the study in a stationary car and simulated the in-hand conditions as shown in Figure 2 [Phone 7-8]. We chose to conduct the study in a stationary car instead of a driving simulator to capture signals in a real setting and to capture visuals of real cars.

When in the driver's seat, participants were asked to pretend that they were driving and using the phone at the same time. They were encouraged to behave as they usually would while driving (eyes on the road, hands on the wheel, etc.). Similarly, when the participants performed the task as a passenger, they were encouraged to behave and type as they would as passengers in a moving car. We did not control their phone usage behavior: participants were allowed to move the phone or place it anywhere they desired, and some did place it in their lap or on the center console. This freedom allowed us to capture more realistic phone-usage data in the car, instead of relying on predetermined positions chosen by us. Phone orientation was also not controlled, but all participants used the device in portrait mode while driving.

The participants completed two everyday tasks on their phone: (1) responding to text messages; and (2) changing music. These are among the most common in-car tasks that require continuous interaction, so we used them as our study tasks to capture realistic scenarios. Both tasks were performed once as the driver and once as the passenger by the same person in their own car. For the duration of the study, we recorded videos (avg. length = 2.5 mins) from both the front and the back camera. These videos were recorded using an off-the-shelf app that captures video while running in the background, which allowed participants to focus on their task and not get distracted by the recording.


We evaluated the efficacy of the solution across users as well as cars. This ensured that the solution was usable across a wide array of user behaviors and generalizable across different car makes/models.
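The write-up above does not spell out the exact split protocol; a common way to test this kind of generalization is to hold out entire users or cars during cross-validation. A minimal sketch using scikit-learn's GroupKFold, assuming hypothetical per-sample features X, labels y, and a group identifier (user or car ID) per sample:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold
    from sklearn.metrics import accuracy_score

    def cross_validate_by_group(X, y, groups, n_splits=5):
        """Hold out whole groups (e.g. all samples from one car or one user)
        in each fold so the score reflects generalization to unseen cars/users."""
        scores = []
        for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
            clf = RandomForestClassifier(n_estimators=100, random_state=0)
            clf.fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        return np.mean(scores), np.std(scores)

Grouping by car answers "does it work in a car we never trained on?", while grouping by user answers the analogous question for unseen users; the published accuracies correspond to this kind of held-out evaluation rather than a random per-frame split.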

GymCam: Detecting, Recognizing and Tracking
Simultaneous Exercises in Unconstrained Scenes


Company: Carnegie Mellon University

Abstract:

GymCam is a camera-based system for automatically detecting, recognizing, and tracking multiple people and exercises simultaneously in unconstrained environments without any user intervention. On data collected in a varsity gym, GymCam correctly segmented exercises from other activities with an accuracy of 84.6%, recognized the type of exercise at 93.6% accuracy, and counted the number of repetitions to within ±1.7 on average. GymCam advances the field of real-time exercise tracking by filling some crucial gaps, such as tracking whole-body motion, handling occlusion, and enabling single-point sensing for a multitude of users.


Output:

  1. Published article at ACM IMWUT

Figure 1. GymCam uses a camera to track exercises. (Top) Optical flow tracks motion trajectories of various points in the gym; green marks points classified as exercise motion and red marks non-exercise points. (Bottom Left) Exercise points are clustered based on similarity to combine points belonging to the same exercise. (Bottom Right) For each exercise (cluster), GymCam infers the type of exercise and calculates the repetition count.
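Figure 1 summarizes the pipeline: track the motion trajectories of many points, keep the periodic (exercise-like) ones, cluster co-moving points, then classify and count. The sketch below illustrates only the tracking and repetition-counting ideas, using OpenCV's Lucas-Kanade optical flow and a dominant-frequency estimate; the features, thresholds, and models in the actual GymCam system differ and are not reproduced here:

    import cv2
    import numpy as np

    def track_trajectories(video_path, max_corners=400, window=300):
        """Track sparse feature points across frames with Lucas-Kanade optical
        flow and return one motion trajectory (list of (x, y)) per point."""
        cap = cv2.VideoCapture(video_path)
        ok, frame = cap.read()
        prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=7)
        trajectories = [[tuple(p.ravel())] for p in pts]
        for _ in range(window):
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            for traj, p, s in zip(trajectories, new_pts, status):
                if s[0]:
                    traj.append(tuple(p.ravel()))
            prev_gray, pts = gray, new_pts
        cap.release()
        return trajectories

    def repetition_count(traj, fps=30.0):
        """Estimate repetitions from the dominant frequency of the vertical
        motion of a trajectory (exercise motion tends to be strongly periodic)."""
        ys = np.array([p[1] for p in traj], dtype=np.float32)
        ys -= ys.mean()
        spectrum = np.abs(np.fft.rfft(ys))
        freqs = np.fft.rfftfreq(len(ys), d=1.0 / fps)
        dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
        return dominant * len(ys) / fps  # cycles over the tracked window

In the full system, trajectories with similar periodic motion would additionally be clustered so that all points on one person's barbell and arms count as a single exercise, which is what the bottom-left panel of the figure depicts.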

methods
DATA COLLECTION

We collected data in Carnegie Mellon University's varsity gym over a five-day period. To ensure a wide, unobstructed view, we placed one camera on a wall at a height of approximately 4 meters. This placement was also inconspicuous, aiming to minimize observer effects (e.g., users altering their warm-up or stretching routine, or changing the weights they usually lift). The university's Institutional Review Board and Department of Athletics officials agreed that as long as videos were immediately anonymized, we did not need signed consent from participants. Nonetheless, gym users were informed that a research team was recording anonymized videos and that any questions, comments, or objections should be raised with the gym staff (none were). Thus, gym users were given no instructions regarding exercises, repetitions, breaks, etc., making this as close to unconstrained data collection as practically possible.

FitByte: Automatic Diet Monitoring in Unconstrained Situations
Using Multimodal Sensing on Eyeglasses


Company: Carnegie Mellon University

Abstract:

FitByte is a multi-modal sensing approach on a pair of eyeglasses that tracks all phases of food intake. FitByte contains a set of inertial and optical sensors that allow it to reliably detect food intake events in noisy environments. It also has an on-board camera that opportunistically captures visuals of the food as the user consumes it. We evaluated the system with 23 participants in two studies with decreasing environmental constraints. On average, FitByte achieved an 89% F1-score in detecting eating and drinking episodes.
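The 89% figure above is an episode-level F1-score. As a hedged illustration of how such a score can be computed (the exact matching criterion used in the paper is not restated here), predicted eating/drinking episodes can be matched to ground-truth episodes by temporal overlap and the usual precision/recall harmonic mean taken:

    def overlaps(a, b):
        """True if two (start, end) intervals, in seconds, overlap at all."""
        return a[0] < b[1] and b[0] < a[1]

    def episode_f1(predicted, ground_truth):
        """Episode-level F1: a predicted episode counts as a true positive if it
        overlaps an unmatched ground-truth episode; unmatched predictions are
        false positives, unmatched ground-truth episodes are false negatives."""
        matched_gt = set()
        tp = 0
        for p in predicted:
            for i, g in enumerate(ground_truth):
                if i not in matched_gt and overlaps(p, g):
                    matched_gt.add(i)
                    tp += 1
                    break
        fp = len(predicted) - tp
        fn = len(ground_truth) - len(matched_gt)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Hypothetical example: three detected episodes against three annotated ones.
    pred = [(60, 95), (300, 360), (1000, 1040)]
    gt = [(55, 100), (310, 350), (1200, 1260)]
    print(episode_f1(pred, gt))  # 2 of 3 matched -> P = R = F1 ~= 0.667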


Output:

  1. Published article at ACM CHI

methods
DATA COLLECTION

We conducted the data collection and evaluation of FitByte in two separate studies. In the first study, we assessed the ecological validity of FitByte by testing the developed models on a completely unseen 91-hour dataset collected in an unconstrained free-living environment. We also conducted a preliminary investigation into the perceived privacy and social acceptability of the system.

STUDY PROCEDURE

In this study, we aimed to evaluate the performance of FitByte over an extended period of time in the real world, without any constraints on the participants' behavior. We asked participants to wear FitByte continuously for 12 hours a day for as many days as they could. Due to the small battery, the onboard camera could only record video for a limited duration. Thus, for ground truth, we used an external camera similar to the onboard one, attached to the participant's shirt. At the end of the study, we asked the participants about their perception of the social acceptability and privacy implications of the device in a semi-structured interview. We recruited 5 participants (1 female), aged 21-30 years, all university students. Three participants wore the device for two days and two wore it for one day.

Participants started the study at different times in the morning (between 8 am and 11 am) and took the device off 8 or 12 hours later. The dataset contains a very diverse set of activities across different participants, including cooking, driving, working in a chemical lab, working in an office, lying down, taking public transport, grocery shopping, exercising in a gym, and many more.