ipylot
Intelligent co-pilot
a vision-based in-car driver assistance system designed to improve driving effectiveness with real-time feedback on alertness
1. Introduction
Ipylot (ipylot.com) is a vision-based in-car driver assistance system that is designed to improve driving effectiveness, initially focused on providing real-time feedback on alertness. In the future, ipylot’s capabilities will be extended to include assistive capabilities and personalization based on driver and passenger emotion detection, driving habits, history, real-time traffic, and preferences.
1.1 Why is it important?
Distracted driving via cell phone use is a major cause of accidents, causing 1.6 M of the 6 M crashes per year in the US (reference)
Nearly 390,000 injuries occur each year from accidents caused by texting while driving
Reaching for something in the back seat, operating other electronics, and similar behaviors also cause accidents
Drowsy driving accounts for 100,000 crashes and 1,550 fatalities per year
While there is significant untapped market potential in improving passenger convenience through vision inference, our initial focus is on safety
2. Objective
Our team's project is a prototype for ipylot; the goal is to develop and deploy deep learning models in the car that detect drowsy and distracted drivers and alert them
The scope includes:
classify and alert on common distractions, including texting, cell phone use, etc.
classify and alert when the driver is drowsy
(future possibility) automate personalization of in-car music, mood lighting, seat/steering positions, etc., based on facial recognition to identify the driver and passengers
(future possibility) alert on cabin conditions upon identifying pets, children, etc. (for example, a pet or child left in the car)
3. System Architecture
The key architectural components of the system are:
GPU-powered edge devices (NVIDIA Jetson Nano and NX) with:
YOLOv5 and CNN-LSTM models for inference and classification, deployed using Docker
an MQTT message broker deployed on Kubernetes for device-to-cloud communication
Bluetooth speakers for playing alert sounds and a webcam for video capture
Cloud components:
AWS IoT for device management, security, device event acquisition, and data processing rules
AWS DynamoDB for storing event data
An MQTT cloud broker deployed using Docker on EC2 for collecting images, and a Python process that writes the images to S3 (a minimal sketch of this process follows the list)
S3 object store for images
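As a concrete illustration of the device-to-cloud path, below is a minimal sketch of the cloud-side collector: a Python process that subscribes to the EC2 MQTT broker and writes each received image payload to S3. The broker address, topic layout, and bucket name are illustrative assumptions, not the production configuration, and the snippet uses the paho-mqtt 1.x callback API.

```python
# Sketch: subscribe to image messages on the cloud MQTT broker and
# persist each payload to S3 (assumed names throughout).
import time

import boto3
import paho.mqtt.client as mqtt

BROKER_HOST = "broker.example.com"  # assumed address of the EC2 broker
IMAGE_TOPIC = "ipylot/+/images"     # assumed layout: ipylot/<device-id>/images
BUCKET = "ipylot-images"            # assumed S3 bucket name

s3 = boto3.client("s3")

def on_connect(client, userdata, flags, rc):
    # Subscribe to all device image topics once connected to the broker.
    client.subscribe(IMAGE_TOPIC)

def on_message(client, userdata, msg):
    # Key each image by device id (second topic segment) and arrival time.
    device_id = msg.topic.split("/")[1]
    key = f"{device_id}/{int(time.time() * 1000)}.jpg"
    s3.put_object(Bucket=BUCKET, Key=key, Body=msg.payload)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.loop_forever()
```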
4. Hardware for Development
The hardware used for development varies among the team. An example of a typical hardware setup is below:
Hardware:
NVIDIA Jetson Nano 4GB
PAU06 Panda Wireless 300Mbps USB Wi-Fi adapter
Logitech HD Webcam C310
Edimax 2-in-1 Wi-Fi and Bluetooth
Samsung T5 portable SSD 1TB USB 3.1
SanDisk Extreme Pro SDXC 128GB adapter
Custom 3D-printed case and Noctua 5V 40x20mm fan
The picture shows an example edge hardware configuration used for development; an actual product would most likely be a single unit.
5. Dataset and Preparation
State Farm Distracted Driver Detection dataset (Kaggle)
Annotated a total of 8,000 distracted-driver images (1,000 images per class) using Roboflow
Generated 6,500 images for training and 1,500 for validation
Trained a YOLOv5 model on the 6,500 annotated images for 150 epochs (sketched below)
Saved the best weights for inference/detection
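As a sketch of the training step, assuming a local checkout of the ultralytics/yolov5 repository (whose train.py exposes a run() helper) and a hypothetical distracted.yaml describing the 6,500/1,500 train/val split and the 8 class names:

```python
# Fine-tune YOLOv5 on the annotated set; run from inside a checkout of
# the ultralytics/yolov5 repo. "distracted.yaml" is a hypothetical
# dataset file listing the train/val image folders and class names.
import train

train.run(
    data="distracted.yaml",  # paths to the 6,500 train / 1,500 val images
    weights="yolov5s.pt",    # start from pretrained COCO weights
    epochs=150,
    imgsz=640,
    batch_size=16,
)
# The best checkpoint lands in runs/train/exp*/weights/best.pt; that is
# the file deployed to the Jetson for inference.
```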
6. YOLOv5 for Image Classification from the Video Stream
Examples of the classes used for detecting drowsiness are shown below.
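A minimal sketch of feeding the video stream through the trained weights, using the torch.hub loader published by ultralytics/yolov5 and an OpenCV webcam capture; the weights path and camera index are assumptions:

```python
# Sketch: per-frame detection on the webcam stream with custom weights.
import cv2
import torch

# Load the custom-trained model (best.pt from the training step above).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

cap = cv2.VideoCapture(0)  # webcam device index is an assumption
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # YOLOv5 expects RGB; OpenCV delivers BGR, so reverse the channels.
    results = model(frame[:, :, ::-1])
    # One row per detection: box coordinates, confidence, class, name.
    detections = results.pandas().xyxy[0]
    print(detections[["name", "confidence"]])
cap.release()
```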
7. Real-Time Inference and Classification
Two timers are used, one to detect drowsiness and one to detect distraction
A 1-second detection period is used to decide whether the driver is drowsy or distracted
YOLOv5 takes about 10 ms to process a frame and return its class; 200 frame classification outputs are processed before making a decision
Some leeway is allowed, since not every frame results in a detection (see the sketch below)
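A sketch of this windowed decision logic: per-frame class outputs accumulate over the 200-frame window, and an alert fires only when enough frames agree, which provides the leeway for frames with no detection. The agreement threshold and class names are illustrative assumptions.

```python
# Sketch: windowed voting over per-frame YOLOv5 outputs before alerting.
from collections import deque

WINDOW = 200      # frame outputs per decision (~10 ms each)
THRESHOLD = 0.6   # assumed fraction of frames that must agree

recent = deque(maxlen=WINDOW)

def update(frame_class):
    """Record one frame's output (e.g. 'drowsy', 'texting', or None for
    no detection) and return the class to alert on, if any."""
    recent.append(frame_class)
    if len(recent) < WINDOW:
        return None  # not enough history for a decision yet
    for cls in ("drowsy", "distracted"):  # illustrative class names
        if sum(1 for c in recent if c == cls) >= THRESHOLD * WINDOW:
            recent.clear()  # reset the timer after raising an alert
            return cls
    return None
```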
8. Challenges
Results were very good with YOLOv5 at recognizing still images; in practice with video, results were less impressive. This is due to the training set being too small: a commercial product will need a much larger training set covering various cars, various camera-placement angles, and various lighting conditions.
The software side was the focus of this project; the hardware will need to be ironed out in a commercial product.
9. Next Steps
Explore temporal models to detect drowsiness, run in parallel with YOLOv5 for distraction detection, and compare against detecting both drowsiness and distraction in a single YOLOv5 model; this may do better than the "1 second" model referenced in section 7 above (a minimal sketch of such a temporal model follows this list)
Add and annotate more drowsiness images, and possibly empty-seat images
Do more testing “in the field” by using the edge device in a car, and save sample videos from the tests for re-use
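A minimal PyTorch sketch of the CNN-LSTM temporal model mentioned in the first item: a per-frame CNN backbone feeding an LSTM that classifies a short clip as drowsy or alert. The backbone choice, layer sizes, and clip length are all assumptions for illustration.

```python
# Sketch: per-frame CNN features -> LSTM -> clip-level drowsiness class.
import torch
import torch.nn as nn
from torchvision import models

class CnnLstm(nn.Module):
    def __init__(self, hidden=256, num_classes=2):
        super().__init__()
        backbone = models.mobilenet_v3_small(weights="DEFAULT")
        # Keep only the convolutional features (576 channels), drop the head.
        self.cnn = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(576, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):             # clip: (batch, frames, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.pool(self.cnn(clip.view(b * t, c, h, w)))
        feats = feats.view(b, t, -1)     # (batch, frames, 576)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])     # classify from the last timestep

# e.g. a 16-frame clip at 224x224 resolution:
logits = CnnLstm()(torch.randn(1, 16, 3, 224, 224))
```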
10. Demo (recording)