Lyft Perception Challenge

4 minute read


Achieve pixel-wise identification of objects in camera images. It was hosted by Udacity and Lyft’s Level 5 Engineering Center. details

github repo

alt text


The goal in this challenge is pixel-wise identification of objects in camera images. In other words, the task is to identify exactly what is in each pixel of an image! More specifically, you’ll be identifying cars and the drivable area of the road. The images below are a simulated camera image on the left and a label image on the right, where each different type of object in the image corresponds to a different color.

The challenge time is from May 1st 10:00 am PST to June 3rd at 6:00 pm PST.


I got the rank 19th finally.

The top 25 (only with U.S. work authorization) will be eligible for an interview with Lyft. So, I participated this challenge just for the fun of it.

alt text

The Data

The challenge data is being produced by the CARLA Simulator, an open source autonomous vehicle platform for the testing and derivative of autonomous algorithms. You can download the data here.

The dataset consists of images and the corresponding ground truth pixel-wise labels for each image.

The images and ground truth are both 3 channel RGB images and the labels for the ground truth are stored as integer values in the red channel of each ground truth image. The integer values for each pixel in the ground truth images correspond to which category of object appears in that pixel, according to this table:

0None[0, 0, 0]
1Buildings[70, 70, 70]
2Fences[190, 153, 153]
3Other[72, 0, 90]
4Pedestrians[220, 20, 60]
5Poles[153, 153, 153]
6RoadLines[157, 234, 50]
7Roads[128, 64, 128]
8Sidewalks[244, 35, 232]
9Vegetation[107, 142, 35]
10Vehicles[0, 0, 255]
11Walls[102, 102, 156]
12TrafficSigns[220, 220, 0]

Given that the values are small (0 through 12), the 3-channel label images appear black at first glance. But if you plot up just the red channel (label_image[:,:,0]) you’ll see the labels, like this:

Camera Image3-Channel Label ImageLabel Image Red Channel
alt textalt textalt text

The task is to write an algorithm to take an image like the one on the left and generate a labeled image like the one on the right. Except you will be generating a binary labeled image for vehicles and a binary labeled image for the drivable surface of the road. You can ignore other things like trees, pedestrians, etc.

Your solution will be run against a hidden test dataset. This test set consists of images that are different from the training dataset, but taken under the same environmental conditions. Your algorithm will be evaluated on both speed and accuracy.

Weighted $F_\beta$Score

In some cases, you might be more concerned about false positives, like for example, when identifying where the drivable road surface is, you don’t want to accidentally label the sidewalk as drivable. In that case the precision of your measurement is more important than recall.

On the other hand, you might be more concerned with false negatives, for example, when identifying vehicles you want to make sure you know where the whole vehicle is, and overestimating is not necessarily a bad thing. In that case, recall is more important than precision.

In most cases, however, you would like to strike some balance between precision and recall and you can do so by introducing a factor β into your F score like this:

By setting β<1, you weight precision more heavily than recall. And setting β>1, you weight recall more heavily than precision.

For this challenge you’ll be scored with β=2 for vehicles and β=0.5 for road surface.

Your final F score will be the average of your score for road surface and your $\rm{F}_{2}$ score for vehicles.



Incorporating Frames Per Second (FPS)

The speed at which your algorithm is also important and will be factored into your final score. You’ll receive a penalty for running at less than 10 FPS:

Here’s how your final score on the leaderboard will be calculated, incorporating both your F score and FPS:




CARLA Simulator:

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications :

Keras implementation of Deeplab v3+ with pretrained weights:


Leave a Comment