*The ROAD Challenge: Event Detection for Situation Awareness in Autonomous
Driving*

*Call for participation*

https://sites.google.com/view/roadchallangeiccv2021/challenge

*Aim of the Challenge*

The goal of this Challenge is to bring the topic of *situation awareness* to
the forefront of research in autonomous driving, understood as the ability to
create semantically useful representations of dynamic road scenes in terms of
the notion of a *road event*.

*The ROAD dataset*

This concept is at the core of the new ROad event Awareness Dataset (ROAD)
for Autonomous Driving:

https://github.com/gurkirt/road-dataset



ROAD is the first benchmark of its kind, a multi-label dataset designed to
allow the community to investigate the use of semantically meaningful
representations of dynamic road scenes to facilitate situation awareness
and decision making.

It contains 22 long-duration videos (approximately 8 minutes each) annotated
in terms of “road events”, defined as triplets of Agent, Action and Location
labels and represented as ‘tubes’, i.e., series of frame-wise bounding box
detections.
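For illustration only, here is a minimal Python sketch of how such a road
event tube might be represented; the class and field names, and the example
label values, are hypothetical, and the dataset's actual annotation schema is
documented in the road-dataset repository above.

    from dataclasses import dataclass
    from typing import List, Tuple

    Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)

    @dataclass
    class RoadEventTube:
        # Hypothetical fields; the real schema is in the road-dataset repo.
        agent: str        # e.g. "Pedestrian" (illustrative label value)
        action: str       # e.g. "Moving towards" (illustrative label value)
        location: str     # e.g. "In vehicle lane" (illustrative label value)
        boxes: List[Tuple[int, Box]]   # one (frame index, box) per frame

    tube = RoadEventTube(
        agent="Pedestrian",
        action="Moving towards",
        location="In vehicle lane",
        boxes=[(120, (310.0, 200.0, 360.0, 330.0)),
               (121, (312.0, 201.0, 362.0, 331.0))],
    )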



The above GitHub repository contains all the necessary instructions to
pre-process the 22 ROAD videos, unpack them to the correct directory
structure and run the provided baseline model.
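Purely as orientation, the sketch below shows the kind of frame-extraction
step involved; the directory names and frame-naming pattern are assumptions,
and the repository's README remains the authoritative reference.

    import subprocess
    from pathlib import Path

    # Hypothetical directory layout and file pattern; see the road-dataset
    # README for the actual pre-processing instructions.
    def extract_frames(video: Path, out_dir: Path) -> None:
        """Dump every frame of a video to numbered JPEGs via the ffmpeg CLI."""
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-i", str(video), str(out_dir / "%05d.jpg")],
            check=True,
        )

    for video in Path("road/videos").glob("*.mp4"):
        extract_frames(video, Path("road/rgb-images") / video.stem)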



*Tasks and Challenges*

The Challenge considers three *video-level* detection Tasks:

T1. *Agent* detection, in which the output is in the form of agent tubes
collecting the bounding boxes associated with an active road agent in
consecutive frames.

T2. *Action* detection, where the output is in the form of action tubes
formed by bounding boxes around an action of interest in each video frame.

T3. *Road event* detection, where by road event we mean a triplet (Agent,
Action, Location) as explained above, once again represented as a tube of
frame-level detections.

Each Task thus consists of regressing whole series (‘tubes’) of
temporally-linked bounding boxes associated with the relevant instances,
together with their class label(s).
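To make the notion of tube construction concrete, here is a toy Python sketch
of greedy IoU-based linking, one common way of chaining per-frame detections
into tubes. It is purely illustrative and not a procedure prescribed or
required by the Challenge.

    from typing import Dict, List, Tuple

    Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)

    def iou(a: Box, b: Box) -> float:
        """Plain spatial IoU of two boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def link_tubes(dets: Dict[int, List[Box]],
                   thr: float = 0.3) -> List[List[Tuple[int, Box]]]:
        """Greedily chain detections in consecutive frames into tubes."""
        tubes: List[List[Tuple[int, Box]]] = []
        for frame in sorted(dets):
            for box in dets[frame]:
                # attach to the best-overlapping tube that ended in the
                # previous frame, otherwise start a new tube
                candidates = [(iou(t[-1][1], box), t) for t in tubes
                              if t[-1][0] == frame - 1]
                score, best = max(candidates, default=(0.0, None),
                                  key=lambda c: c[0])
                if best is not None and score >= thr:
                    best.append((frame, box))
                else:
                    tubes.append([(frame, box)])
        return tubes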

*Baseline*

As a baseline for all three detection Tasks, we propose a simple yet
effective 3D feature pyramid network with focal loss, an architecture we
call 3D-RetinaNet:

http://arxiv.org/abs/2102.11585

The code is publicly available on GitHub:

https://github.com/gurkirt/3D-RetinaNet
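As a reminder of one of the baseline's key ingredients, below is a minimal
PyTorch-style sketch of the binary focal loss; it is illustrative only, and
the 3D-RetinaNet repository contains the implementation actually used.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                   alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
        """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

        logits and targets share the same shape; targets is a float tensor of
        0./1. multi-label indicators, as in RetinaNet-style classification.
        """
        ce = F.binary_cross_entropy_with_logits(logits, targets,
                                                reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()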

*Timeframe*

Challenge participants have 18 videos at their disposal for training and
validation. The remaining 4 videos are to be used to test the final
performance of their model. This split applies to all three Tasks.

The timeframe for the Challenge is as follows:

·        Training and validation fold release: April 30 2021

·        Test fold release: July 20 2021

·        Submission of results: August 10 2021

·        Announcement of results: August 12 2021

·        Challenge event @ workshop: October 10-17 2021

*Evaluation*

Performance in each Task is measured by video mean average precision
(video-mAP), with Intersection over Union (IoU) detection thresholds of
0.1, 0.2 and 0.5. The final performance on each Task will be determined by
the equally-weighted average of the performances at the three thresholds.
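For intuition, the sketch below shows one common definition of
spatio-temporal tube IoU (per-frame box IoU averaged over the temporal union
of the two tubes) and how the three thresholds are combined; the evaluation
code on EvalAI is authoritative and may differ in detail.

    from typing import Dict, Tuple

    Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)

    def box_iou(a: Box, b: Box) -> float:
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def tube_iou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
        """Per-frame IoU averaged over the temporal union of the two tubes."""
        frames = set(pred) | set(gt)
        if not frames:
            return 0.0
        overlap = sum(box_iou(pred[f], gt[f]) for f in pred.keys() & gt.keys())
        return overlap / len(frames)

    # Final Task score: equally-weighted mean over the three thresholds,
    # i.e. final = (video_mAP(0.1) + video_mAP(0.2) + video_mAP(0.5)) / 3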

In the first stage of the Challenge participants will, for each Task, submit
the predictions generated on the validation fold and receive the evaluation
metric in return, in order to get a feel for how well their method(s) work.
In the second stage they will submit the predictions generated on the test
fold, which will be used for the final ranking.

A separate ranking will be produced for each of the Tasks.

Evaluation takes place on the EvalAI platform:

https://eval.ai/web/challenges/challenge-page/1059/overview

For each Challenge stage and each Task, the number of submissions is capped
at 50, with an additional limit of 5 submissions per day.