*The First Continual Semi-Supervised Learning Challenge*

*Call for participation*

The Challenge is organised as part of the upcoming IJCAI 2021 *First
International Workshop on Continual Semi-Supervised Learning*

https://sites.google.com/view/sscl-workshop-ijcai-2021/

*Aim of the Workshop*

Whereas continual learning has recently attracted much attention in the
machine learning community, the focus has mainly been on preventing a
model, as it is updated in the light of new data, from ‘catastrophically
forgetting’ its initial knowledge and abilities. This, however, is in stark contrast
with common real-world situations in which an initial model is trained
using limited data, only to be later deployed without any additional
supervision. In these scenarios the goal is for the model to be
incrementally updated using the new (unlabelled) data, in order to adapt to
a target domain continually shifting over time.

The aim of this workshop is to formalise this new *continual semi-supervised
learning* paradigm, and to introduce it to the machine learning community
in order to mobilise effort in this direction. We present the first two
benchmark datasets for this problem, derived from significant computer
vision scenarios, and propose the first *Continual Semi-Supervised Learning
Challenges* to the research community.

*Problem Statement*

In continual semi-supervised learning, an initial training batch of data
points annotated with ground truth (class labels for classification
problems, or vectors of target values for regression ones) is available and
can be used to train an initial model. Then, however, the model is
incrementally updated by exploiting the information provided by a time
series of *unlabelled* data points, each of which is generated by a data
generating process (modelled, as typically assumed, by a probability
distribution) which may vary with time, without any artificial subdivision
into ‘tasks’.
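
In symbols, the setting can be summarised as follows (a minimal
formalisation in our own notation, purely for illustration):

```latex
% A minimal formalisation (our notation, purely illustrative):
% initial supervised batch of annotated data points
D_0 = \{(x_i, y_i)\}_{i=1}^{N}
% unlabelled stream, generated by a possibly drifting distribution
x_1, x_2, \dots, \qquad x_t \sim p_t(x)
% goal: train f_0 on D_0, then update f_t = U(f_{t-1}, x_t) without labels,
% so that each f_t performs well under the current distribution p_t
```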

*Challenges*

We propose both a *continual activity recognition* (CAR) challenge and a
*continual crowd counting* (CCC) challenge.

https://sites.google.com/view/sscl-workshop-ijcai-2021/challenges

In the former, the aim is to devise a learning mechanism for updating a
baseline action recognition method (working at frame level) based on a data
stream of video frames, of which only the initial fraction is labelled (a
classification problem).

In the latter, the learning mechanism is applied to a baseline crowd
counting method, also working on a frame-by-frame basis, and exploits a
data stream of video frames, of which only an initial fraction comes with
ground truth attached in the form of a density map (a regression problem).

*Benchmark Datasets*

As a benchmark for the continual activity recognition challenge we have
created a *Continual Activity Recognition (CAR) dataset*, derived from a
fraction of the MEVA (Multiview Extended Video with Activities) activity
detection dataset (https://mevadata.org/). We selected a suitable set of 8
activity classes from the original list of 37, and annotated each frame of
15 video sequences, each composed of 3 original MEVA clips, with a single
class label.

Our CAR benchmark is thus composed of 15 sequences, broken down into three
groups:

· Five 15-minute-long sequences formed by three original clips that are
contiguous in time.

· Five 15-minute-long sequences formed by three clips separated by a short
time interval (5-20 minutes).

· Five 15-minute-long sequences formed by three clips separated by a long
time interval (hours or even days).

Each of these three evaluation settings is designed to simulate a different
mix of continuous and discrete dynamics of the domain distribution.

The raw video sequences are directly accessible from the Challenge website.

Our CCC benchmark is composed of 3 sequences, taken from existing crowd
counting datasets:

· A single 2,000-frame sequence from the Mall dataset.

· A single 2,000-frame sequence from the UCSD dataset.

· A 750-frame sequence from the Fudan-ShanghaiTech (FDST) dataset, composed
of 5 clips, each 150 frames long, portraying the same scene.

*Ground truth*

The ground truth for the CAR challenges (in the form of one activity label
per frame) was created by us, after selecting a subset of 8 activity
classes and revising the original annotation for the 45 video clips we
selected for inclusion.

The ground truth for the CCC challenges (in the form of a density map for
each frame) was generated by us for all three datasets following the
annotation protocol described in

https://github.com/svishwa/crowdcount-mcnn
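
For reference, the density maps in that protocol are obtained by placing a
Gaussian kernel of unit mass at every annotated head position, so that the
map integrates to the number of people in the frame. A minimal sketch of
that idea follows (the fixed kernel width used here is illustrative, not
the exact value adopted for the challenge):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, height, width, sigma=4.0):
    """Place a unit-mass Gaussian at each annotated (x, y) head position.

    The resulting map (approximately) sums to the people count in the frame.
    sigma is an illustrative fixed kernel width, not the challenge value.
    """
    dmap = np.zeros((height, width), dtype=np.float32)
    for x, y in head_points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            dmap[row, col] += 1.0          # one unit of mass per annotated person
    return gaussian_filter(dmap, sigma)    # spread the mass with a Gaussian kernel
```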

The ground truth for both challenges will be released on the Challenge web
site according to the following schedule:

· Training and validation fold release: May 5, 2021

· Test fold release: June 30, 2021

· Submission of results: July 15, 2021

· Announcement of results: July 31, 2021

· Challenge event @ workshop: August 21-23, 2021

*Tasks*

For each challenge we propose two separate tasks, *incremental* and
*absolute*.

CAR-A: The goal is to achieve the *best average performance* across the
*unlabelled test portions* of the 15 sequences in the CAR dataset. The
choice of the baseline action recognition model is left to the participants.

CAR-I: The goal here is to achieve the *best performance improvement* over
the baseline (supervised) model, measured on the unlabelled test data
stream, on average over the 15 sequences. The baseline recognition model is
set by us (see Baselines below).

CCC-A: Seeks the *best average performance* over the unlabelled test portion
of the 3 sequences of the CCC dataset. The choice of the baseline crowd
counting model is left to the participants.

CCC-I: Seeks the best *performance improvement* over the baseline, measured
on the unlabelled test portion of the data stream, on average over the 3
sequences of CCC. The baseline crowd counting model is set by us (see
Baselines below).

*Protocol for Incremental Training and Testing*

Following from the problem definition, once a model is fine-tuned on the
supervised portion of a data stream it is subsequently both incrementally
updated using the unlabelled portion of the same data stream *and* tested
there, using the available ground truth (encapsulated in an evaluation
script).

Importantly, *incremental training and testing must happen independently
for each sequence*, as we intend to simulate real-world scenarios in which
a smart device with continual learning capability can only learn from its
own data stream after deployment.

The two challenges differ in the sense that, whereas in CAR the baseline
activity recognition model is initially fine-tuned using the supervised
folds of *all the 15 available sequences jointly* (since each sequence only
portrays a subset of the 8 activity classes), in CCC the supervised
fine-tuning happens *sequence by sequence* (because of the disparate nature
of the videos captured in different settings).
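
The sketch below illustrates this protocol; fine_tune, incremental_update
and predict are hypothetical placeholders for a participant's own
components, and the sequence objects (with supervised_fold and
unlabelled_stream attributes) are assumed purely for illustration:

```python
import copy

# Illustrative only: fine_tune, incremental_update and predict are hypothetical
# placeholders, as are sequence objects with .name, .supervised_fold and
# .unlabelled_stream attributes.

def run_sequence(model, sequence):
    """Adapt and predict on one unlabelled stream, in isolation from all other streams."""
    predictions = []
    for chunk in sequence.unlabelled_stream:       # validation + test folds, in temporal order
        model = incremental_update(model, chunk)   # no ground truth is available here
        predictions.extend(predict(model, chunk))  # later scored by the evaluation script
    return predictions

# CCC: supervised fine-tuning happens sequence by sequence.
ccc_results = {s.name: run_sequence(fine_tune(base_model, s.supervised_fold), s)
               for s in ccc_sequences}

# CAR: a single model is fine-tuned jointly on the supervised folds of all 15
# sequences, then copied and adapted independently on each unlabelled stream.
car_model = fine_tune(base_model, [s.supervised_fold for s in car_sequences])
car_results = {s.name: run_sequence(copy.deepcopy(car_model), s) for s in car_sequences}
```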

*Split into Training, Validation and Test*

The data for the challenges are released in two Stages:

1. We first release the supervised portion of each data stream, together
with a portion of the unlabelled data stream to be used for validating the
continual semi-supervised learning approach proposed by the participants.

2. The remaining portion of the unlabelled data stream for each sequence in
the dataset is released at a later stage, to be used for testing the
proposed approach.

Consequently, each data stream (sequence) in our benchmarks is divided into
a supervised fold (S), a validation fold (V) and a test fold (T).

For the CAR challenge, the supervised fold for each sequence coincides with
the first 5-minute video, the validation fold with the second 5-minute
video, and the test fold with the third 5-minute video.

For the CCC challenge we distinguish two cases. For the 2,000-frame
sequences from either the UCSD or the Mall dataset, S is formed by the
first 400 images, V by the following 800 images, and T by the remaining 800
images. For the 750-frame sequence from the FDST dataset, S is the set of
the first 150 images, V the set of the following 300 images, and T the set
of the remaining 300 images.
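
For the CCC streams, the resulting frame-index fold boundaries can be
summarised as follows (a reference sketch only; indices are 0-based and
end-exclusive, and may not match the exact file naming used in the release):

```python
# Fold boundaries per CCC stream as (start, end) frame-index ranges, end-exclusive.
# For CAR, S/V/T simply coincide with the first, second and third 5-minute video
# of each 15-minute sequence.
CCC_FOLDS = {
    "Mall": {"S": (0, 400), "V": (400, 1200), "T": (1200, 2000)},
    "UCSD": {"S": (0, 400), "V": (400, 1200), "T": (1200, 2000)},
    "FDST": {"S": (0, 150), "V": (150, 450),  "T": (450, 750)},
}
```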

*Evaluation*

Participants will be able to evaluate the performance of their method(s) on
both the incremental and the absolute versions of the challenges on
*eval.ai* (http://eval.ai).

In Stage 1 participants will, for each task (CAR-A, CAR-I, CCC-A, CCC-I),
submit the predictions generated on the validation folds and receive the
evaluation metric in return, to get a feel for how well their method(s)
work. In Stage 2 they will submit the predictions generated on the test
folds, which will be used for the final ranking.

A separate ranking will be produced for each of the tasks.

For each challenge stage and each task, the number of submissions through
the EvalAI platform is capped at 50, with an additional limit of 5
submissions per day.

Detailed instructions about how to download the data and submit your
predictions for evaluation at both validation and test time, for all four
tasks, are provided here:

https://sites.google.com/view/sscl-workshop-ijcai-2021/challenges

*Baselines*

Baselines are provided for the initial action recognition model to be used
in CAR-I, the base crowd counter to be used in CCC-I, and the
semi-supervised incremental learning process itself.

*Baseline activity recognition model*

As our activity recognition baseline model we adopted the recent
EfficientNet [1] network (model EfficientNet-B5), pre-trained on the
large-scale ImageNet dataset. Detailed information about its
implementation, along with pre-trained models, can be found on GitHub:
https://github.com/lukemelas/EfficientNet-PyTorch

and the package can be installed via pip (pip install efficientnet-pytorch).
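
As a minimal sketch of how this baseline can be instantiated with that
package (the 8-way classification head shown below is illustrative, not a
challenge requirement; adapt it to the label set you actually train with):

```python
import torch
from efficientnet_pytorch import EfficientNet

# ImageNet-pretrained EfficientNet-B5 with a classification head sized for the
# CAR label set; 8 output classes is illustrative, not a challenge requirement.
model = EfficientNet.from_pretrained('efficientnet-b5', num_classes=8)

model.eval()
with torch.no_grad():
    frame = torch.randn(1, 3, 456, 456)  # 456x456 is B5's nominal input resolution
    logits = model(frame)                # shape: (1, 8), one score per activity class
```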

Note that the performance of the baseline activity model is rather poor on
our challenge videos, as relevant activities occupy only a small fraction
of the duration of the videos. This leaves ample room for improvement,
while faithfully reflecting the level of challenge posed by real-world data.

*Baseline crowd counter*

As the baseline crowd counting model we selected the Multi-Column
Convolutional Neural Network (MCNN) [2]. Its implementation, along with
pre-trained models, can also be found on GitHub:

https://github.com/svishwa/crowdcount-mcnn

This network is implemented using PyTorch. Pre-trained models are available
for both the ShanghaiTech A and the ShanghaiTech B datasets. For this
Challenge we chose to adopt the ShanghaiTech B pre-trained model.
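
Whichever counter is used, the predicted count for a frame is obtained by
integrating (summing) its predicted density map, and mean absolute error
over frames is the customary way to score such counts. We state this as
common practice rather than as the official challenge metric; a minimal
sketch:

```python
import numpy as np

def count_from_density(dmap):
    """A density map integrates to the estimated number of people in the frame."""
    return float(np.asarray(dmap).sum())

def mae(predicted_counts, true_counts):
    """Mean absolute counting error over frames, the customary crowd-counting score."""
    p = np.asarray(predicted_counts, dtype=float)
    t = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(p - t)))
```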

*Baseline incremental learning approach*

Finally, our baseline for incremental learning from an unlabelled data
stream is based on a *vanilla (batch) self-training* approach.

For each sequence, the unlabelled data stream (without distinction between
validation and test folds) is partitioned into a number of sub-folds. Each
sub-fold spans 1 minute in the CAR challenges, so that each unlabelled
sequence is split into 10 sub-folds. Sub-folds span 100 frames in the CCC
challenges, so that the UCSD and Mall (unlabelled) data streams are
decomposed into 16 sub-folds whereas the FDST sequence only contains 6
sub-folds.

Starting with the model initially fine-tuned on the supervised portion of
the data stream, self-training is iteratively applied in a batch fashion to
each sub-fold. The predictions generated by the model obtained after
convergence on a sub-fold are the baseline predictions for that sub-fold.
The model output by each self-training session is used as the starting
model for the following session.
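
A compact sketch of this baseline follows; pseudo_label,
fine_tune_to_convergence and predict are hypothetical placeholders, and
optimiser settings, stopping criteria and any confidence filtering are
deliberately left unspecified:

```python
# Vanilla (batch) self-training over the sub-folds of one unlabelled stream.
# pseudo_label, fine_tune_to_convergence and predict are hypothetical placeholders.
def batch_self_training(model, unlabelled_subfolds):
    baseline_predictions = []
    for subfold in unlabelled_subfolds:                       # 1 minute (CAR) or 100 frames (CCC)
        pseudo = pseudo_label(model, subfold)                 # the model's own outputs become targets
        model = fine_tune_to_convergence(model, subfold, pseudo)
        baseline_predictions.extend(predict(model, subfold))  # baseline predictions for this sub-fold
        # the converged model is carried over as the start model for the next sub-fold
    return model, baseline_predictions
```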

*Reproducibility*

Please note that we reserve the right to reproduce participants' results
and check their validity. We will certainly reproduce the results of the
challenge winners.

------------------------------

[1] Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for
convolutional neural networks. In International Conference on Machine
Learning (pp. 6105-6114). PMLR.

[2] Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016). Single-image
crowd counting via multi-column convolutional neural network. In
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition.