Overview

Existing research in action recognition focuses mostly on high-quality videos where the action is distinctly visible. As a result, available action recognition models are not designed for low-resolution videos, and their performance remains far from satisfactory when the action is not clearly visible. In real-world surveillance environments, actions are captured at a wide range of resolutions; most activities occur at a distance and occupy only a small region of the frame, and recognizing such activities is a challenging problem.

The ActivityNet challenge has hosted a wide range of tasks relevant to action recognition, from temporal activity recognition to spatio-temporal action detection. However, none of these tasks has focused on low-resolution activities: in all of the datasets used so far, the videos are high-resolution and the activities cover most of the frame area.

In this challenge, the focus is on recognizing and detecting tiny actions in videos. Existing approaches to this problem run their experiments on artificially created datasets in which high-resolution videos are down-scaled to a smaller resolution to create low-resolution samples. However, re-scaling a high-resolution video to a lower resolution does not reflect real-world low-resolution video quality: real-world low-resolution videos suffer from grain, camera sensor noise, and other factors that are not present in down-scaled videos. We will provide benchmark datasets for activity recognition and activity detection that contain natural low-resolution activities.
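As an illustration of the artificial setup used by prior work, the sketch below down-scales a high-resolution clip and then injects synthetic sensor noise to mimic what plain down-scaling lacks. Function names, the target size, and the noise level are illustrative assumptions, not part of any challenge code:

import cv2
import numpy as np

def downscale_clip(frames, target_size=(32, 32)):
    """Down-scale each frame of a high-resolution clip, the way prior work
    creates artificial low-resolution samples."""
    return [cv2.resize(f, target_size, interpolation=cv2.INTER_AREA) for f in frames]

def add_sensor_noise(frames, sigma=5.0):
    """Crudely simulate the grain / sensor noise that plain down-scaling lacks;
    real surveillance footage contains such noise natively."""
    out = []
    for f in frames:
        noisy = f.astype(np.float32) + np.random.normal(0.0, sigma, f.shape)
        out.append(np.clip(noisy, 0, 255).astype(np.uint8))
    return out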

Tasks

This year we will focus on two different tasks:

Activity Recognition: multi-label recognition of low-resolution activities on the TinyVIRAT-v2 benchmark.
Activity Detection: detection of low-resolution activity instances, with bounding boxes on each frame, on the MAMA benchmark.

Dataset details

This challenge will use two different benchmark datasets for the above tasks. For the recognition task, we will use the TinyVIRAT-v2 benchmark, and for activity detection we will use the MAMA (Multi-Actor Multi-Action) dataset.

Recognition: We will use the TinyVIRAT-v2 benchmark for low-resolution action recognition. The videos in TinyVIRAT-v2 are realistic and extracted from real-world surveillance videos. It is a multi-label dataset with multiple actions per video clip, which makes it even more challenging. The dataset has 26,355 activity instances, with 16,950 training, 3,308 validation, and 6,097 testing instances. The length of the activities varies from sample to sample, with an average of around 3 seconds. It contains arbitrarily sized low-resolution videos, ranging from 10x10 pixels to 128x128 pixels with an average of 70x70 pixels. The videos in this dataset are naturally low-resolution and reflect real-life challenges.
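Because clip resolutions vary from 10x10 up to 128x128, a model typically has to map every clip to a fixed spatial size before batching. A minimal sketch, assuming a 112x112 input and bilinear resizing (both are arbitrary choices, not challenge requirements):

import cv2
import numpy as np

def prepare_clip(frames, size=112):
    """Resize an arbitrary-resolution clip to a fixed square input: tiny clips
    (e.g. 10x10) are upsampled and larger ones downsampled so that all samples
    in a batch share the same spatial shape."""
    resized = [cv2.resize(f, (size, size), interpolation=cv2.INTER_LINEAR) for f in frames]
    return np.stack(resized)  # shape: (num_frames, size, size, channels)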

Detection: We provide a new benchmark dataset, the MAMA dataset, for activity detection with naturally occurring tiny actions. The dataset is a temporally trimmed version of the VIRAT/MEVA datasets. All samples are obtained from CCTV cameras mounted at high altitudes, which results in low-resolution action regions. MAMA has 35 classes and a total of 32,726 videos, with 25,837 videos in the train split and 6,889 videos in the test split. Clip lengths range from 1 to 5 seconds. The action classes capture both human-human "indirect" actions such as talking and physical-contact actions such as "hand_interacting_with_person". We also annotate human-object interactions, such as opening a door. Further details about the evaluation protocol and the annotation format can be found here.

Here are the links to download the datasets:

Evaluation

Both tasks will be evaluated using a public leaderboard where participants will submit their models' predictions on the test set.

Recognition: TinyVIRAT-v2 has multiple labels per sample, and submissions are expected to predict multiple action classes for each sample. The performers can choose a prediction threshold of their choice and are required to submit only the activities that occur in each sample. Submissions will be evaluated using precision, recall, and F1-score, and the winners will be determined by the F1-score averaged over all classes.
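For example, with per-class sigmoid scores, the chosen threshold determines which activities are reported for a clip. A small sketch, with class names and threshold chosen purely for illustration:

import numpy as np

def scores_to_labels(scores, class_names, threshold=0.5):
    """Keep only the classes whose score clears the participant-chosen threshold."""
    return [name for name, s in zip(class_names, scores) if s >= threshold]

# Hypothetical example with three classes for a single clip.
print(scores_to_labels(np.array([0.9, 0.2, 0.7]),
                       ["activity_carrying", "vehicle_moving", "person_walking"]))
# -> ['activity_carrying', 'person_walking']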

We will provide an evaluation server where the performers can submit their results as a text file with one line per test sample, each line containing the predictions for that sample. For each test sample, the performers will be required to submit a binary vector, with one entry per class, indicating which activities are present. These vectors will be used to compute class-wise precision, recall, and F1-score.
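A sketch of how such binary per-sample vectors can be scored, assuming predictions and ground truth are stacked into (num_samples, num_classes) matrices; the official evaluation script may differ in its details:

import numpy as np

def classwise_f1(y_true, y_pred):
    """Per-class precision, recall, and F1 from binary label matrices of shape
    (num_samples, num_classes), averaged over classes."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)
    precision = tp / np.maximum(tp + fp, 1e-8)
    recall = tp / np.maximum(tp + fn, 1e-8)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-8)
    return precision.mean(), recall.mean(), f1.mean()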

Detection: The MAMA dataset has multiple activities in each sample, and submissions should contain detections for all activity instances in each sample. The performers can choose a prediction threshold of their choice and are required to submit only the activities that occur in each sample. Submissions will be evaluated using frame-level and video-level mean average precision (mAP) at a threshold of 0.5, and the winners will be determined by the video-level mAP averaged over all classes at the 0.5 threshold.

We will provide an evaluation server where the performers can submit their results as a text file with one line per test sample, each line containing the detections for that sample. For action detection, the predictions will include the class probability and the detected bounding boxes on each frame of a video.
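For reference, a minimal sketch of average precision for a single class at a 0.5 overlap threshold, assuming axis-aligned [x1, y1, x2, y2] boxes and greedy matching of predictions to unmatched ground truth; this is an illustration, not the official metric code:

import numpy as np

def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / max(area_a + area_b - inter, 1e-8)

def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes, all for one class.
    Sort by score, greedily match each prediction to the best unmatched ground
    truth above the IoU threshold, then integrate precision over recall."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp, fp = [], []
    for score, box in preds:
        best, best_iou = -1, iou_thr
        for i, gt in enumerate(gts):
            if not matched[i]:
                ov = iou(box, gt)
                if ov >= best_iou:
                    best, best_iou = i, ov
        if best >= 0:
            matched[best] = True
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(len(gts), 1)
    precision = tp / np.maximum(tp + fp, 1e-8)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)  # all-point integration of the PR curve
        prev_r = r
    return ap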

Important dates

Winners

Recognition

Organizers

Praveen Tirupattur
CRCV, University of Central Florida (UCF)
Aayush J Rana
CRCV, University of Central Florida (UCF)
Akash Kumar
CRCV, University of Central Florida (UCF)
Rajat Modi
CRCV, University of Central Florida (UCF)
Shruti Vyas
CRCV, University of Central Florida (UCF)
Yogesh Rawat
CRCV, University of Central Florida (UCF)
Mubarak Shah
CRCV, University of Central Florida (UCF)

References

Please cite the following works if you use these datasets in your research:

Contact

Feel free to contact us at yogesh@ucf.edu if you have any questions.
Join this mailing list for updates: https://groups.google.com/g/tinyactions
Thanks for being with us!