Convolutional Neural Networks have shown great performance in image recognition tasks and have therefore been widely adopted for activity recognition in videos. In contrast to deep models for images, video activity recognition uses a variety of deep architectures, and none of them has emerged as the best fit or the dominant approach. The need to capture local spatio-temporal motion information together with long-term dependencies makes the problem more difficult. In this work, we aim to propose enhancements to the most recent architectures used in video activity recognition.
People involved: Mojtaba Bahrami, Hamid R. Rabiee