Activity recognition results on UCF Sports and Hollywood2

The table above shows the results obtained on the UCF Sports dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php). We report the recognition rate with respect to the number...


Computational efficiency and parallel implementation

The developed algorithms are computationally efficient, and the compositional processing pipeline is well suited for implementation on massively parallel architectures. Many...


Motion hierarchy structure

Our model comprises three processing stages, as shown in the figure. The task of the lowest stage (layers...


L1: motion features

Layer L1 provides the input to the compositional hierarchy. Motion obtained in L0 is encoded using a small dictionary.
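
The idea of encoding motion with a small dictionary can be illustrated with a minimal sketch. The text does not specify what the dictionary contains, so everything below is an assumption made for illustration: a hypothetical dictionary of eight evenly spaced directions plus a "no motion" entry, a hypothetical magnitude threshold `MIN_MAGNITUDE`, and the helper names `encode_motion` and `encode_field`. It should not be read as the actual L1 implementation.

```python
import math

# Hypothetical dictionary: index 0 means "no motion", indices 1..8 are
# eight evenly spaced directions (0, 45, ..., 315 degrees). The source only
# states that the dictionary is small; its real contents are not given.
N_DIRECTIONS = 8
MIN_MAGNITUDE = 0.5  # assumed threshold below which motion is ignored


def encode_motion(dx, dy):
    """Map one motion vector (dx, dy) to the index of the nearest entry."""
    if math.hypot(dx, dy) < MIN_MAGNITUDE:
        return 0  # too weak to count as motion
    angle = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
    sector = 2 * math.pi / N_DIRECTIONS
    # Round to the nearest direction; wrap so that ~360 degrees maps to 0.
    return 1 + int(round(angle / sector)) % N_DIRECTIONS


def encode_field(flow):
    """Encode a 2-D grid of (dx, dy) vectors into a grid of indices."""
    return [[encode_motion(dx, dy) for dx, dy in row] for row in flow]
```

Under these assumptions, `encode_field([[(1, 0), (0, 0)]])` yields one index per location, so higher layers would operate on a compact symbolic map rather than on raw motion vectors.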


Problem identification

Action and activity recognition and categorisation under real-world conditions are of crucial importance for awareness of one’s environment and for interaction with one’s surroundings. Perception of motion plays a central role in biological visual systems. Sophisticated mechanisms for observing, extracting, and utilizing motion exist even in primitive animals [Ullman1981]. For humans, successful motion processing is a prerequisite for accomplishing many everyday tasks [Orban2008]. Given the crucial importance of motion in biological systems, there has been great interest in motion-related research in the computer vision and artificial intelligence communities, as they strive to bring their algorithms and applications closer to the real world and into everyday use [Aggarwal2011].

Current state-of-the-art computer vision methods work well for problems within limited domains and for specific tasks, and activity recognition and categorisation is no exception [Niebles2008]. However, when such methods are applied in more general settings, they become brittle, much less efficient, or even computationally intractable. In a nutshell, classic approaches are not general, and they do not scale well. Consequently, new paradigms that would alleviate those problems are constantly sought.

Scientific advances in recent years, especially in the field of neuroscience [Orban2008], have provided inspiration and insights that have given rise to novel approaches in computer vision [Pinto2009]. Those methods do not aim to exactly duplicate the functionality of the human brain. Rather, their goal is to improve the performance of computer vision methods using selected design principles that take inspiration from human and primate visual systems. In some areas, those efforts have already begun to bear fruit, significantly improving the performance of computer vision methods, especially when applied to complex, real-life problems.

One of the most important design principles that appears to have potential for robust and scalable solutions is the concept of hierarchical compositionality [Bienenstock1994]. There are many reasons in favor of using hierarchical approaches in computer vision. The relation to human perception is only one of them, and in terms of computer vision, it is not the most important one. The most important reason is that hierarchical compositionality allows much more efficient use of available resources than is possible with other state-of-the-art approaches [Fidler2010a]. These properties have been demonstrated primarily for modeling visual shape categories. What remains an open research issue is whether a similar approach can be applied to processing motion. Showing that motion information can be systematically learned in a number of hierarchical stages (from local to global, from specific to more abstract / invariant) and then inferred in an efficient process would offer new insights into building robust computer vision systems.

The success of hierarchical compositionality models can be explained as follows. In any computer vision method there is an inherent problem of knowledge representation. This problem is especially acute with complex problems, where the knowledge is correspondingly complex. A flat representation of knowledge retains this complexity. For example, in object recognition and object categorization tasks, we deal with many objects that may have shared properties but are otherwise distinct. If a flat method is applied to a large problem of a general nature, all variations of objects have to be stored in its knowledge base. For real-world problems, the dimensionality of such a task may be prohibitive. With a hierarchical compositionality model, on the other hand, the knowledge is spread throughout the visual hierarchy, so the properties shared between different observations need to be encoded only once. Such a knowledge representation is significantly more flexible, scales better, and generalizes well. The idea that knowledge is spread throughout the visual hierarchy is also consistent with the current understanding of human visual processing [Hawkins2004].
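
The difference between flat and shared storage can be made concrete with a toy sketch. The object names, part names, and the helpers `flat_storage_cost` and `compositional_storage` are hypothetical and chosen only for illustration. The sketch uses a single level of sharing, whereas a real compositional hierarchy would share parts across several layers.

```python
# Hypothetical toy vocabulary: each object is described by a tuple of parts.
OBJECTS = {
    "bicycle": ("wheel", "wheel", "frame", "handlebar"),
    "motorbike": ("wheel", "wheel", "frame", "handlebar", "engine"),
    "car": ("wheel", "wheel", "wheel", "wheel", "engine", "body"),
}


def flat_storage_cost(objects):
    """Flat representation: every object keeps its own copy of every part."""
    return sum(len(parts) for parts in objects.values())


def compositional_storage(objects):
    """Shared representation: each distinct part is stored once in a
    library, and objects hold only references (indices) into it."""
    library = []       # distinct part definitions, each stored once
    index = {}         # part name -> position in the library
    compositions = {}  # object name -> tuple of library indices
    for name, parts in objects.items():
        refs = []
        for part in parts:
            if part not in index:
                index[part] = len(library)
                library.append(part)
            refs.append(index[part])
        compositions[name] = tuple(refs)
    return library, compositions


library, compositions = compositional_storage(OBJECTS)
print(flat_storage_cost(OBJECTS), len(library))
```

In this toy case the flat representation stores 15 part descriptions, while the shared library holds only 5 distinct parts plus lightweight references. Adding a new object that reuses existing parts adds references but no new part definitions, which is the scaling argument made above.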

The inclusion of hierarchical compositionality models in computer vision algorithms has been gradual, and the analysis of motion is no exception. Currently, state-of-the-art approaches to motion analysis use hierarchy only in parts of the visual processing pipeline, complementing it with well-known and tested algorithms for low-level image processing and classification [Pinto2009]. When used this way, only a few levels of hierarchy are needed. Even with this approach, significant improvements in the performance of motion analysis methods, especially in activity recognition, have been reported.

It is our aim to take this research a step further and employ hierarchical compositionality across the whole motion processing pipeline. In contrast to state-of-the-art approaches, we plan to base our algorithm on extremely simple motion detection units and to employ learning even at the lowest stages of the hierarchy. We then plan to introduce additional hierarchy levels, which would offload a significant amount of complexity from the end-stage classifier to the hierarchical structure itself. Our aim is to develop a model that is general in nature, that is, useful for different motion-related tasks. Finally, we aim to combine this model with the already developed hierarchical compositional model for shape representation, resulting in a combined model that would outperform either of the two separate models.

