

Activity recognition results on UCF Sports and Hollywood2

The table above shows the results obtained on the UCF Sports dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php). We report the recognition rate with respect to the number...



Computational efficiency and parallel implementation

The developed algorithms are computationally efficient, and the compositional processing pipeline is well suited to implementation on massively parallel architectures. Many...



Motion hierarchy structure

Our model consists of three processing stages, as shown in the figure. The task of the lowest stage (layers...



Server crash

After a total server failure, we are back online. We apologize for the inconvenience; we are still in...



L1: motion features

Layer L1 provides the input to the compositional hierarchy. Motion obtained in L0 is encoded using a small dictionary.
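
The text does not specify the L1 dictionary. As a minimal sketch, assuming L0 delivers a dense grid of per-pixel motion vectors (dx, dy) and assuming a hypothetical dictionary of eight quantized directions plus a "no motion" label, the encoding step could look like this (the dictionary size and the `min_magnitude` threshold are illustrative choices, not values from the project):

```python
import math

# Hypothetical dictionary: 8 quantized motion directions plus one "no motion" label.
# The actual size and content of the L1 dictionary are not given in the text.
NUM_DIRECTIONS = 8
NO_MOTION = NUM_DIRECTIONS  # label used for near-zero motion vectors


def encode_motion(dx: float, dy: float, min_magnitude: float = 0.5) -> int:
    """Map one L0 motion vector (dx, dy) to a dictionary label.

    Vectors shorter than min_magnitude are labeled as static; all others are
    assigned to the nearest of NUM_DIRECTIONS directions spaced evenly around
    the circle (0, 45, 90, ... degrees for 8 directions).
    """
    if math.hypot(dx, dy) < min_magnitude:
        return NO_MOTION
    angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
    sector = 2 * math.pi / NUM_DIRECTIONS
    # Shift by half a sector so each bin is centred on its reference direction.
    return int((angle + sector / 2) // sector) % NUM_DIRECTIONS


def encode_field(flow):
    """Encode a 2-D grid of (dx, dy) tuples into a grid of dictionary labels."""
    return [[encode_motion(dx, dy) for dx, dy in row] for row in flow]


print(encode_field([[(1.0, 0.0), (0.0, 2.0)], [(-1.0, -1.0), (0.1, 0.0)]]))
# [[0, 2], [5, 8]]
```

Keeping the dictionary small means the layers above only have to model co-occurrences among a handful of symbols rather than raw continuous motion.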



Hierarchical compositional models

An important feature of the human visual system is compositionality. Very early in life, in infancy, the ability to learn and model co-occurrences of various visual features emerges [Fiser2002]. Through several stages of increasing complexity, a representation develops that is composed of parts, where simpler parts make up more complex ones [Bienenstock1994].

This idea has been exploited in state-of-the-art object recognition algorithms such as [Fidler2006a, Fidler2007, Fidler2008, Fidler2009b, Fidler2010a]. These algorithms model the hierarchy of visual processing in a way that corresponds to certain features of the human visual cortex [Fidler2009a]. The hierarchy of representation is modeled through a compositional approach, in which simple units of shape (line segments) are gradually combined into more complex features. The result is a generative model, which is an important property because it allows the reasoning behind a decision to be explained.

The approach works as follows [Fidler2006a]: at the bottom of the hierarchy, low-level shape features are extracted with Gabor filters. Configurations of parts for the two higher levels are obtained by top-down projection and bottom-up learning on a set of appropriate training images. Compositional models pose numerous challenges. As the number of levels increases, optimized learning algorithms are needed to cope with the complexity of training [Fidler2007]. The issue of scale in the representations must also be addressed [Fidler2009b].
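
The filter parameters used in [Fidler2006a] are not given here. As a minimal sketch of the bottom layer only, the following builds a small bank of real (cosine) Gabor kernels from the standard Gabor formula and labels an image location with its most strongly responding orientation. The kernel size, wavelength, envelope width and number of orientations are arbitrary illustrative values:

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real (cosine) Gabor kernel of odd width `size`, as nested lists.

    theta is the orientation of the carrier wave, wavelength its period in
    pixels, sigma the Gaussian envelope width and gamma its aspect ratio.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(num_orientations=4, size=9, wavelength=4.0, sigma=2.0):
    """Kernels at evenly spaced orientations k * pi / num_orientations."""
    return [gabor_kernel(size, wavelength, k * math.pi / num_orientations, sigma)
            for k in range(num_orientations)]


def response_at(image, kernel, r, c):
    """Correlate `kernel` with `image` centred at row r, column c (no padding)."""
    half = len(kernel) // 2
    return sum(w * image[r - half + i][c - half + j]
               for i, krow in enumerate(kernel)
               for j, w in enumerate(krow))


def dominant_orientation(image, bank, r, c):
    """Index of the kernel with the largest absolute response at (r, c)."""
    responses = [abs(response_at(image, k, r, c)) for k in bank]
    return max(range(len(bank)), key=responses.__getitem__)


# A vertical bright line through the centre of an 11 x 11 image.
image = [[1.0 if col == 5 else 0.0 for col in range(11)] for row in range(11)]
print(dominant_orientation(image, gabor_bank(), 5, 5))  # 0 (theta = 0)
```

Oriented responses of this kind are the only place where the hierarchy touches pixel values; every layer above works with the labels they produce.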

This approach yields a generative model with a very compact representation, since all the knowledge is derived from the learned connections between parts. These parts are in turn composed of other parts, and only at the bottom of the compositional hierarchy is the connection to the original image evident, through the Gabor filter responses. As a result, inference in such a model is fast.
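
To illustrate how the knowledge can live entirely in the connections between parts, here is a minimal sketch built on a hypothetical part vocabulary. Each composite part stores only references to its subparts and their relative offsets. Only layer-1 parts link to the image, here through an orientation index standing in for a Gabor filter. Projecting a high-level part down through the hierarchy recovers the primitives that explain it. The part names, offsets and orientation indices are illustrative assumptions, not the learned vocabulary of [Fidler2006a]:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Part:
    """A node in a (hypothetical) compositional vocabulary."""
    name: str
    layer: int
    # (subpart, dx, dy): subpart expected at offset (dx, dy) from this part's centre.
    subparts: Tuple[Tuple["Part", int, int], ...] = ()
    # Only layer-1 parts link to the image, e.g. via a Gabor orientation index.
    orientation: Optional[int] = None


def project_to_layer1(part, x=0, y=0):
    """Top-down projection: list the (orientation, x, y) primitives behind `part`."""
    if part.layer == 1:
        return [(part.orientation, x, y)]
    primitives = []
    for sub, dx, dy in part.subparts:
        primitives.extend(project_to_layer1(sub, x + dx, y + dy))
    return primitives


# Layer 1: oriented edge segments (orientation indices are arbitrary labels here).
vertical = Part("vertical", 1, orientation=0)
horizontal = Part("horizontal", 1, orientation=2)
# Layer 2: a corner is just two references plus relative offsets.
corner = Part("corner", 2, subparts=((horizontal, 2, 0), (vertical, 0, 2)))
# Layer 3: two corners side by side, with no direct reference to the image.
corner_pair = Part("corner_pair", 3, subparts=((corner, -4, 0), (corner, 4, 0)))

print(project_to_layer1(corner_pair))
# [(2, -2, 0), (0, -4, 2), (2, 6, 0), (0, 4, 2)]
```

Because `corner` is shared by reference, adding higher layers costs only a few integers per connection, and the projection shows which primitives support a given high-level part.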

One drawback of existing multilayered compositional frameworks such as [Fidler2009b] is that they are feed-forward networks. This means that during detection, a compositional part at a certain level is detected only if it receives significant support from the layer directly below. Recently, [Wu2010] explored a feedback-based solution to this problem. They proposed alpha, beta and gamma channels of information flow for passing detection information forward and backward across layers, and demonstrated improvements with their three-layer network for face detection. A more principled approach to feedback is discussed by [Lee2003], albeit only theoretically.
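
The feed-forward limitation can be made concrete with a minimal sketch. It assumes a hypothetical vocabulary in which each composite part lists its subparts with relative offsets. It also assumes that a part is accepted only when at least a fraction `support` of its subparts is found in the layer below. The names, offsets, tolerance and threshold are illustrative, not taken from [Fidler2009b]. Under these assumptions, a single missing primitive (for example, due to occlusion) can suppress every part above it. The feedback schemes of [Wu2010] and [Lee2003] are not modeled here:

```python
# Hypothetical vocabulary: composite part -> [(subpart, dx, dy), ...].
# Names that are not keys are layer-1 primitives, looked up in the detections.
VOCAB = {
    "corner": [("horizontal", 2, 0), ("vertical", 0, 2)],
    "corner_pair": [("corner", -4, 0), ("corner", 4, 0)],
}


def found_near(positions, x, y, tol):
    """True if any detected (px, py) lies within `tol` pixels of (x, y)."""
    return any(abs(px - x) <= tol and abs(py - y) <= tol for px, py in positions)


def detect(name, primitives, x, y, support=1.0, tol=1):
    """Strictly bottom-up (feed-forward) test for part `name` at (x, y).

    primitives maps each layer-1 name to a set of detected positions. A composite
    part is accepted only if at least `support` (a fraction) of its subparts are
    themselves accepted at their expected offsets; nothing flows back down.
    """
    if name not in VOCAB:
        return found_near(primitives.get(name, ()), x, y, tol)
    subparts = VOCAB[name]
    hits = sum(detect(sub, primitives, x + dx, y + dy, support, tol)
               for sub, dx, dy in subparts)
    return hits >= support * len(subparts)


full = {"horizontal": {(-2, 0), (6, 0)}, "vertical": {(-4, 2), (4, 2)}}
occluded = {"horizontal": {(-2, 0), (6, 0)}, "vertical": {(-4, 2)}}

print(detect("corner_pair", full, 0, 0))                   # True
print(detect("corner_pair", occluded, 0, 0))               # False
print(detect("corner_pair", occluded, 0, 0, support=0.5))  # True
```

Lowering `support` recovers the occluded pair in this toy case. However, a fixed lower threshold makes every layer more permissive, which is one plausible motivation for passing information back down instead.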

Another drawback of existing multilayered compositional frameworks is that they are entirely reconstructive in nature: they are trained for reconstruction, and this information is then used to build detectors. A direct consequence is that they perform poorly at discriminating between similar categories, and this drawback is expected to be equally pronounced when detecting motion categories. Related research in online learning [Kristan2010] and object recognition [Fidler2006b] has shown, however, that reconstructive approaches can be made to account for discriminative information, thus improving the robustness of classification. Recently, [Felzenszwalb2009] applied a latent SVM to train a single-layer compositional model and obtained state-of-the-art results for object detection.
