
Activity recognition results on UCF Sports and Hollywood2

The table above shows the results obtained on the UCF Sports dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php). We report the recognition rate with respect to the number...


Computational efficiency and parallel implementation

The developed algorithms are computationally efficient, and the compositional processing pipeline is well suited to implementation on massively parallel architectures. Many...


Motion hierarchy structure

Our model comprises three processing stages, as shown in the figure. The task of the lowest stage (layers...



L1: motion features

Layer L1 provides the input to the compositional hierarchy. Motion, obtained in L0, is encoded using a small dictionary.
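A minimal sketch of this encoding step, assuming the L0 output is a set of per-pixel motion (flow) vectors and the L1 dictionary is a small set of prototype motions (the dictionary contents and the nearest-neighbor assignment rule are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def encode_motion(flow_vectors, dictionary):
    """Assign each L0 motion vector the index of its nearest
    dictionary prototype (hypothetical L0 -> L1 encoding sketch)."""
    # flow_vectors: (N, 2) array of motion vectors from L0
    # dictionary:   (K, 2) array of prototype motions, K kept small
    dists = np.linalg.norm(
        flow_vectors[:, None, :] - dictionary[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Toy dictionary of 4 motion prototypes: right, left, up, down.
D = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
codes = encode_motion(np.array([[0.9, 0.1], [-0.2, -1.1]]), D)
# codes -> [0, 3]: roughly-rightward and roughly-downward motion
```

Keeping the dictionary small is what makes the later co-occurrence statistics of L1 symbols tractable.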


L2-Lx: Compositional hierarchy

Layers L2 and above form a compositional hierarchy in which, layer by layer, compositions of parts from the lower layers are built.

The main advantage of the compositional hierarchy is that it can be trained one layer at a time. Training a layer consists of the following steps:

  • Lower-layer results are obtained. At L2, the output of L1 is used directly; above L2, inference on the lower layer provides the input data.
  • Spatio-temporal co-occurrences of the lower-layer parts are observed and their statistics are accumulated. Even at this stage, many possible co-occurrences are never observed, and many occur only rarely.
  • An optimization step is run, minimizing a criterion function with two main components: the dictionary size (which is minimized) and the L1 symbol coverage (which is maximized). Essentially, we strive to describe as much of the input data as possible with the smallest possible number of symbols.
  • The final result of training is a sparse matrix of indices (a translation table), which maps a pair of lower-layer parts to the index of the corresponding current-layer composition.
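The statistics and optimization steps above can be sketched as follows; the greedy frequency-based selection and the `coverage_target` threshold are simplifying assumptions standing in for the actual criterion-function optimization:

```python
from collections import Counter

def count_cooccurrences(part_pairs):
    """Accumulate co-occurrence statistics of lower-layer part pairs
    (a flat count; the real model also records spatio-temporal layout)."""
    return Counter(part_pairs)

def select_compositions(counts, coverage_target=0.9):
    """Simplified optimization: greedily keep the most frequent pairs
    until they cover `coverage_target` of all observed occurrences,
    trading dictionary size against coverage of the input."""
    total = sum(counts.values())
    kept, covered = {}, 0
    for idx, (pair, c) in enumerate(counts.most_common()):
        kept[pair] = idx          # translation: pair -> composition index
        covered += c
        if covered / total >= coverage_target:
            break
    return kept                   # sparse translation table

# One frequent pair, one common pair, one rare pair.
pairs = [(0, 1)] * 6 + [(1, 2)] * 3 + [(2, 3)]
table = select_compositions(count_cooccurrences(pairs))
# table -> {(0, 1): 0, (1, 2): 1}; the rare pair (2, 3) is dropped
```

The rare pair never enters the dictionary, which is exactly why the resulting translation table is sparse.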

Inference on an unknown video is performed simply by obtaining the L1 output and indexing it upwards through a series of translation tables until the top of the hierarchy is reached. The images below show the results of training on a collage of "Robowood" clips: first, the translation table before the optimization step (note the large number of symbols with low co-occurrence counts, shown as white squares with blue interiors), and then the translation table after the optimization step (note that the dictionary size has decreased significantly).
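The upward indexing can be sketched as repeated table lookups; the pairing of consecutive symbols is a toy stand-in for the model's actual spatio-temporal grouping:

```python
def infer_upward(l1_symbols, translation_tables):
    """Inference sketch: push L1 symbols up the hierarchy by repeated
    lookup in each layer's translation table; pairs with no entry are
    dropped, since the learned dictionary does not cover them."""
    symbols = l1_symbols
    for table in translation_tables:          # one table per layer L2..Lx
        paired = list(zip(symbols[::2], symbols[1::2]))  # toy pairing rule
        symbols = [table[p] for p in paired if p in table]
    return symbols

# Toy two-layer hierarchy with hypothetical translation tables.
t2 = {(0, 1): 0, (1, 2): 1}                   # L1 pairs -> L2 compositions
t3 = {(0, 1): 5}                              # L2 pairs -> L3 compositions
top = infer_upward([0, 1, 1, 2], [t2, t3])
# top -> [5]: the single top-level composition found in this input
```

Because each step is only a table lookup, inference cost grows with the number of layers rather than with the dictionary sizes, which is one reason the pipeline parallelizes well.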

