Figures and Tables from this paper
This paper includes Figures 1–5 and Tables 1 and 2 (not reproduced here).
Topics
UCF101, Playing Musical Instrument, HMDB51, UCF50, Action Classes, Action Recognition Datasets, Action Recognition, Action Recognition Method, Unconstrained Videos, Camera Motion
5,399 Citations
- Carlos Ismael Orozco, M. Buemi, J. J. Berlles
- 2019
Computer Science
A CNN–LSTM architecture in which a pre-trained VGG16 convolutional neural network extracts the features of the input video and an LSTM classifies the video into a particular class (a minimal sketch follows below).
- 7
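A minimal PyTorch sketch of the CNN–LSTM pipeline described above: per-frame VGG16 features feed an LSTM whose final hidden state is classified. The hidden size, class count, and frozen backbone are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=101, hidden_size=256):  # sizes are assumptions
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = vgg.features                 # pre-trained conv feature extractor
        for p in self.features.parameters():
            p.requires_grad = False                  # keep VGG16 frozen in this sketch
        self.lstm = nn.LSTM(512 * 7 * 7, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clip):                         # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        x = self.features(clip.flatten(0, 1))        # per-frame VGG16 feature maps
        x = x.flatten(1).view(b, t, -1)              # one feature vector per frame
        _, (h, _) = self.lstm(x)                     # final hidden state summarizes the clip
        return self.classifier(h[-1])                # per-class scores

logits = CNNLSTM()(torch.randn(2, 8, 3, 224, 224))   # -> (2, 101)
```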
- Carlos Ismael Orozco, M. Buemi, J. J. Berlles
- 2021
Computer Science
LatinX in AI at International Conference on…
This work proposes an attention mechanism adapted to a CNN–LSTM base architecture for action recognition in videos and evaluates the system's performance using accuracy as the metric (a temporal-attention sketch follows below).
- 1
- Highly Influenced
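One common way such an attention mechanism sits on top of a CNN–LSTM is to score each timestep and pool a weighted sum of the LSTM outputs. This is a hedged sketch of that general idea only; the scoring network and dimensions are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)               # one relevance score per timestep

    def forward(self, h):                            # h: (batch, time, dim) LSTM outputs
        w = torch.softmax(self.score(h), dim=1)      # attention weights over time
        return (w * h).sum(dim=1)                    # weighted sum -> clip descriptor

pooled = TemporalAttention(256)(torch.randn(2, 8, 256))  # -> (2, 256)
```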
- João Carreira, Andrew Zisserman
- 2017
Computer Science
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
A new Two-Stream Inflated 3D ConvNet (I3D), built by inflating 2D ConvNet filters into 3D, is introduced; I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics (the inflation idea is sketched below).
- 7,006
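The core inflation trick, as the paper describes it: a pretrained 2D filter is stacked T times along a new temporal axis and rescaled by 1/T, so a "boring" video of repeated frames yields the same activations as the original 2D network. A minimal sketch (tensor shapes assumed for illustration):

```python
import torch

def inflate_conv_weight(w2d: torch.Tensor, t: int) -> torch.Tensor:
    """Inflate a 2D conv kernel (out_ch, in_ch, kH, kW) to 3D (out_ch, in_ch, t, kH, kW)."""
    return w2d.unsqueeze(2).repeat(1, 1, t, 1, 1) / t   # repeat in time, rescale by 1/t

w2d = torch.randn(64, 3, 7, 7)        # e.g. a pretrained 2D stem filter
w3d = inflate_conv_weight(w2d, t=7)   # temporal receptive field of 7 frames
```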
- Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng
- 2024
Computer Science
ArXiv
A novel Cross-modality and Cross-action Modeling (CoCo) framework for zero-shot action recognition (ZSAR) that significantly outperforms the state of the art on three popular ZSAR benchmarks (Kinetics-ZSAR, UCF101, and HMDB51) under two different learning protocols.
- 5
- Highly Influenced
- Carlos Ismael Orozco, Eduardo Xamena, M. Buemi, J. J. Berlles
- 2020
Computer Science
Ciencia y Tecnología
A CNN–LSTM architecture is implemented in which a pre-trained VGG16 convolutional neural network first extracts the features of the input video and an LSTM then classifies the video into a particular class.
- 5
- W. Kay, João Carreira, Andrew Zisserman
- 2017
Computer Science
ArXiv
The dataset, its statistics, and how it was collected are described, and baseline performance figures are given for neural network architectures trained and tested for human action classification on this dataset.
- 3,298
- Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman
- 2019
Computer Science
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The Action Transformer model for recognizing and localizing human actions in video clips is introduced; by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others (a minimal cross-attention sketch follows below).
- 647
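At its heart, the query-driven design is cross-attention: a per-person query vector attends over spatio-temporal context features. This single-head sketch only illustrates that mechanism; the dimensions and the function itself are assumptions, not the model's actual implementation.

```python
import torch
import torch.nn.functional as F

def person_attention(query, context):
    # query: (batch, dim) person descriptor; context: (batch, n_locations, dim)
    scores = torch.einsum('bd,bnd->bn', query, context) / context.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)                   # where in the clip to look
    return torch.einsum('bn,bnd->bd', weights, context)   # context-attended feature

out = person_attention(torch.randn(2, 128), torch.randn(2, 196, 128))  # -> (2, 128)
```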
- Shan Sun, Feng Wang, Qi Liang, Liang He
- 2017
Computer Science
ICMR
TaiChi consists of unconstrained user-uploaded web videos containing camera motion and partial occlusions, which pose new challenges to fine-grained action recognition compared to existing datasets.
- 10
- Highly Influenced
- K. Matsui, Toru Tamaki, Gwladys Auffret, B. Raytchev, K. Kaneda
- 2017
Computer Science
ArXiv
Experimental results on the UCF50, UCF101, and HMDB51 action datasets demonstrate that TS is comparable to the state of the art and outperforms many other methods; on HMDB51 it achieves an accuracy of 85.4%, compared to the best accuracy obtained by a deep method.
- Bassel S. Chawky, A. S. Elons, A. Ali, Howida A. Shedeed
- 2018
Computer Science
Different action recognition datasets are explored to highlight their ability to evaluate different models, and a usage is proposed for each dataset based on the content and format of its data, the number of classes, and the challenges it covers.
- 4
...
13 References
- Hilde Kuehne, Hueihan Jhuang, Estíbaliz Garrote, T. Poggio, Thomas Serre
- 2011
Computer Science
2011 International Conference on Computer Vision
This paper introduces the largest action video database to date, with 51 action categories and around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube; it is used to evaluate the performance of two representative computer vision systems for action recognition and to explore the robustness of these methods under various conditions.
- 3,481
- Jingen Liu, Jiebo Luo, M. Shah
- 2009
Computer Science
2009 IEEE Conference on Computer Vision and Pattern Recognition
This paper presents a systematic framework for recognizing realistic actions from videos "in the wild"; motion statistics are used to acquire stable motion features and clean static features, and PageRank is used to mine the most informative static features (a minimal PageRank sketch follows below).
- 1,034
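PageRank itself is straightforward power iteration; below is a tiny sketch of how one might rank candidate static features on a feature-similarity graph. The random graph here is a stand-in, not the paper's actual pipeline.

```python
import numpy as np

def pagerank(adj: np.ndarray, d: float = 0.85, iters: int = 100) -> np.ndarray:
    """Rank nodes of a weighted graph by power iteration (adj[i, j]: edge weight j -> i)."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0, keepdims=True)
    trans = adj / np.where(col_sums == 0, 1, col_sums)   # column-stochastic transitions
    r = np.full(n, 1.0 / n)                              # uniform initial rank
    for _ in range(iters):
        r = (1 - d) / n + d * trans @ r                  # damped random-walk update
    return r

scores = pagerank(np.random.rand(50, 50))                # importance per candidate feature
```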
- Christian Schüldt, I. Laptev, B. Caputo
- 2004
Computer Science
Proceedings of the 17th International Conference on Pattern Recognition (ICPR)
This paper constructs video representations in terms of local space-time features, integrates such representations with SVM classification schemes, and presents action recognition results (the general recipe is sketched below).
- 3,989
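The classic recipe behind this line of work: quantize local space-time descriptors into a per-video bag-of-words histogram, then classify histograms with an SVM. A self-contained sketch using random stand-in descriptors (the real system computes descriptors at space-time interest points):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
descs = [rng.normal(size=(200, 72)) for _ in range(20)]   # stand-in per-video descriptors
labels = rng.integers(0, 6, size=20)                      # 6 hypothetical action classes

# Build a visual vocabulary, then histogram each video's descriptors over it.
codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(np.vstack(descs))
hists = np.array([np.bincount(codebook.predict(d), minlength=32) for d in descs])

clf = SVC(kernel='rbf').fit(hists, labels)                # one histogram per video
print(clf.predict(hists[:3]))
```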
- Marcin Marszalek, I. Laptev, C. Schmid
- 2009
Computer Science
2009 IEEE Conference on Computer Vision and Pattern Recognition
This paper automatically discovers relevant scene classes and their correlation with human actions, shows how to learn selected scene classes from video without manual supervision, and develops a joint framework for action and scene recognition that demonstrates improved recognition of both in natural video.
- 1,352
- Daniel Weinland, Edmond Boyer, Rémi Ronfard
- 2007
Computer Science
2007 IEEE 11th International Conference on Computer Vision
A new framework is proposed in which actions are modeled using three-dimensional occupancy grids, built from multiple viewpoints, in an exemplar-based HMM; 3D reconstruction is not required during the recognition phase, as learned 3D exemplars are instead used to produce 2D image information that is compared to the observations.
- 516
- M. Blank, Lena Gorelick, Eli Shechtman, M. Irani, R. Basri
- 2005
Computer Science
Tenth IEEE International Conference on Computer Vision (ICCV)
The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known; its robustness to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video is demonstrated.
- 2,316
- Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei
- 2010
Computer Science
ECCV
A framework for modeling motion by exploiting the temporal structure of human activities, representing activities as temporal compositions of motion segments; the algorithm is shown to perform better than other state-of-the-art methods.
- 795
- Mikel D. Rodriguez, J. Ahmed, M. Shah
- 2008
Computer Science
2008 IEEE Conference on Computer Vision and Pattern Recognition
This paper generalizes the traditional MACH filter to video (a 3D spatiotemporal volume) and to vector-valued data, and analyzes the response of the filter in the frequency domain to avoid the high computational cost commonly incurred in template-based approaches (see the FFT-correlation sketch below).
- 1,321
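The frequency-domain trick referenced above is standard: correlation over a 3D spatiotemporal volume becomes a pointwise product of FFTs, avoiding a sliding-window search. A sketch with a random stand-in filter (the actual MACH filter is synthesized from training examples):

```python
import numpy as np

def correlate3d(volume: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Circular cross-correlation of a 3D volume with a (smaller) 3D filter via FFT."""
    shape = volume.shape
    V = np.fft.fftn(volume, s=shape)
    H = np.fft.fftn(filt, s=shape)                 # filter zero-padded to volume size
    return np.real(np.fft.ifftn(V * np.conj(H)))   # correlation response map

video = np.random.rand(32, 64, 64)                 # (time, height, width)
response = correlate3d(video, np.random.rand(8, 16, 16))
peak = np.unravel_index(response.argmax(), response.shape)  # best-match location
```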
- D. Damen, David C. Hogg
- 2008
Computer Science
ECCV
A new method for detecting objects, such as bags carried by pedestrians, depicted in short video sequences; temporal templates are compared against view-specific exemplars generated offline for unencumbered pedestrians, yielding a segmentation of carried objects via the MAP solution.
- 230
...