| UNDERSTANDING GROUP ACTIVITIES FROM MULTIPLE FIRST-PERSON PERSPECTIVES
Prof. Yoichi Sato
University of Tokyo
In this talk, I will introduce our recent attempts at understanding group activities from multiple egocentric videos as part of our project, Collective Visual Sensing for Analyzing Human Attention and Behavior. Unlike conventional videos, egocentric videos pose major challenges for many computer vision tasks because of severe motion blur caused by ego-motion and widely varying viewpoints. To overcome these difficulties, we have been pursuing a novel approach: jointly analyzing multiple egocentric videos to solve different vision tasks. In particular, I will describe three methods that jointly use multiple egocentric videos to identify people, to recognize subtle actions and reactions during interaction, and to discover important visual motifs, such as landmarks, shared by multiple people.
| ACTIVITY AND PHOTOGRAPHER RECOGNITION FROM VISUAL MOTION
Prof. Shmuel Peleg
The Hebrew University of Jerusalem
Given a raw egocentric video, we would like to determine the identity and the activity of the photographer. Traditional methods developed for third-person videos are not applicable, as the photographer is not visible. Some information can be inferred indirectly from the objects seen in the video, since the location can indicate something about the photographer. But while we do not see the photographer, the motion of the photographer's body is the major contributor to the motion we can measure in the video. We show that sparse optical flow measured in an egocentric video can be used to find both the identity and the activity of the photographer. Both tasks can be performed using CNNs, and in both we obtained an initial accuracy of about 90%. Sparse optical flow can also be used to fast-forward egocentric videos, which are typically long, boring, and unstable. We present EgoSampling, an adaptive frame sampling method that produces more stable fast-forwarded hyperlapse videos. We further turn camera shake from a drawback into a feature that enables a wider field of view: each output frame is mosaiced from several input frames. Work done with Y. Poleg, Y. Hoshen, C. Arora, T. Halperin, and A. Ephrat.