Developing an accurate computational model of human visual attention has been a long-standing challenge. Such a model would allow a system to select only the relevant information from complex, cluttered visual input in numerous artificial vision applications, such as robotics, surveillance, driving assistance, and multimedia recognition and retrieval.
The first biologically plausible model of the human visual attention system was proposed by Koch and Ullman (1985) and later implemented by Itti et al. (1998). This model analyzes still images to extract primary visual features, such as intensity, color and orientation, which are combined to form a saliency map that represents how strongly each image location attracts visual attention. Although several attempts have been made to improve the Koch-Ullman model, they all share a crucial limitation: the saliency responses are assumed to be deterministic.
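As a rough illustration of how feature maps can be merged into a single saliency map, the following sketch rescales each map to [0, 1] and averages them. This is a simplifying assumption, not the procedure of Itti et al. (1998), whose implementation uses multi-scale center-surround features and a more elaborate normalization operator. The function names `normalize` and `combine_saliency` and the toy 2x2 maps are hypothetical.

```python
def normalize(feature_map):
    """Rescale a 2-D feature map (list of rows) to [0, 1].

    A completely flat map carries no contrast, so it is mapped to all zeros.
    """
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def combine_saliency(feature_maps):
    """Average the normalized feature maps pixel by pixel (assumed combination rule)."""
    normalized = [normalize(m) for m in feature_maps]
    count = len(normalized)
    height, width = len(normalized[0]), len(normalized[0][0])
    return [
        [sum(m[r][c] for m in normalized) / count for c in range(width)]
        for r in range(height)
    ]


# Toy 2x2 feature maps standing in for intensity, color and orientation responses.
intensity = [[0, 10], [0, 0]]
color = [[0, 0], [0, 4]]
orientation = [[1, 1], [1, 1]]  # flat map: contributes nothing after normalization

saliency = combine_saliency([intensity, color, orientation])
```

Note that the output is a fixed function of the input image: the same frame always yields the same map, which is the deterministic behavior the text identifies as the limitation.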
To address this limitation, we propose a new stochastic model of visual attention. The proposed model is a dynamic Bayesian network with four layers that combines several fundamental statistical models. It enables us to
- automatically estimate eye focusing positions and their densities from video frames alone,
- automatically derive model parameters of the network with the EM algorithm when eye tracking sequences of human subjects are available,
- execute the estimation procedure in near real time (70 msec/frame at 640x480 pixels) using stream processing on hardware such as GPUs, by introducing an MCMC-based particle filter.
Experimental results demonstrate that our model predicts human visual attention significantly better than the previous deterministic model.
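To give a feel for how a particle filter can track an eye focusing position as a random state, here is a minimal sketch. It is a plain bootstrap (sampling-importance-resampling) particle filter, not the MCMC-based variant or the four-layer network described above, and it runs on the CPU rather than a GPU. The Gaussian toy saliency map, the random-walk motion model, and all function names and parameter values are assumptions made for illustration.

```python
import math
import random


def gaussian_map(height, width, peak_row, peak_col, sigma):
    """Toy saliency map with a single Gaussian bump (stand-in for a real map)."""
    return [
        [math.exp(-((r - peak_row) ** 2 + (c - peak_col) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def clamp(value, low, high):
    return min(max(value, low), high)


def particle_filter_step(particles, saliency, rng, motion_sigma=1.0):
    """One predict / weight / resample cycle over (row, col) particles."""
    height, width = len(saliency), len(saliency[0])
    # Predict: assume the eye position follows a Gaussian random walk inside the frame.
    moved = [
        (clamp(r + rng.gauss(0.0, motion_sigma), 0.0, height - 1.0),
         clamp(c + rng.gauss(0.0, motion_sigma), 0.0, width - 1.0))
        for r, c in particles
    ]
    # Weight: use the saliency at the nearest pixel as an (assumed) likelihood.
    weights = [saliency[int(round(r))][int(round(c))] + 1e-12 for r, c in moved]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def mean_position(particles):
    """Point estimate of the eye focusing position: the particle mean."""
    n = len(particles)
    return (sum(r for r, _ in particles) / n, sum(c for _, c in particles) / n)


def run_demo(seed=0, num_particles=500, num_frames=20):
    rng = random.Random(seed)
    # The same toy map is reused for every frame; real video would supply a new map per frame.
    saliency = gaussian_map(20, 30, peak_row=5, peak_col=22, sigma=2.0)
    particles = [(rng.uniform(0, 19), rng.uniform(0, 29)) for _ in range(num_particles)]
    for _ in range(num_frames):
        particles = particle_filter_step(particles, saliency, rng)
    return mean_position(particles)
```

Calling `run_demo()` returns a (row, col) estimate that settles near the peak of the toy map. The resampled particle set itself approximates a density over eye focusing positions, which is the kind of output the density map in the demo visualizes.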
Demo movie
(Left) Input video. (Right) Eye focusing density map. (White regions (left), black regions (right)) Eye focusing density: the whiter a region, the more probable the eye focus.
Selected publications
- Derek Pang, Akisato Kimura, Tatsuto Takeuchi, Junji Yamato and Kunio Kashino, "A stochastic model of selective visual attention with a dynamic Bayesian network," Proc. International Conference on Multimedia and Expo (ICME2008), pp. 1073-1076, Hannover, Germany, June 2008.
- Akisato Kimura, Derek Pang, Tatsuto Takeuchi, Junji Yamato and Kunio Kashino, "Dynamic Markov random fields for stochastic modeling of visual attention," Proc. International Conference on Pattern Recognition (ICPR2008), Mo.BT8.35, Tampa, Florida, USA, December 2008.
- Kouji Miyazato, Akisato Kimura, Shigeru Takagi and Junji Yamato, "Real-time estimation of human visual attention with MCMC-based particle filter," Proc. International Conference on Multimedia and Expo (ICME2009), New York, New York, USA, June-July 2009.