Name: François Brémond

Institution: INRIA Sophia Antipolis


Video Understanding for Activity Monitoring

François Brémond – PULSAR – INRIA – Sophia Antipolis



Keywords: event recognition, activity monitoring, sensor fusion, real-time and adaptable systems.


François Brémond leads the video scene understanding group in the PULSAR team at INRIA Sophia Antipolis. He obtained his Master's degree in 1992 at ENS Lyon. He designs and develops generic systems for dynamic scene interpretation. The targeted class of applications is the automatic interpretation of indoor and outdoor partially structured scenes observed with sensors, in particular with static cameras. These systems detect and track mobile objects, which can be either humans or vehicles, and recognize their behaviours. He is particularly interested in bridging the gap between sensor information (pixel level) and recognized activities (semantic level). In 1997 he obtained his PhD degree at INRIA in video understanding, and he then pursued his research as a postdoctoral researcher at USC on the interpretation of videos taken from UAVs (Unmanned Airborne Vehicles) in the DARPA project VSAM (Visual Surveillance and Activity Monitoring). He has also participated in six European projects (PASSWORD, ADVISOR, AVITRACK, SERKET, CANTATA, COFRIEND), one DARPA project, several national projects (SAMSIT, SIC, VideoID …), seven industrial research contracts (RATP, FNCA, SNCF, ALSTOM, ST-MicroElectronics,…) and several international collaborations (USA, Taiwan, UK, Belgium) in video understanding. He is author or co-author of more than 75 scientific papers on video understanding published in international journals and conferences. In 2005 he co-founded Keeneo, an intelligent video surveillance company.

More information is available at:





Research focus


My research activities aim at designing a holistic approach to Scene Understanding that combines, for example, extracted visual features, a priori knowledge and learned activity models. Here, scene understanding corresponds to the real-time process of perceiving, analysing and elaborating an interpretation of a 3D dynamic scene observed through a network of sensors. This process consists mainly of matching the signal information coming from the sensors observing the scene with the large variety of models that humans use to understand the scene. The scene can contain a number of physical objects of various types (e.g. people, vehicles) interacting with each other or with their more or less structured environment (e.g. equipment). The scene can last a few instants (e.g. the fall of a person) or a few months (e.g. the depression of a person), and can be limited to a laboratory slide observed through a microscope or extend beyond the size of a city. Sensors usually include cameras (e.g. omnidirectional, infrared), but may also include microphones and other sensors (e.g. optical cells, contact sensors, physiological sensors, radars, smoke detectors).

I am interested in two main application domains: safety/security and healthcare monitoring.