Inria Associate Team MOTIF — Unsupervised Multimedia Motif Discovery

Inria Principal investigator: Guillaume Gravier, LINKMEDIA, Inria Rennes & Irisa

Brazilian Principal investigator: Silvio Jamil F. Guimarães, PUC Minas, BH

Other participants:

  • Simon Malinowski, LINKMEDIA, Inria Rennes & Irisa
  • Laurent Amsaleg, LINKMEDIA, Inria Rennes & Irisa
  • Zenilton Kleber G. do Patrocínio Jr., PUC Minas
  • Arnaldo de Albuquerque Araújo, UFMG
  • William Robson Schwartz, UFMG

Contractual contributors:

  • Ph.D. students: Henrique Batista da Silva (UFMG & PUC), Ricardo Carlini Sperandio (Inria Rennes & Irisa)
  • M.Sc. interns: Renata Viana (PUC), Leo S. de Oliveira (PUC), Corentin Hardy (Inria Rennes), Cassio E. dos Santos Jr. (UFMG & Inria Rennes)

Scientific program

This project aims at studying various approaches to unsupervised motif discovery in multimedia sequences, in particular videos and audio recordings. In this context, we will develop work along two main research directions. On the one hand, we will explore indexing-based approaches to motif discovery. In particular, we will focus on symbolic approaches inspired by work in bioinformatics, investigating symbolic representations of multimedia data and the adaptation of existing symbolic motif discovery algorithms. On the other hand, we will further develop cross-modal clustering approaches to repeated-sequence discovery in video data, building upon previous work. We will investigate new cross-clustering approaches that incorporate constraints on clusters and propose new selection criteria. MOTIF will develop fundamental technology at the frontier of multimedia content analysis, multimedia indexing and bioinformatics, with practical applications in the structuring of media collections.

We expect MOTIF to contribute fundamental results in

  • Symbolic multimedia data representations: How can we derive a meaningful symbolic representation of multimedia data that is accurate yet compact and preserves semantic interpretation? How do representations impact motif discovery, and which motifs can be found with which representations?
  • Efficient and scalable algorithms for multimedia motif discovery: How can we adapt existing algorithms from the field of bioinformatics to symbolic multimedia data? Which temporal distortion model suits such data, and how does this model impact the algorithms, their efficiency and their scalability?
  • Cross-modal cluster analysis: How can relations across modalities in video be exploited to discover motifs? How can clustering algorithms be modified to deal with multimodal representations and correlations?
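As an illustration of what a symbolic representation of multimedia data can look like, the sketch below discretizes a one-dimensional feature series (for instance, frame-level audio energy) into a short string using SAX-style piecewise aggregation, a standard technique for symbolizing time series. The function name and parameters are illustrative assumptions, not taken from the project's code.

```python
import numpy as np

# Gaussian breakpoints that split N(0,1) into equiprobable regions
# (standard SAX lookup table; a few alphabet sizes shown)
_BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
    5: [-0.84, -0.25, 0.25, 0.84],
}

def sax_symbolize(series, n_segments=8, alphabet_size=4):
    """Discretize a 1-D feature series into a SAX-style symbolic string."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)            # z-normalize
    # Piecewise aggregate approximation: one mean per segment
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    bps = np.array(_BREAKPOINTS[alphabet_size])
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Map each segment mean to the letter of the region it falls in
    return "".join(letters[np.searchsorted(bps, v)] for v in paa)
```

A compact string such as the one produced here can then be fed to string-based motif discovery algorithms; the choice of segment count and alphabet size controls the accuracy/compactness trade-off raised in the first question above.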

Scientific progress

Application of bioinformatics approaches to motif discovery in videos. Motif discovery algorithms are used in bioinformatics to find relevant patterns in genetic sequences. We started by investigating the application of SNAP, a scalable sequence aligner, to video search, adapting the representation of videos to the original algorithm [1]. We also explored a different family of algorithms, namely MEME, for audio motif discovery.
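SNAP itself is a genomic read aligner; as a rough sketch of the underlying idea of aligning symbolized video sequences, the classic Smith–Waterman local alignment below scores near-duplicate segments while tolerating insertions and deletions. This is an illustrative stand-in with assumed scoring parameters, not the adaptation described in [1].

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two symbol strings.

    Illustrates how an aligner can detect a shared near-duplicate
    segment inside two longer symbolized video sequences, despite
    insertions and deletions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment never goes below zero: restart is free
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Two sequences sharing a repeated motif score high even when the motif is embedded in unrelated context, which is exactly the property needed for near-duplicate video search.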

Application of multivariate pattern mining to speech motif discovery. The data mining community has developed various algorithms for the detection of frequent patterns in time series, both for univariate time series (e.g., PrefixSpan) and for multivariate ones (e.g., CMPMiner). In collaboration with the DREAM project-team of IRISA/Inria Rennes, we investigated the use of existing pattern mining algorithms to extract audio patterns from speech data [5].
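To make the sequential pattern mining idea concrete, here is a minimal PrefixSpan-style miner over symbolic sequences, assuming each sequence is a list of discrete symbols (such as the strings produced by a symbolization step). This toy implementation is illustrative only and omits the pseudo-projection optimizations of the actual algorithms cited above.

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan-style frequent sequential pattern miner.

    sequences: iterable of symbol sequences.
    Returns {pattern tuple: support}, where a pattern is an
    order-preserving subsequence (gaps allowed).
    """
    results = {}

    def mine(prefix, projected):
        # Count, per sequence, which symbols occur in the projected suffixes
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project each sequence on the suffix after the first occurrence
            new_proj = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            mine(pattern, new_proj)

    mine((), [list(s) for s in sequences])
    return results
```

The recursive projection is what makes PrefixSpan efficient in practice: each frequent prefix narrows the database that must be scanned for its extensions.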

Large-scale PLS-based face recognition. Face recognition has been widely studied in past years. However, related work mostly focuses on increasing the accuracy and/or speed of testing a single probe-subject pair. During the stay of Cassio E. dos Santos Jr. (M.Sc. intern at UFMG) at IRISA, we proposed a novel method inspired by the success of locality-sensitive hashing (LSH) applied to large general-purpose datasets and by the robustness provided by partial least squares (PLS) analysis when applied to large sets of feature vectors for face recognition [3].
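The sketch below shows the classic random-hyperplane LSH that inspired the method: feature vectors are mapped to short binary codes so that candidate gallery faces can be filtered by Hamming distance before exact matching. The PLS-based approach of [3] learns the projection directions instead of drawing them at random; this code is an illustrative assumption, not the published method.

```python
import numpy as np

def hamming_hash(X, n_bits=16, seed=0):
    """Random-hyperplane LSH: map feature vectors to short binary codes.

    Vectors with small angular distance tend to agree on many bits,
    so most of the gallery can be discarded cheaply before running
    an exact (and expensive) face matcher on the survivors.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)            # one bit per plane

def hamming_distance(codes, query_code):
    """Number of differing bits between each gallery code and the query."""
    return (codes != query_code).sum(axis=1)
```

With learned directions, as in [3], each bit becomes more discriminative for faces, so fewer bits are needed for the same filtering quality.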

Algorithms for person discovery in broadcast TV. Building on results in face recognition, we developed algorithms to name persons speaking on TV in an unsupervised way by analyzing faces, voices, speech transcripts and text overlays. This work was carried out in the framework of the international benchmarking initiative MediaEval, where several teams worldwide competed on the task of multimodal person discovery on a large TV archive [4].


[1] Leonardo De Oliveira, Zenilton Kleber do Patrocínio Jr., Silvio Jamil Guimarães, Guillaume Gravier. Searching for near-duplicate video sequences from a scalable sequence aligner. IEEE International Symposium on Multimedia, 2013.

[2] Kleber Souza, Arnaldo de A. Araújo, Philippe Gosselin, Zenilton Patrocínio, Silvio Guimarães. Streaming Hierarchical Video Segmentation Based on an Observation Scale. IEEE International Conference on Image Processing, 2014.

[3] Cassio E. dos Santos Jr, Ewa Kijak, Guillaume Gravier, William Robson Schwartz. Learning to hash faces using large feature vectors. International Workshop on Content-based Multimedia Indexing, 2015.

[4] Cassio E. dos Santos Jr, Ewa Kijak, Guillaume Gravier, William Robson Schwartz. SSIG and IRISA at Multimodal Person Discovery. Working Notes Proceedings of the MediaEval Workshop, 2015. hal-01196171

[5] Corentin Hardy, Laurent Amsaleg, Guillaume Gravier, Simon Malinowski, René Quiniou. Sequential pattern mining on multimedia data. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database, Workshop on Advanced Analytics and Learning on Temporal Data, 2015. hal-01186444
