JPI Cultural Heritage READ-IT – Reading Europe Advanced Data Investigation Tool

READ-IT is a transnational, interdisciplinary R&D project that will build a unique large-scale, user-friendly, open-access, semantically enriched investigation tool to identify and share groundbreaking evidence about the 18th-21st century Cultural Heritage of reading in Europe. READ-IT will ensure the sustainable and reusable aggregation of qualitative data, allowing an in-depth analysis of the Cultural Heritage of reading. State-of-the-art technology in Semantic Web and information systems will provide a versatile, end-user-oriented environment enabling scholars and ordinary readers to retrieve information from a vast amount of community-generated digital data, leading to new understanding of the circumstances and effects of reading in Europe. The interdisciplinary collaboration between established digital humanists, human and social sciences scholars and computer science researchers will investigate innovative ways of gathering new resources through crowdsourcing and web crawling, as well as linking and reusing pre-existing datasets.

Dates: 2018 - 2020 
Contacts: Guillaume Gravier


Multimedia data are usually complex, possibly combining multiple channels and temporal information to convey a message. They are high-dimensional and multimodal, and involve variability and distortion. TRANSFORM groups teams from France, Brazil and Chile to design and study transformations of multimedia data that facilitate its manipulation. TRANSFORM focuses on transforming multimedia data into compact representations suited for indexing and retrieval purposes. We will in particular design transformations adapted to 3D shapes, temporal data and multimodal data. The resulting representations will be integrated into efficient indexing schemes for retrieval purposes. Targeted applications include 3D-shape indexing, the discovery of audiovisually coherent fragments and of recurrences in speech data, and multimodal content linking and navigation.
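One classical example of a compact representation for temporal data is piecewise aggregate approximation (PAA), which summarizes a time series by per-segment means. The sketch below is a generic illustration of the idea, not code from the project:

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: compress a time series to
    the means of n_segments contiguous segments."""
    n = len(series)
    means = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        segment = series[start:end]
        means.append(sum(segment) / len(segment))
    return means
```

Representations like this reduce dimensionality drastically while preserving enough shape information for similarity search, which is what makes them attractive inputs for indexing schemes.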

Dates: January 2018 - December 2019 
Contacts: Simon Malinowski

Inria Project Lab iCODA – Knowledge-mediated Content and Data Analytics

One of today’s major issues in data science is the design of algorithms that allow analysts to efficiently infer useful information and knowledge by collaboratively inspecting heterogeneous information sources, from structured data to unstructured content. Taking data journalism as an emblematic use case, the goal of the project is to develop the scientific and technological foundations for knowledge-mediated, user-in-the-loop collaborative data analytics on heterogeneous information sources, and to demonstrate the effectiveness of the approach in realistic, high-visibility use cases. The project stands at the crossroads of multiple research fields (content analysis, data management, knowledge representation, visualization) that span multiple Inria themes, and counts on a club of major press partners to define usage scenarios, provide data and demonstrate achievements.

Dates: May 2017 - Dec. 2020 
Contacts: Guillaume Gravier, Laurent Amsaleg


The IoT will contain a huge number of devices and objects with very low or nonexistent processing and communication resources, coupled to a small number of high-power devices. The weakest devices, which are the most ubiquitous, will not be able to authenticate themselves using cryptographic methods. This project addresses these issues using physical unclonable functions (PUFs). PUFs, and especially quantum-readout PUFs, are ideally suited to the IoT setting because they allow for the authentication and identification of physical objects without requiring any cryptography or storage of secret information. Furthermore, we foresee that back-end systems will not be able to provide security and privacy via cryptographic primitives due to the sheer number of IoT devices. Our plan is to address these problems using privacy-preserving database structures and algorithms with good scaling behaviour. Approximate nearest neighbour (ANN) search algorithms, which have remarkably good scaling behaviour, have recently become highly efficient, but do not yet have the right security properties and have not yet been applied to PUF data.
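As a rough illustration of how ANN-style indexing could apply to noisy binary PUF responses, the sketch below uses a simple locality-sensitive hashing scheme: each table hashes a response by a fixed random subset of its bits, so a slightly noisy re-read of an enrolled device still collides in some table with high probability. All names and parameters here are illustrative, not part of the project's actual design:

```python
import random

def make_hash_tables(n_bits, n_tables, n_samples, seed=0):
    """Each table hashes a response via a fixed random subset of bit positions."""
    rng = random.Random(seed)
    return [rng.sample(range(n_bits), n_samples) for _ in range(n_tables)]

def bucket_key(bits, positions):
    return tuple(bits[p] for p in positions)

def enroll(responses, hash_tables):
    """Index one reference response per device identifier."""
    tables = [{} for _ in hash_tables]
    for device_id, bits in responses.items():
        for table, positions in zip(tables, hash_tables):
            table.setdefault(bucket_key(bits, positions), []).append(device_id)
    return tables

def candidates(bits, tables, hash_tables):
    """Devices whose reference response collides with the query in any table."""
    found = set()
    for table, positions in zip(tables, hash_tables):
        found.update(table.get(bucket_key(bits, positions), []))
    return found
```

Note that this sketch has none of the privacy properties the project targets: the tables store responses in the clear, which is precisely the gap between off-the-shelf ANN search and what the PUF setting requires.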

Dates: October 2016 - September 2020 
Contacts: Teddy Furon

Labex CominLabs BigCLIN

Data collected or produced during the clinical care process can be exploited at different levels and across different domains. Yet a well-known challenge for the secondary use of health big data is that much detailed patient information is embedded in narrative text, mostly stored as unstructured data. The project proposes to address the essential needs of reusing unstructured clinical data at a large scale. We propose to develop new clinical record representations relying on fine-grained semantic annotation, thanks to new NLP tools dedicated to French clinical narratives. To efficiently map this added semantic information to existing structured data for further analysis at scale, the project also addresses distributed systems issues: scalability, management of uncertain data and privacy, stream processing at runtime, etc.

Dates: September 2016 - August 2019
Contacts: Vincent Claveau

CNRS – CONFAP FIGTEM – Fine-grained text-mining for clinical trials

FIGTEM aims at developing natural language processing methods, including information extraction and indexing, dedicated to the clinical trial domain. The goal is to populate a formal representation of patients (via their electronic patient records) and of clinical trial data in different languages. These methods will be used within a system supporting recruitment for clinical and therapeutic trials. To this end, we propose to develop text representations relying on fine-grained semantic annotation and spectral analysis, and to map these representations to trial eligibility criteria using semi-supervised machine learning approaches.
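Once patient attributes have been extracted from narrative records, matching them against eligibility criteria can be framed very simply. The sketch below is a hypothetical illustration of that final matching step (the record, attribute names and criteria format are invented for the example; the project's actual mapping uses semi-supervised learning rather than hand-written rules):

```python
import operator

# Supported comparison operators for criteria of the form (attribute, op, value).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def matches_criteria(record, criteria):
    """Return True if the patient record satisfies every eligibility criterion.

    `record` maps attribute names to values extracted upstream by NLP;
    a missing attribute counts as a failed criterion.
    """
    for attr, op, value in criteria:
        if attr not in record or not OPS[op](record[attr], value):
            return False
    return True
```

For example, a record `{"age": 54, "hba1c": 8.1}` satisfies the criteria `[("age", ">=", 18), ("hba1c", ">=", 7.0)]` but fails `[("age", ">=", 60)]`.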

Dates: January 2016 - December 2018
Contacts: Vincent Claveau


NexGen-TV aims at developing a generic solution for the enrichment, linking and retrieval of video content, targeting the cost-effective editing of second-screen and multiscreen applications for broadcast TV. The main outcome of the project will be a software platform to aggregate and distribute video content via a second-screen edition interface connected to social media. The curation interface will primarily make use of multimedia and social media content segmentation, description, linking and retrieval. Multiscreen applications will be developed in various domains, e.g., sports and news.

Dates: June 2015 - December 2018
Contacts: Vincent Claveau, Guillaume Gravier

ANR IDFRAud – An Operational Automatic Framework for Identity Document Fraud Detection and Profiling

The first contribution of the IDFRAud project consists in proposing an automatic solution for ID analysis and integrity verification. Our ID analysis goes through three processes: classification, text extraction and ID verification. The three processes rely on a set of rules that are externalized in a formal manner to allow easy management and evolution. This leads to the second contribution of IDFRAud: an ID knowledge management module. The third objective of IDFRAud is to address the forensic link detection problem and to propose an automatic analysis engine that can be continuously applied to the database of detected fraudulent IDs. Cluster analysis methods are used to discover relations between false IDs in their multidimensional feature space. This pattern extraction module will be coupled with a suitable visualization mechanism to facilitate the comprehension and analysis of the extracted groups of inter-linked fraud cases.
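As a minimal sketch of linking fraudulent IDs in feature space, the code below groups IDs by single-linkage clustering: any two IDs closer than a distance threshold join the same cluster (via union-find over the proximity graph). The feature vectors and threshold are invented for the example; the project's actual cluster analysis methods are not specified here:

```python
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def link_clusters(features, threshold):
    """Group IDs whose feature vectors lie within `threshold` of each other
    (single linkage: connected components of the proximity graph)."""
    ids = list(features)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(ids, 2):
        if euclidean(features[i], features[j]) <= threshold:
            parent[find(i)] = find(j)  # union the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Each returned cluster is a candidate group of inter-linked fraud cases, which is the kind of grouping a visualization front end would then present to an analyst.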

Dates: February 2015 - January 2018
Contact: Teddy Furon

Labex CominLabs LIMAH – Linking Media in Acceptable Hypergraphs

Available multimedia content is rapidly increasing in scale and diversity, yet today multimedia data remain mostly unconnected, i.e., with no explicit links between related fragments. LIMAH aims at exploring hypergraph structures for multimedia collections, instantiating actual links between fragments of multimedia documents, where links reflect particular content-based proximity (similar content, thematic proximity, opinion expressed, answer to a question, etc.). Relying on a pluridisciplinary consortium (ICT, law, information and communication science, as well as cognitive psychology and ergonomics), LIMAH studies linked media both from a technological point of view and from the user perspective.
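To give a flavour of content-based link instantiation, the toy sketch below links text fragments whose word-set Jaccard similarity exceeds a threshold; the fragment identifiers and threshold are invented for the example, and real link types (thematic, opinion, question-answer) would of course need far richer models:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def link_fragments(fragments, threshold=0.2):
    """Return (id_a, id_b, score) links between fragment pairs whose
    bag-of-words Jaccard similarity reaches `threshold`."""
    tokens = {fid: set(text.lower().split()) for fid, text in fragments.items()}
    ids = sorted(tokens)
    links = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(tokens[a], tokens[b])
            if score >= threshold:
                links.append((a, b, round(score, 3)))
    return links
```

The resulting typed, weighted links are the edges from which a hypergraph over the collection can then be assembled.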

Dates: April 2014 - August 2018
Contacts: Guillaume Gravier, Pascale Sébillot


Past projects in the team
