Labex CominLabs BigCLIN

Data collected or produced during the clinical care process can be exploited at different levels and across different domains. Yet a well-known challenge for the secondary use of health big data is that much of the detailed patient information is embedded in narrative text, mostly stored as unstructured data. The project addresses the essential needs that arise when reusing unstructured clinical data at a large scale. We propose to develop a new representation of clinical records relying on fine-grained semantic annotation, produced by new NLP tools dedicated to French clinical narratives. To efficiently map this added semantic information to existing structured data for further analysis at scale, the project also addresses distributed systems issues: scalability, management of uncertain data and privacy, stream processing at runtime, etc.

Dates: September 2016 - August 2019
Contacts: Vincent Claveau

CNRS – CONFAP FIGTEM – Fine-grained text-mining for clinical trials

FIGTEM aims at developing natural language processing methods, including information extraction and indexing, dedicated to the clinical trial domain. The goal is to populate a formal representation of patients (via their Electronic Patient Records) and of clinical trial data in different languages. These methods will be used within a recruitment support system for clinical and therapeutic trials. To this end, we propose to develop a text representation relying on fine-grained semantic annotation and spectral analysis, and to map these representations to the eligibility criteria of the trials with semi-supervised machine learning approaches.

Dates: January 2016 - December 2018
Contacts: Vincent Claveau


NexGen-TV aims at developing a generic solution for the enrichment, linking and retrieval of video content, targeting the cost-effective editing of second-screen and multiscreen applications for broadcast TV. The main outcome of the project will be a software platform to aggregate and distribute video content via a second-screen editing interface connected to social media. The curation interface will primarily make use of multimedia and social media content segmentation, description, linking and retrieval. Multiscreen applications will be developed in various domains, e.g., sports and news.

Dates: June 2015 - December 2018
Contacts: Vincent Claveau, Guillaume Gravier

ANR IDFraud – An Operational Automatic Framework for Identity Document Fraud Detection and Profiling

The first contribution of the IDFRAud project is an automatic solution for ID analysis and integrity verification. Our ID analysis goes through three processes: classification, text extraction and ID verification. The three processes rely on a set of rules that are externalized in a formal manner so that they can be easily managed and can evolve over time. This leads to the second contribution of IDFRAud: an ID knowledge management module. The third objective of the IDFRAud project is to address the forensic link detection problem and to propose an automatic analysis engine that can be continuously applied to the database of detected fraudulent IDs. Cluster analysis methods are used to discover relations between false IDs in their multidimensional feature space. This pattern extraction module will be coupled with a suitable visualization mechanism in order to facilitate the comprehension and analysis of the extracted groups of inter-linked fraud cases.

Dates: February 2015 - January 2018
Contact: Teddy Furon

Labex CominLabs LIMAH – Linking Media in Acceptable Hypergraphs

Available multimedia content is rapidly increasing in scale and diversity, yet today multimedia data remain mostly unconnected, i.e., with no explicit links between related fragments. LIMAH aims at exploring hypergraph structures for multimedia collections, instantiating actual links between fragments of multimedia documents, where links reflect a particular content-based proximity: similar content, thematic proximity, opinion expressed, answer to a question, etc. Relying on a multidisciplinary consortium (ICT, law, information and communication science, as well as cognitive and ergonomic psychology), LIMAH studies linked media both from a technological point of view and from the user perspective.

Dates: April 2014 - August 2018
Contacts: Guillaume Gravier, Pascale Sébillot

Inria Project Lab iCODA – Knowledge-mediated Content and Data Analytics

One of today’s major issues in data science is the design of algorithms that allow analysts to efficiently infer useful information and knowledge by collaboratively inspecting heterogeneous information sources, from structured data to unstructured content. Taking data journalism as an emblematic use case, the goal of the project is to develop the scientific and technological foundations for knowledge-mediated, user-in-the-loop collaborative data analytics on heterogeneous information sources, and to demonstrate the effectiveness of the approach in realistic, high-visibility use cases. The project stands at the crossroads of multiple research fields (content analysis, data management, knowledge representation, visualization) that span multiple Inria themes, and counts on a club of major press partners to define usage scenarios, provide data and demonstrate achievements.

Dates: May 2017 - December 2020
Contacts: Guillaume Gravier, Laurent Amsaleg

Past projects in the team

How to automatically detect hoaxes?

With the expansion of social networks, a great deal of false or uncertain information (fakes) and many unverified rumors (hoaxes) are propagated by users. They most often take the form of a picture accompanied by explanations. The societal impact of these messages is variable (joke, need to fill a lack of image or information about an event) but many within the handling …