MICROBIO-AMSUD

Project Microbio

The project allowed us to develop and integrate various methods and techniques from natural language analysis and data mining in order to automatically extract ontological information from electronic documents written in natural language, to populate existing ontologies and thus enhance their coverage, and eventually to unify different ontologies. In addition, alternative techniques for the automatic tagging of documents will also be developed, using precisely the information provided by these ontologies. Together, these two lines of work make it possible to semi-automatically enrich the knowledge available on the Semantic Web with metadata. This will have an impact on information retrieval tasks, automatic reasoning, and access to semantic data by web users through natural-language interfaces. The problem is especially relevant in bioinformatics: with the dramatic increase in available documents, processing the information exceeds human capacity. In this context, automatic processing is no longer a convenience; it is a necessity.

The Microbio project, funded by Stic-Amsud, is a collaboration between:

  • Facultad de Matemática, Astronomía y Física. Universidad Nacional de Córdoba, Argentina;
  • INCO, Facultad de Ingeniería, Universidad de la República del Uruguay;
  • Institut Pasteur de Montevideo, Uruguay;
  • LORIA (Laboratoire Lorrain de Recherche en Informatique et ses Applications), Nancy, France;
  • MoDyCo (Modèles, Dynamiques, Corpus) UMR 7114 CNRS, Université Paris X, Nanterre, France;
  • Pontifícia Universidade Católica do Rio Grande do Sul, Faculdade de Informática, Brazil;
  • Universidad de Concepción, Chile.

The project coordinators are Jean-Luc Minel (MoDyCo) and Delphine Battistelli (Université Paris Sorbonne).

The project ended on 31 December 2009.

 


The main orientation of the project is fundamental research in the area of bioinformatics, using technology from text mining and the automatic generation of ontologies. Nevertheless, the project has an important impact on the development of web-based technologies, where various automatic text processing methods can be applied to tasks such as knowledge management, business intelligence, and knowledge reuse. Moreover, the application to the test domain, biology, contributes to improving the handling of the huge amount of information in this area of knowledge.

The Main Goal of the project is to develop methods and techniques based on text mining for the semi-automatic creation and updating of ontologies for bioinformatics applications, as well as the procedures associated with the use of these ontologies in information access applications.

The Specific Goals of the project are as follows:

1. Integration of tools for automatic text analysis via a common platform (UIMA [UIMA]). This includes morphological analysis, syntactic analysis (shallow and deep) and, if an ontology is available, some form of semantic analysis, although without word sense disambiguation.

2. Terminology acquisition from an automatically tagged corpus of scientific documents about biology (MEDLINE/PubMed), taking into account standards such as UMLS [UMLS]. Terminology acquisition techniques will be used to identify informative terms, which then serve to characterize the content of documents.

3. Characterization of terms by their contexts of occurrence, both for automatically acquired terms and for terms from preexisting ontologies. This characterization makes it possible to represent terms as vectors and then to compare them using mathematical distance measures.

4. Induction of equivalence classes for terms, with increasing granularity levels, using hierarchical clustering techniques.

5. Induction and/or population of ontologies, applying automatic reasoning to context vectors representing equivalence classes obtained in the previous step.

6. Ontology merging using automatic reasoning on the characterization of terms by their contexts of occurrence.

7. Verification of consistency of ontologies using automatic reasoning.

8. Document tagging using ontology terms, applying classification techniques based on a notion of distance between the vector representing the contexts within a document and the vectors representing terms.

9. Evaluation in an information retrieval application.
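
Goals 3, 4 and 8 can be illustrated with a minimal sketch. It is not the project's implementation: the toy sentences, the stopword list, the window size, the choice of cosine distance, the single-linkage merging rule and all function names are assumptions made for illustration only. The sketch builds context vectors for terms, groups terms whose vectors are close, and tags a document with the nearest term.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "is", "by", "how", "this"}  # assumed toy list

def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def context_vectors(sentences, terms, window=2):
    """Goal 3: represent each term by counts of the tokens around it."""
    vectors = {t: Counter() for t in terms}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in vectors:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                vectors[tok].update(left + right)
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def cluster(vectors, threshold):
    """Goal 4: single-linkage agglomerative clustering, cut at `threshold`."""
    clusters = [[t] for t in sorted(vectors)]

    def linkage(a, b):
        return min(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break  # the closest pair is too far apart: stop merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def tag_document(document, vectors, top_n=1):
    """Goal 8: tag a document with the terms whose vectors are nearest."""
    doc_vec = Counter(tokenize(document))
    ranked = sorted(vectors, key=lambda t: cosine_distance(doc_vec, vectors[t]))
    return ranked[:top_n]

# Toy data, invented for this example.
sentences = [
    "the kinase phosphorylates the substrate protein",
    "the phosphatase dephosphorylates the substrate protein",
    "the ribosome translates messenger rna",
]
vectors = context_vectors(sentences, ["kinase", "phosphatase", "ribosome"])
groups = cluster(vectors, threshold=0.6)
tags = tag_document("this abstract describes how messenger rna is translated", vectors)
```

Raising or lowering `threshold` yields coarser or finer equivalence classes, which is one simple way to obtain the increasing granularity levels mentioned in goal 4. A real system would work on a tagged MEDLINE/PubMed corpus and would likely weight context features rather than use raw counts.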

 

 
 
 
