The project is primarily oriented toward fundamental research in bioinformatics, using techniques from text mining and automatic ontology generation. It also has an important impact on the development of web-based technologies, where automatic text processing methods can be applied to tasks such as knowledge management, business intelligence, and knowledge reuse. Moreover, its application to the test domain, biology, contributes to better management of and access to the huge amount of information available in this area of knowledge.
The Main Goal of the project is to develop methods and techniques based on text mining for the semi-automatic creation and updating of ontologies for bioinformatics applications, as well as the procedures associated with the use of these ontologies in information access applications.
The Specific Goals of the project are as follows:
1. Integration of tools for automatic text analysis via a common platform (UIMA [UIMA]). This includes morphological analysis, syntactic analysis (shallow and deep), and, if an ontology is available, some form of semantic analysis, although without any sense disambiguation.
2. Terminology acquisition from an automatically tagged corpus of scientific documents about biology (MEDLINE/PubMed), taking into account standards such as UMLS [UMLS]. Terminology acquisition techniques will be used to identify informative terms, which then serve to characterize the content of documents.
3. Characterization of terms by their contexts of occurrence, both for automatically acquired terms and for preexisting ontologies. This characterization allows terms to be represented as vectors, which can then be compared using mathematical distance measures.
4. Induction of equivalence classes for terms, with increasing granularity levels, using hierarchical clustering techniques.
5. Induction and/or population of ontologies, applying automatic reasoning to context vectors representing equivalence classes obtained in the previous step.
6. Ontology merging using automatic reasoning on the characterization of terms by their contexts of occurrence.
7. Verification of the consistency of ontologies using automatic reasoning.
8. Document tagging using ontology terms, applying classification techniques based on a notion of distance between the vector representing contexts within a document and the vectors representing terms.
9. Evaluation in an information retrieval application.
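
To make goals 3, 4, and 8 more concrete, the following minimal Python sketch shows how terms could be represented as context vectors, grouped by hierarchical clustering, and used to tag a document. It is an illustration under stated assumptions, not the project's implementation: the goals do not specify a distance or a clustering strategy, so cosine distance and average linkage are assumed here, and the function names, the window size, and the three toy sentences are all hypothetical.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, terms, window):
    """Map each term to a Counter of the tokens found within `window` positions of it."""
    vectors = {t: Counter() for t in terms}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in vectors:
                lo, hi = max(0, i - window), i + window + 1
                vectors[tok].update(tokens[lo:i] + tokens[i + 1:hi])
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (assumed distance measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0

def agglomerate(vectors):
    """Average-linkage agglomerative clustering; returns the merges in order."""
    clusters = [frozenset([t]) for t in sorted(vectors)]

    def link(a, b):
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        d, i, j = min((link(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        merged = clusters[i] | clusters[j]
        merges.append((merged, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def tag_document(tokens, vectors):
    """Return the term whose context vector is closest to the document's bag of words."""
    doc = Counter(tokens)
    return min(sorted(vectors), key=lambda t: cosine_distance(doc, vectors[t]))

# Toy corpus, invented for illustration only.
sentences = [
    "the kinase phosphorylates the substrate protein".split(),
    "the phosphatase dephosphorylates the substrate protein".split(),
    "the ribosome translates the messenger rna".split(),
]
vectors = context_vectors(sentences, ["kinase", "phosphatase", "ribosome"], window=4)
merges = agglomerate(vectors)
tag = tag_document("this enzyme phosphorylates the substrate protein".split(), vectors)
print(merges[0][0], tag)
```

Each successive merge yields a coarser equivalence class, so the sequence of merges corresponds to the levels of granularity mentioned in goal 4. In this toy corpus the two enzyme terms share most of their context and are merged first, and the sample document is tagged with the term whose contexts it most resembles.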