Enhanced Semantic Access to the Protein Engineering Literature using Ontologies Populated by Text Mining

René Witte, Thomas Kappler, and Christopher J. O. Baker. Enhanced Semantic Access to the Protein Engineering Literature using Ontologies Populated by Text Mining. In International Journal of Bioinformatics Research and Applications (IJBRA), Volume 3, Issue 3, 2007. DOI: 10.1504/IJBRA.2007.015009.


The biomedical literature is growing at an ever-increasing rate, which underscores the need to support scientists with advanced, automated means of accessing knowledge. We investigate a novel approach that runs description logic (DL) queries against formal ontologies created from the results of text mining full-text research papers. In this paradigm, an OWL-DL ontology is populated with instances detected through natural language processing (NLP). The generated ontology can be queried by biologists using DL reasoners or integrated into bioinformatics workflows for further automated analyses. We demonstrate the feasibility of this approach with a system targeting the protein mutation literature.
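
The population step described in this abstract can be illustrated with a minimal sketch. It is not the system from the paper: the regular expression, the class name PointMutation, and the property names are hypothetical, and a plain Python set of triples stands in for an OWL-DL ontology and a DL reasoner.

```python
import re

# Hypothetical pattern for point mutations written as
# <wild-type residue><position><mutant residue>, e.g. "N35D".
# Real text mining systems use far richer grammars.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")


def populate(doc_id, text, triples):
    """Add one instance per detected mutation mention to the triple set."""
    for match in MUTATION.finditer(text):
        wild, position, mutant = match.groups()
        instance = f"mutation:{match.group(0)}"
        # Class and property names below are assumptions for illustration.
        triples.add((instance, "rdf:type", "PointMutation"))
        triples.add((instance, "hasWildtypeResidue", wild))
        triples.add((instance, "hasPosition", position))
        triples.add((instance, "hasMutantResidue", mutant))
        triples.add((instance, "mentionedIn", doc_id))


def instances_of(cls, triples):
    """Toy stand-in for a DL instance query: all subjects typed as cls."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


triples = set()
populate("doc1", "The N35D mutant of xylanase was more active than Q7H.", triples)
print(instances_of("PointMutation", triples))
```

In a real deployment the triples would be written to an OWL file and queried through a reasoner; the set-based query here only shows the shape of the data flow.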

Ontology Design for Biomedical Text Mining

René Witte, Thomas Kappler, and Christopher J. O. Baker. Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, chapter Ontology Design for Biomedical Text Mining. Springer, 2007. ISBN: 978-0-387-48436-5.


Text Mining in biology and biomedicine requires a large amount of domain-specific knowledge. Publicly accessible resources hold much of the information needed, yet their practical integration into natural language processing (NLP) systems is fraught with manifold hurdles, especially the problem of semantic disconnectedness throughout the various resources and components. Ontologies can provide the necessary framework for a consistent semantic integration, while additionally delivering formal reasoning capabilities to NLP.

In this chapter, we address four important aspects of integrating ontologies with NLP: (i) an analysis of the different integration alternatives and their respective advantages; (ii) the design requirements for an ontology supporting NLP tasks; (iii) the creation and initialization of an ontology using publicly available tools and databases; and (iv) the connection of common NLP tasks with an ontology, including technical aspects of ontology deployment in a text mining framework. A concrete application example (text mining of enzyme mutations) is provided to motivate and illustrate these points.
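
As a rough illustration of point (iv), one common way to connect an NLP task to an ontology is to derive a gazetteer (term lookup list) from the ontology's class labels, so that every annotation carries an ontology class. The sketch below assumes this; the mini-ontology, its labels, and the function names are hypothetical and are not taken from the chapter.

```python
# Hypothetical mini-ontology: class -> parent class, and class -> lexical labels.
SUBCLASS_OF = {"Enzyme": "Protein", "Protein": "Entity", "Organism": "Entity"}
LABELS = {
    "Enzyme": ["xylanase", "lipase"],
    "Organism": ["Bacillus circulans"],
}


def build_gazetteer(labels):
    """Map each lowercased label to the ontology class it denotes."""
    return {term.lower(): cls for cls, terms in labels.items() for term in terms}


def superclasses(cls):
    """Walk the subclass hierarchy upward from cls (excluding cls itself)."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def annotate(text, gazetteer):
    """Return (offset, surface form, class) for the first hit of each term."""
    lowered = text.lower()
    found = []
    for term, cls in gazetteer.items():
        start = lowered.find(term)
        if start != -1:
            found.append((start, text[start:start + len(term)], cls))
    return sorted(found)


gazetteer = build_gazetteer(LABELS)
annotations = annotate("Xylanase from Bacillus circulans was mutated.", gazetteer)
print(annotations)
print(superclasses("Enzyme"))
```

Because each annotation names an ontology class, later processing steps can generalize over the hierarchy, for example by treating every Enzyme mention as a Protein mention.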

Einführung in die Computerlinguistik

An introduction to computational linguistics, written in German.

Published in an internal report of our university institute: Text Mining: Wissensgewinnung aus natürlichsprachigen Dokumenten (Text Mining: Knowledge Extraction from Natural Language Documents).


1.1   Introduction
  1.1.1    Knowledge about Language
  1.1.2    History of Computational Linguistics
1.2   Morphology
  1.2.1    Stemming and Lemmatization, Porter Stemmer
1.3   Syntax: Parts of Speech and Constituents
  1.3.1    Parts of Speech and Part-of-Speech Tagging
  1.3.2    Constituents
1.4   Syntax: Grammars and Languages
  1.4.1    Formal Grammars and the Chomsky Hierarchy
  1.4.2    Regular Expressions
  1.4.3    Syntactic Parsing
1.5   Semantics
  1.5.1    A Classical Approach: Predicate Logic
  1.5.2    Predicate-Argument Structures
  1.5.3    Lexical Semantics
  1.5.4    WordNet
1.6   Summary and Outlook


  author = {Thomas Kappler},
  title = {{Einf{\"{u}}hrung in die Computerlinguistik}},
  chapter = {1},
  crossref = {tmrep},

  booktitle = {{Text Mining: Wissensgewinnung aus
                nat\"{u}rlichsprachigen Dokumenten}},
  year = {2006},
  editor = {Ren{\'{e}} Witte and Jutta M\"{u}lle},
  series = {Interner Bericht 2006-5},
  organization = {Universit\"{a}t Karlsruhe, Fakult\"{a}t f\"{u}r
                  Informatik, Institut f\"{u}r Programmstrukturen
                  und Datenorganisation (IPD)},
  note = {ISSN 1432-7864, URL: