AKTors.org
Abraxas from The University of Sheffield

 

Abraxas: a tri-partite, dynamic and iterative approach to automatic ontology learning.


Abraxas fact-file

Owner  :  The University of Sheffield
Researchers (listed alphabetically)  :  Christopher Brewster, Jose Iria, Ziqi Zhang
Description  :  http://nlp.shef.ac.uk/abraxas/
Builds on  :  Resource Description Framework, Java, Jena
Addresses challenges  :  Knowledge Acquisition

What's the Problem?

  • The Semantic Web depends on the creation of a large number of domain-specific ontologies.
  • Most actual or potential users of the Semantic Web are not experts in ontology construction.
  • Manual ontology construction is difficult, slow, tedious and costly.
  • Existing ontology construction methodologies require a high-quality input corpus and follow a single pipeline that takes one or more specific inputs and produces a single static output. This does not reflect the fact that knowledge is uncertain, changes continuously over time and differs from person to person.

Towards a Solution

Abraxas is an incremental, weakly-supervised approach to Ontology Learning (OL) which views OL as a process involving three resources: the corpus of texts, the set of extraction patterns (conceived as a set of lexico-syntactic textual patterns), and the ontology (conceived as a set of RDF triples). The goal is to extend existing resources in terms of one another, always seeking a consistent overall state, which we call equilibrium. The method applies equally to creating an ontology from an input corpus, extending a corpus from an input ontology, or deriving a set of extraction patterns from an input ontology and an input corpus.
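To make the relationship between the three resources concrete, the following is a minimal sketch of a single lexico-syntactic pattern turning corpus text into ontology triples. It is an illustration under stated assumptions, not the Abraxas implementation: the "Xs such as Ys" pattern, the `Triple` type, the `extract_triples` helper and the choice of `rdfs:subClassOf` as the predicate are all hypothetical.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """A knowledge triple in subject / predicate / object form."""
    subject: str
    predicate: str
    obj: str


# Hypothetical lexico-syntactic pattern: "<hypernym>s such as <hyponym>s".
# The real Abraxas pattern set is not given in the text above.
SUCH_AS = re.compile(r"\b(\w+?)s such as (\w+?)s\b")


def extract_triples(corpus):
    """Apply the pattern to every document and collect the resulting triples."""
    ontology = set()
    for document in corpus:
        for hypernym, hyponym in SUCH_AS.findall(document):
            ontology.add(
                Triple(hyponym.lower(), "rdfs:subClassOf", hypernym.lower())
            )
    return ontology


corpus = [
    "Birds such as robins are common in gardens.",
    "Some mammals such as whales live in the sea.",
]
ontology = extract_triples(corpus)
for triple in sorted(ontology):
    print(triple)
```

The sketch runs in one direction only (corpus plus pattern yields ontology). The same three resources could in principle be traversed the other way, for example by using known triples to locate sentences from which new patterns are induced.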

The initial input to the process can be an ontology, a corpus, a set of patterns, or any combination of these. The input serves both as a specification of the task to perform and as seed data for a bootstrapping cycle where, at each iteration, a decision is made on which new candidate concept, relation, pattern or document to add to the domain. Such a decision is modelled via three unsupervised classification tasks that capture the interdependence between the resources: one classifies the suitability of a pattern to extract ontological concepts and relations in the documents; another classifies the suitability of ontological concepts and relations to generate patterns from the documents; and another classifies the suitability of a document to give support to patterns and ontological concepts. The notion of "suitability" is formalised by assigning a confidence value, which we call "resource confidence" (RC), to the relationship between each resource and the domain.

Bootstrapping starts with the user providing some seed data. Initial processing includes applying the extraction patterns to the seed corpus to extract any available knowledge triples, and learning new extraction patterns. The knowledge resources extracted by the initial processing are scored using the RC metric and placed in the three resource queues. The queues contain candidate resources, sorted by RC in descending order, to be processed in subsequent iterations. In each iteration the bootstrapping process polls the queues and adds one resource at a time to the system state. Different measures are used to determine which type of resource to poll. Following the addition of the resource, a new learning iteration is triggered. If a knowledge triple has been added, the system instantiates the extraction patterns with the triple and queries the WWW to download more texts that cover this triple; if an extraction pattern has been added, the system applies the pattern over the corpus to discover new knowledge triples; if a text has been added, the system extracts new knowledge triples and extraction patterns from the text. These new knowledge resources are then scored and fed into the candidate queues. The system continues cycling through the stages described above until the stopping criteria (e.g., the state reaches equilibrium) are met.
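The cycle described above can be sketched as a loop over three priority queues. This is a simplified illustration, not the actual Abraxas code: `ResourceQueue`, `bootstrap`, `toy_expand` and all RC values are hypothetical. Two assumptions stand in for details the text does not specify: the queue to poll is simply the one whose head has the highest RC, and "equilibrium" is approximated by all queues being empty (or an iteration cap being reached).

```python
import heapq


class ResourceQueue:
    """Candidate resources ordered by descending resource confidence (RC)."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, rc, resource):
        # Ignore candidates that were already queued or already accepted.
        if resource not in self._seen:
            self._seen.add(resource)
            heapq.heappush(self._heap, (-rc, resource))

    def peek_rc(self):
        return -self._heap[0][0] if self._heap else float("-inf")

    def pop(self):
        neg_rc, resource = heapq.heappop(self._heap)
        return -neg_rc, resource


def bootstrap(seeds, expand, max_iterations=100):
    """Run the cycle.

    seeds:  {kind: [(rc, resource), ...]}
    expand: callable (kind, resource) -> [(kind, rc, resource), ...],
            i.e. the new, already scored candidates a resource gives rise to.
    """
    queues = {kind: ResourceQueue() for kind in ("triple", "pattern", "document")}
    state = {kind: [] for kind in queues}
    for kind, candidates in seeds.items():
        for rc, resource in candidates:
            queues[kind].push(rc, resource)

    for _ in range(max_iterations):
        # Assumption: poll the queue whose best candidate has the highest RC.
        kind = max(queues, key=lambda k: queues[k].peek_rc())
        if queues[kind].peek_rc() == float("-inf"):
            break  # every queue is empty; stands in for equilibrium
        _, resource = queues[kind].pop()
        state[kind].append(resource)  # add one resource to the system state
        for new_kind, new_rc, new_resource in expand(kind, resource):
            queues[new_kind].push(new_rc, new_resource)
    return state


def toy_expand(kind, resource):
    """Hard-coded stand-in for extraction, pattern learning and Web download."""
    table = {
        # A new triple leads to downloading a text that covers it.
        ("triple", "robin isa bird"): [("document", 0.6, "doc2")],
        # A new text yields a new pattern and a new triple.
        ("document", "doc2"): [
            ("pattern", 0.7, "Xs such as Ys"),
            ("triple", 0.5, "whale isa mammal"),
        ],
        # A new pattern rediscovers a triple that is already queued.
        ("pattern", "Xs such as Ys"): [("triple", 0.5, "whale isa mammal")],
    }
    return table.get((kind, resource), [])


state = bootstrap({"triple": [(0.9, "robin isa bird")]}, toy_expand)
print(state)
```

In a fuller version, `expand` would be where the three behaviours from the paragraph above live (instantiating patterns and fetching texts for a triple, applying a pattern over the corpus, and mining a text for triples and patterns), each followed by RC scoring of the new candidates.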

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)