AKTors.org
Armadillo from The University of Sheffield

 


Armadillo fact-file

Owner: The University of Sheffield

Researchers (listed alphabetically): Mr Sam Chapman, Dr Fabio Ciravegna, Mr Alexiei Dingli

Description: http://nlp.shef.ac.uk/armadillo

Used by: CS AKTiveSpace

Addresses challenges: Knowledge Acquisition, Knowledge Maintenance

What's the Problem?

  • Machine-readable content is needed for the Semantic Web.
  • Manual annotation, which is required to create machine-readable content, is difficult, slow, tedious and costly.
  • Information Extraction techniques help to automate annotation, but they need previously annotated documents to bootstrap the process.
  • Even semi-automated techniques such as Melita, AKTive Doc or AKTive Media require considerable human input.

Towards a Solution

The information overload we experience on the Internet is partly due to the vast quantity of redundant information: the same facts are cited many times in superficially different formats. This redundancy can be exploited to bootstrap the annotation process needed for Information Extraction, thus enabling the production of machine-readable content for the Semantic Web. For example, once a system knows the name of one author, it can use resources on the Internet to identify other author names, instead of relying on rule-based or statistical applications or on hand-built gazetteers. By combining multiple information sources, internal and external to the system, texts can be annotated with a high degree of accuracy and with minimal or no manual intervention.
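
The author-name example can be illustrated with a small sketch. It is not Armadillo's actual implementation: the in-memory PAGES list stands in for resources on the Internet, the names in it are made up, and the function names harvest_candidates and bootstrap, as well as the min_support and max_rounds parameters, are hypothetical. The idea shown is only that names co-occurring with an already known name become new candidates, and that the process can then repeat.

```python
from collections import Counter

# Hypothetical toy data: each "page" is a list of names that appear together,
# for example the author list of one paper on a bibliography site.
PAGES = [
    ["A. Smith", "B. Jones", "C. Lee"],
    ["B. Jones", "D. Patel"],
    ["E. Novak", "F. Rossi"],
    ["C. Lee", "A. Smith"],
]


def harvest_candidates(known, pages):
    """Count how often each unknown name co-occurs with a known name."""
    counts = Counter()
    for names in pages:
        if any(name in known for name in names):
            counts.update(n for n in names if n not in known)
    return counts


def bootstrap(seeds, pages, min_support=1, max_rounds=5):
    """Grow the set of known names from a few seeds (assumed procedure).

    A candidate is accepted when it co-occurs with known names on at
    least `min_support` pages; repeated citations act as the evidence.
    """
    known = set(seeds)
    for _ in range(max_rounds):
        candidates = harvest_candidates(known, pages)
        accepted = {n for n, c in candidates.items() if c >= min_support}
        if not accepted:
            break  # nothing new was found, so stop early
        known |= accepted
    return known


if __name__ == "__main__":
    print(sorted(bootstrap({"A. Smith"}, PAGES)))
```

Raising min_support makes the sketch more conservative: a name is then accepted only when it is confirmed by several pages, which is one simple way to let redundancy stand in for manual checking.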

Armadillo combines several kinds of evidence: string similarity (see the SimMetric project), the reliability of each source, and the certainty with which Information Extraction captured each item. Using these strategies, Armadillo connects findings across the corpus. In doing so, it models the relevant domain and builds an RDF ontology and a knowledge base.
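
How the three kinds of evidence might be combined can be sketched as follows. This page does not say which combination rule Armadillo uses, so everything here is an assumption: Python's difflib.SequenceMatcher stands in for the similarity metrics, each mention is weighted by source reliability times extraction confidence, and mentions judged to be the same item are merged with a noisy-OR rule. The function names, the 0.85 threshold and the example values are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Treat two strings as the same item if their similarity ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def combine(mentions, threshold=0.85):
    """Merge similar mentions and score each merged item.

    `mentions` is a list of (text, source_reliability, extraction_confidence)
    tuples, with both numbers in [0, 1]. Each mention contributes the weight
    reliability * confidence, and weights for the same item are combined by
    noisy-OR: score = 1 - product(1 - weight).
    """
    clusters = []  # each entry: {"label": first spelling seen, "disbelief": float}
    for text, reliability, confidence in mentions:
        weight = reliability * confidence
        for cluster in clusters:
            if similar(text, cluster["label"], threshold):
                cluster["disbelief"] *= 1.0 - weight
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append({"label": text, "disbelief": 1.0 - weight})
    return {c["label"]: 1.0 - c["disbelief"] for c in clusters}


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    mentions = [
        ("Alice Smith", 0.9, 0.8),
        ("alice smith", 0.5, 0.6),
        ("Bob Jones", 0.9, 0.5),
    ]
    for label, score in combine(mentions).items():
        print(f"{label}: {score:.3f}")
```

With noisy-OR, every additional independent mention of the same item raises its score, so a fact that is cited redundantly across sources ends up with more support than any single mention gives it.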

Further reading

Fabio Ciravegna, Sam Chapman, Alexiei Dingli and Yorick Wilks, Learning to Harvest Information for the Semantic Web, in Proceedings of the 1st European Semantic Web Symposium, Heraklion, Greece, May 10-12, 2004.

Sam Chapman, Barry Norton and Fabio Ciravegna, Armadillo: Integrating Knowledge for the Semantic Web, Dagstuhl workshop on Learning for the Semantic Web, 3-18 February 2005, Dagstuhl, Germany.

Semantic representation

Also available in DOAP (Description Of A Project) RDF.