AKTors.org
Amilcare from The University of Sheffield

Amilcare: An Information Extraction tool designed to support automatic document annotation for the Semantic Web


Amilcare fact-file

Owner  :  The University of Sheffield
Researchers (listed alphabetically)  :  Dr Fabio Ciravegna
Description  :  http://nlp.shef.ac.uk/amilcare/
Screencam  :  http://www.aktors.org/technologies/amilcare/amilcareshort1.html
Builds on  :  GATE (General Architecture for Text Engineering), XML (eXtensible Markup Language), Java, HTML (HyperText Markup Language)
Used by  :  Melita
Addresses challenges  :  Knowledge Acquisition

What's the Problem?

  • Machine-readable content is needed for the Semantic Web.
  • Most actual or potential users of the Semantic Web are not experts in document annotation.
  • Manual annotation is difficult, slow, tedious and costly.
  • Existing Information Extraction approaches are either unable to cope with extralinguistic structure (tags, etc.) or ineffective on free text.

Towards a Solution

Amilcare is an adaptive Information Extraction tool designed to support document annotation for the Semantic Web. It is intended for a wide range of users, from naive users to Information Extraction experts. It can handle a range of text document types (free text, HTML documents, XML documents, tables and so on), including documents that mix these features, and it can be embedded in a wider annotation environment.

Amilcare uses machine learning to adapt to new application domains by learning a set of rules. Rules are learnt by generalising over a set of examples in a training corpus annotated with XML tags. Amilcare works in three modes:

  • training mode, to induce rules
  • testing mode, to test the induced rules on an unseen tagged corpus
  • production mode, to annotate any document the system is given, using the previously induced rules
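
The three modes can be illustrated with a toy sketch in Python. It is not Amilcare's API and not its rule-induction algorithm: the `ToyTagger` class, its method names, the `min_precision` parameter and the seminar-style example sentences are all assumptions made up for illustration. The only "rule" it can learn is "the token following word W carries label L", which is far cruder than the generalisation described above, but it shows how one set of induced rules is shared by training, testing and production.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"</?\w+>|\w+|[^\w\s]")
START = "<s>"  # pseudo-token standing for "start of document"


def tagged_tokens(text):
    """Split XML-tagged text into (token, label) pairs; label is None outside tags."""
    pairs, label = [], None
    for tok in TOKEN_RE.findall(text):
        if re.fullmatch(r"</\w+>", tok):
            label = None                 # closing tag: leave the annotation
        elif re.fullmatch(r"<\w+>", tok):
            label = tok[1:-1]            # opening tag: enter the annotation
        else:
            pairs.append((tok, label))
    return pairs


class ToyTagger:
    """Hypothetical learner: one rule per left-context word (illustration only)."""

    def __init__(self, min_precision=0.8):
        self.min_precision = min_precision
        self.rules = {}                  # left-context word -> label

    def train(self, corpus):
        """Training mode: induce rules from XML-tagged documents."""
        counts = defaultdict(Counter)
        for doc in corpus:
            prev = START
            for tok, label in tagged_tokens(doc):
                counts[prev][label] += 1
                prev = tok.lower()
        self.rules = {}
        for context, labels in counts.items():
            label, hits = labels.most_common(1)[0]
            # Keep a rule only if it is precise enough on the training corpus.
            if label is not None and hits / sum(labels.values()) >= self.min_precision:
                self.rules[context] = label

    def predict(self, tokens):
        labels, prev = [], START
        for tok in tokens:
            labels.append(self.rules.get(prev))
            prev = tok.lower()
        return labels

    def test(self, corpus):
        """Testing mode: token-level (precision, recall) on an unseen tagged corpus."""
        tp = fp = fn = 0
        for doc in corpus:
            pairs = tagged_tokens(doc)
            guesses = self.predict([tok for tok, _ in pairs])
            for (_, gold), guess in zip(pairs, guesses):
                if guess is not None:
                    if guess == gold:
                        tp += 1
                    else:
                        fp += 1
                if gold is not None and guess != gold:
                    fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    def annotate(self, text):
        """Production mode: tag an untagged document with the induced rules."""
        tokens = [tok for tok, _ in tagged_tokens(text)]
        return " ".join(
            f"<{label}>{tok}</{label}>" if label else tok
            for tok, label in zip(tokens, self.predict(tokens))
        )


tagger = ToyTagger()
tagger.train([
    "The seminar by Dr <speaker>Smith</speaker> starts at <time>3pm</time>.",
    "A talk by Dr <speaker>Jones</speaker> begins at <time>10am</time>.",
])
print(tagger.test(["Lecture by Dr <speaker>Brown</speaker> at <time>2pm</time>."]))
print(tagger.annotate("Keynote by Dr Green at 9am."))
```

On this tiny corpus the sketch induces two rules (the token after "dr" is a speaker, the token after "at" is a time) and reuses them unchanged in the other two modes. A real system would need much richer context and generalisation than a single preceding word.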

Amilcare's default architecture includes a connection to ANNIE, GATE's shallow IE system, which performs tokenization, part-of-speech tagging, gazetteer lookup and named entity recognition. Any other preprocessor can be connected via the API. The preprocessor is also the only language-dependent module; the rest of the system is language independent.
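
The idea of a swappable, language-dependent preprocessor in front of a language-independent learner can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GATE, ANNIE or Amilcare API: `Token`, `Preprocessor`, `GazetteerPreprocessor` and `run_pipeline` are hypothetical names, and the preprocessor shown performs only tokenization and gazetteer lookup (no part-of-speech tagging or named entity recognition).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


@dataclass
class Token:
    text: str
    features: Dict[str, object] = field(default_factory=dict)


class Preprocessor(Protocol):
    """Hypothetical contract: anything that turns raw text into featured tokens."""

    def process(self, text: str) -> List[Token]: ...


class GazetteerPreprocessor:
    """Toy preprocessor: tokenization plus gazetteer lookup (illustration only)."""

    def __init__(self, gazetteer: Dict[str, str]):
        # Lower-case the keys so that lookup is case-insensitive.
        self.gazetteer = {word.lower(): cls for word, cls in gazetteer.items()}

    def process(self, text: str) -> List[Token]:
        tokens = []
        for word in re.findall(r"\w+|[^\w\s]", text):
            token = Token(word, {"capitalised": word[:1].isupper()})
            entry = self.gazetteer.get(word.lower())
            if entry is not None:
                token.features["gazetteer"] = entry
            tokens.append(token)
        return tokens


def run_pipeline(preprocessor: Preprocessor, text: str) -> List[Tuple[str, dict]]:
    """Language-independent side: sees only tokens and features, never raw text."""
    return [(tok.text, tok.features) for tok in preprocessor.process(text)]


pre = GazetteerPreprocessor({"Sheffield": "location"})
for text, features in run_pipeline(pre, "Visit Sheffield today."):
    print(text, features)
```

Because `run_pipeline` depends only on the `Preprocessor` contract, a preprocessor for another language could be substituted without touching the downstream code, which is the separation the paragraph above describes.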

Example screenshots of the user interface are shown below. Amilcare can, however, also be used without a user interface, purely through its API.

Naive User Interface

Expert User Interface

Take a Guided Tour

  • A General Introduction video, in Shockwave Flash (0.3 MB)
  • A Detailed Tutorial video, in Shockwave Flash (1.4 MB)

Try a Demonstration

Please contact Fabio Ciravegna to obtain access.

Technical requirements: Windows 2000 or XP, Java Runtime Environment 1.3, 512 MB RAM, 800 MHz processor.

Example Applications

Amilcare is currently integrated in the following Semantic Web annotation tools:

  • MnM, developed at the Open University, UK
  • Ontomat, developed at the University of Karlsruhe
  • Melita, developed at the University of Sheffield

and used by the following entities:

  • Merck (D)
  • ISOCO (SP)
  • Quinary (I)
  • Ontoprise (D)
  • University College Dublin (IE)
  • CNRS (F)

Further Reading

Key document: Fabio Ciravegna, "Designing Adaptive Information Extraction for the Semantic Web in Amilcare", to appear in S. Handschuh and S. Staab (eds), "Annotation for the Semantic Web", in the series "Frontiers in Artificial Intelligence and Applications", IOS Press, Amsterdam, 2003.

Other relevant documents

Fabio Ciravegna, Alexiei Dingli, Daniela Petrelli and Yorick Wilks:
"User-System Cooperation in Document Annotation based on Information Extraction"
in Asuncion Gomez-Perez, V. Richard Benjamins (eds.): "Knowledge Engineering and Knowledge Management (Ontologies and the Semantic Web)", Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October 2002 - Sigüenza (Spain), Lecture Notes in Artificial Intelligence 2473, Springer Verlag.
Available in the eprints archive.

Fabio Ciravegna, Alexiei Dingli, Daniela Petrelli and Yorick Wilks:
"Document Annotation via Adaptive Information Extraction"
Poster at the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 11-15, 2002, Tampere, Finland.

Fabio Ciravegna:
"Adaptive Information Extraction from Text by Rule Induction and Generalisation"
in Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle, August 2001.
Available in the eprints archive.

Fabio Ciravegna:
"(LP)², an Adaptive Algorithm for Information Extraction from Web-related Texts"
in Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, held in conjunction with the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, August 2001.
Available in the eprints archive.

Fabio Ciravegna and Daniela Petrelli:
"User Involvement in Adaptive Information Extraction: Position Paper"
in Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, held in conjunction with the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, August 2001.
Available in the eprints archive.

Enrico Motta, Maria Vargas-Vera, John Domingue, Mattia Lanzoni, Arthur Stutt and Fabio Ciravegna:
"MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup"
in Asuncion Gomez-Perez, V. Richard Benjamins (eds.): "Knowledge Engineering and Knowledge Management (Ontologies and the Semantic Web)", Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October 2002 - Sigüenza (Spain), Lecture Notes in Artificial Intelligence 2473, Springer Verlag.

Siegfried Handschuh, Steffen Staab and Fabio Ciravegna:
"S-CREAM - Semi-automatic CREAtion of Metadata"
in Asuncion Gomez-Perez, V. Richard Benjamins (eds.): "Knowledge Engineering and Knowledge Management (Ontologies and the Semantic Web)", Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October 2002 - Sigüenza (Spain), Lecture Notes in Artificial Intelligence 2473, Springer Verlag.
Available in the eprints archive.

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)