
Amilcare:
An Information Extraction tool designed to support automatic
document annotation for the Semantic Web
Amilcare fact-file
What's the Problem?
- Machine readable content is needed for the Semantic Web
- Most actual or potential users of the Semantic Web are not
experts in document annotation
- Manual annotation is difficult, slow, time-consuming, tedious
and costly.
- Existing Information Extraction approaches are either unable to
cope with extralinguistic structure (tags, etc.) or ineffective on
free text.
Towards a Solution
Amilcare is an adaptive Information
Extraction tool designed to support document annotation for the
Semantic Web. It is designed to be used by a wide range of users, from
naive users to Information Extraction experts. It can handle a range
of text document types (e.g. free text, HTML documents, XML
documents, tables), including documents that mix these features, and
it can be embedded in a wider annotation environment.
Amilcare uses machine learning to adapt to new application domains
by learning a set of rules. Rules are learnt by generalising over a set
of examples in a training corpus annotated with XML tags. Amilcare
works in three modes:
- training mode, to induce rules
- testing mode, to test the induced rules on an unseen
tagged corpus
- production mode, which annotates any document the system
is provided with, using the previously induced rules
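The three modes can be illustrated with a deliberately tiny sketch. This is not Amilcare's algorithm or API: the function names (`train`, `evaluate`, `annotate`), the tag names and the rule form (tag the single token that follows a learned context word) are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Matches a simple, non-nested XML annotation such as <stime>3pm</stime>.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")


def strip_tags(text):
    """Remove annotations, keeping the annotated text."""
    return TAG.sub(r"\2", text)


def train(tagged_docs):
    """Training mode (toy): induce 'word before the tag -> tag' rules."""
    counts = Counter()
    for doc in tagged_docs:
        for match in TAG.finditer(doc):
            left_words = strip_tags(doc[:match.start()]).split()
            if left_words:
                counts[(left_words[-1].lower(), match.group(1))] += 1
    rules = {}
    # Keep the most frequent tag seen after each context word.
    for (context, tag), _ in counts.most_common():
        rules.setdefault(context, tag)
    return rules


def annotate(rules, text):
    """Production mode (toy): tag the token that follows a known context word."""
    output, previous = [], None
    for token in text.split():
        tag = rules.get(previous.lower()) if previous else None
        output.append(f"<{tag}>{token}</{tag}>" if tag else token)
        previous = token
    return " ".join(output)


def evaluate(rules, tagged_docs):
    """Testing mode (toy): precision and recall on an unseen tagged corpus."""
    tp = fp = fn = 0
    for gold_doc in tagged_docs:
        predicted_doc = annotate(rules, strip_tags(gold_doc))
        gold = set(TAG.findall(gold_doc))
        predicted = set(TAG.findall(predicted_doc))
        tp += len(gold & predicted)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


training_corpus = [
    "The seminar starts at <stime>3pm</stime> in <location>WeH5409</location>",
    "Talk by <speaker>Smith</speaker> at <stime>noon</stime>",
]
rules = train(training_corpus)
print(annotate(rules, "Lecture by Jones at 4pm in Room12"))
```

A real adaptive IE system generalises over much richer context (for example part-of-speech and gazetteer information from the preprocessor) than the single preceding word used here.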
Amilcare's default architecture includes a connection to Annie,
Gate's shallow IE system, which performs tokenization, part of speech
tagging, gazetteer lookup and named entity recognition. Any other
preprocessor can be connected via the API. The preprocessor is also
the only language-dependent module, the rest of the system being
language independent.
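The idea of a pluggable, language-dependent preprocessor feeding a language-independent learner can be sketched as follows. Everything here is hypothetical: `Token`, `Preprocessor` and `SimplePreprocessor` are illustrative names, not part of Amilcare's or Gate's actual API, and the part-of-speech heuristics are crude stand-ins for a real tagger.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Set


@dataclass
class Token:
    """One preprocessed token, as the (hypothetical) learner would see it."""
    text: str
    pos: str = "UNK"                                  # part-of-speech tag
    lookups: Set[str] = field(default_factory=set)    # gazetteer classes
    entity: Optional[str] = None                      # named-entity type


class Preprocessor(Protocol):
    """Anything with this method could be plugged in front of the learner."""
    def process(self, text: str) -> List[Token]: ...


class SimplePreprocessor:
    """Stand-in preprocessor: regex tokenizer, toy POS rules, dict gazetteer."""

    def __init__(self, gazetteer: Dict[str, str]):
        self.gazetteer = gazetteer  # maps lowercase word -> class name

    def process(self, text: str) -> List[Token]:
        tokens = []
        for word in re.findall(r"\w+|[^\w\s]", text):
            token = Token(word)
            if word.isdigit():
                token.pos = "CD"      # cardinal number
            elif word[0].isupper():
                token.pos = "NNP"     # crude guess: capitalised -> proper noun
            cls = self.gazetteer.get(word.lower())
            if cls:
                token.lookups.add(cls)
                token.entity = cls
            tokens.append(token)
        return tokens


def token_features(preprocessor: Preprocessor, text: str):
    """Language-independent view: the learner only ever sees Token fields."""
    return [(t.text, t.pos, sorted(t.lookups), t.entity)
            for t in preprocessor.process(text)]


pre = SimplePreprocessor({"sheffield": "location"})
for row in token_features(pre, "Talk in Sheffield at 3"):
    print(row)
```

Because the learner in this sketch depends only on the `Token` fields, supporting another language would mean swapping the preprocessor and leaving the rest unchanged.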
Some example screen shots of the user interface are provided here.
Amilcare can, however, be used without a user interface, simply as an
API.
Naive User Interface

Expert User Interface

Take a Guided Tour
- A General Introduction video, in ShockWave
Flash (0.3 Mb)
- A Detailed Tutorial video, in ShockWave
Flash (1.4 Mb).
Try a Demonstration
Please contact Fabio
Ciravegna to obtain access.
Technical requirements: Windows 2000, XP, Java Runtime
Environment 1.3, 512 Mb RAM, 800 MHz Processor
Example Applications
Amilcare is currently integrated in the following Semantic Web
annotation tools:
- MnM, developed at the Open University, UK
- Ontomat, developed at the University of Karlsruhe
- Melita, developed at the University of Sheffield
and used by the following entities:
Further Reading
Key document: Fabio Ciravegna: "Designing Adaptive
Information Extraction for the Semantic Web in Amilcare", to appear
in S. Handschuh and S. Staab (eds.), "Annotation for the Semantic Web",
in the series "Frontiers in Artificial Intelligence and
Applications", IOS Press, Amsterdam, 2003.
Other relevant documents
Fabio Ciravegna, Alexiei Dingli, Daniela Petrelli and Yorick Wilks:
"User-System Cooperation in Document Annotation based on Information
Extraction" in Asuncion Gomez-Perez, V. Richard Benjamins (eds.):
"Knowledge Engineering and Knowledge Management (Ontologies and the
Semantic Web)", Proceedings of the 13th International Conference on
Knowledge Engineering and Knowledge Management (EKAW02), 1-4 October
2002, Sigüenza (Spain), Lecture Notes in Artificial Intelligence 2473,
Springer Verlag. Available in the eprints archive.
Fabio Ciravegna, Alexiei Dingli, Daniela Petrelli and Yorick Wilks:
"Document Annotation via Adaptive Information Extraction", poster at
the 25th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, August 11-15, 2002, Tampere,
Finland.
Fabio Ciravegna: "Adaptive Information Extraction from Text by Rule
Induction and Generalisation" in Proceedings of the 17th International
Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle,
August 2001. Available in the eprints archive.
Fabio Ciravegna: "(LP)², an Adaptive Algorithm for Information
Extraction from Web-related Texts" in Proceedings of the IJCAI-2001
Workshop on Adaptive Text Extraction and Mining, held in
conjunction with the 17th International Joint Conference on Artificial
Intelligence (IJCAI-01), Seattle, August 2001. Available in the
eprints archive.
Fabio Ciravegna and Daniela Petrelli: "User Involvement in Adaptive
Information Extraction: Position Paper" in Proceedings of the
IJCAI-2001 Workshop on Adaptive Text Extraction and Mining,
held in conjunction with the 17th International Joint Conference on
Artificial Intelligence (IJCAI-01), Seattle, August 2001.
Available in the eprints archive.
Enrico Motta, Maria Vargas-Vera, John Domingue, Mattia Lanzoni,
Arthur Stutt and Fabio Ciravegna: "MnM: Ontology Driven
Semi-Automatic and Automatic Support for Semantic Markup" in
Asuncion Gomez-Perez, V. Richard Benjamins (eds.): "Knowledge
Engineering and Knowledge Management (Ontologies and the Semantic
Web)", Proceedings of the 13th International Conference on Knowledge
Engineering and Knowledge Management (EKAW02), 1-4 October 2002,
Sigüenza (Spain), Lecture Notes in Artificial Intelligence 2473,
Springer Verlag.
Siegfried Handschuh, Steffen Staab and Fabio Ciravegna:
"S-CREAM: Semi-automatic CREAtion of Metadata" in Asuncion
Gomez-Perez, V. Richard Benjamins (eds.): "Knowledge Engineering and
Knowledge Management (Ontologies and the Semantic Web)", Proceedings
of the 13th International Conference on Knowledge Engineering and
Knowledge Management (EKAW02), 1-4 October 2002, Sigüenza
(Spain), Lecture Notes in Artificial Intelligence 2473, Springer
Verlag. Available in the eprints archive.
Semantic representation
View in the AKT Triplestore Browser or as
RDF.
Also available in DOAP RDF (Description Of A Project).