ArtEquAkt from The University of Southampton
ArtEquAKT: The Artequakt project aims to implement a system that searches the Web and extracts knowledge about artists, automatically producing tailored biographies of artists.
What's the Problem?
The need for automatic knowledge harvesting tools is increasing quickly, as the amount of knowledge available and spread across the Web has never been so large. Annotations on the Semantic Web could facilitate acquiring such knowledge, but annotations are rare and in the near future will probably not be rich or detailed enough to cover all the knowledge contained in these documents. Hence advanced knowledge services may require tools able to search and extract specific knowledge from the Web, guided by a domain conceptualisation (ontology) that details what type of knowledge to harvest.
The Artequakt system deals with three main problems:
Towards a Solution
The Artequakt project aims to implement a system that searches the Web and
extracts knowledge about artists, based on an ontology describing that domain,
and stores it in a knowledge base to be used for automatically producing tailored
biographies of artists.
The first stage of this project consisted of developing an ontology for the
domain of artists and paintings. The main part of this ontology was constructed
from selected sections in the CIDOC Conceptual Reference Model (CRM)
ontology. The ontology informs the extraction tool of the type of knowledge
to search for and extract. An information extraction tool was developed and
applied that automatically populates the ontology with information extracts
from online documents. The information extraction tool makes use of an ontology,
coupled with a general-purpose lexical database, WordNet,
and an entity recogniser, GATE,
as guidance tools for identifying knowledge fragments consisting not just of
entities, but also the relationships between them. Automatic term expansion
is used to increase the scope of text analysis to cover syntactic patterns that
imprecisely match our definitions.
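The term expansion step described above can be illustrated with a minimal sketch. It is not the Artequakt implementation: the relation names, seed terms, and the small hand-written synonym table (standing in for a WordNet lookup) are all assumptions made for illustration.

```python
import re

# Hypothetical seed terms per ontology relation (not the project's real ontology).
SEED_TERMS = {
    "birth_event": ["born"],
    "death_event": ["died"],
}

# Hand-written stand-in for a lexical database such as WordNet.
SYNONYMS = {
    "born": ["birth"],
    "died": ["death", "passed away", "perished"],
}


def expand_terms(seed_terms, synonyms):
    """Return relation -> set of seed terms plus their synonyms."""
    expanded = {}
    for relation, terms in seed_terms.items():
        words = set(terms)
        for term in terms:
            words.update(synonyms.get(term, []))
        expanded[relation] = words
    return expanded


def match_relations(sentence, expanded):
    """Return the sorted relations whose expanded terms occur in the sentence."""
    text = sentence.lower()
    found = []
    for relation, words in expanded.items():
        # Whole-word (or whole-phrase) match for any expanded term.
        if any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words):
            found.append(relation)
    return sorted(found)


expanded = expand_terms(SEED_TERMS, SYNONYMS)
print(match_relations("Rembrandt passed away in Amsterdam in 1669.", expanded))
```

Without expansion, the seed term "died" alone would miss the sentence in the usage line; with the expanded set, the phrase "passed away" links it to the hypothetical `death_event` relation.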
The extracted information is stored in a knowledge base and analysed for duplications and inconsistencies. A variety of heuristics, knowledge comparison methods, and term expansion methods were used for this purpose. This included the use of simple geographical relations from WordNet to consolidate any place information, e.g. places of birth or death. Temporal information was also consolidated with respect to precision and consistency.
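One way to picture the consolidation of temporal information is the following sketch. The project's actual heuristics are not described here, so the representation of dates as tuples of varying precision and the "best supported, then most precise" rule are assumptions chosen for illustration.

```python
# Dates are tuples of varying precision: (year,), (year, month) or (year, month, day).

def consistent(a, b):
    """Two dates agree if the less precise one is a prefix of the other."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def consolidate_dates(dates):
    """Return (chosen date, conflicting dates) for a list of extracted dates.

    The chosen date is the one consistent with the most candidates;
    ties are broken in favour of the most precise date.
    """
    def support(d):
        return sum(consistent(d, other) for other in dates)

    best = max(dates, key=lambda d: (support(d), len(d)))
    conflicts = [d for d in dates if not consistent(best, d)]
    return best, conflicts


# Four extracted birth dates for the same artist, one of them inconsistent.
extracted = [(1606,), (1606, 7), (1606, 7, 15), (1607,)]
best, conflicts = consolidate_dates(extracted)
print(best, conflicts)
```

Here the three mutually consistent dates collapse into the most precise one, and the remaining date is reported as a conflict rather than silently discarded.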
Narrative construction tools were developed to query the knowledge base through an ontology server to search and retrieve relevant facts or textual paragraphs and generate a specific biography. The automatic generation of tailored biographies has two areas of focus. The first is providing biographies for artists where only sparse information is available, distributed across the Web. This may mean constructing text from basic factual information gleaned, or combining text from a number of sources with differing interests in the artist. The second is providing biographies that are tailored to the particular interests and requirements of a given reader. These might range from rough stereotyping such as "A biography suitable for a child" to specific reader interests such as "I'm interested in the artists' use of colour in their oil paintings".
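The idea of tailoring a biography to reader interests can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Fragment` type, the topic tags, the ranking rule, and the sample sentences are invented placeholders, not the project's narrative construction tools or real knowledge base content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A text fragment from the knowledge base with hypothetical topic tags."""
    text: str
    topics: frozenset


def build_biography(fragments, interests, max_sentences=3):
    """Select fragments that overlap the reader's interests, best match first."""
    relevant = [f for f in fragments if f.topics & interests]
    # sorted() is stable, so equally relevant fragments keep their original order.
    ranked = sorted(relevant, key=lambda f: -len(f.topics & interests))
    return " ".join(f.text for f in ranked[:max_sentences])


# Invented placeholder fragments, for illustration only.
fragments = [
    Fragment("The artist was born in a small town.", frozenset({"life"})),
    Fragment("The artist often used thick layers of oil paint.",
             frozenset({"technique", "oil"})),
    Fragment("The artist's palette was dominated by earth tones.",
             frozenset({"technique", "colour"})),
]

# A reader interested in the use of colour in oil paintings.
print(build_biography(fragments, frozenset({"colour", "oil"})))
# A short, life-focused biography, e.g. for a child.
print(build_biography(fragments, frozenset({"life"}), max_sentences=1))
```

The same set of fragments yields different texts depending on the interest set and length limit, which is the essence of tailoring; a fuller system would also need to order and connect the fragments into a coherent narrative.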
Take a Guided Tour
Here is a short video describing the Artequakt system. It's 1.23 minutes long with sound.
Harith Alani, Sanghee Kim, David E. Millard, Mark J. Weal, Wendy Hall, Paul
H. Lewis, Nigel R. Shadbolt (2003) Automatic Ontology-Based Knowledge Extraction
from Web Documents. IEEE Intelligent Systems. January/February, 18(1), pp. 14-21
Sanghee Kim, Harith Alani, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, and Mark Weal (2002). Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web. In Proceedings Semantic Authoring, Annotation and Knowledge Markup Workshop in the 15th European Conference on Artificial Intelligence, Lyon, France.