AKTors.org
ArtEquAkt from The University of Southampton

ArtEquAkt: The Artequakt project aims to implement a system that searches the Web, extracts knowledge about artists and uses it to produce tailored biographies automatically.


ArtEquAkt fact-file

Owner  :  The University of Southampton
Researchers (listed alphabetically)  :  Dr Paul Lewis
Description  :  http://www.artequakt.ecs.soton.ac.uk/
Demonstration  :  http://www.artequakt.ecs.soton.ac.uk/demo/
Screencam  :  http://www.aktors.org/technologies/artequakt/artequakt_demo.mov
Builds on  :  GATE - General Architecture for Text Engineering, Protege-2000, Auld Linky, Java
Addresses challenges  :  Knowledge Retrieval, Knowledge Publishing

What's the Problem?

The need for automatic knowledge-harvesting tools is growing quickly, as the amount of knowledge available and spread across the Web has never been so large. Annotations on the Semantic Web could make such knowledge easier to acquire, but annotations are rare and, in the near future, will probably not be rich or detailed enough to cover all the knowledge contained in Web documents. Advanced knowledge services may therefore require tools that can search for and extract specific knowledge from the Web, guided by a domain conceptualisation (ontology) that details what type of knowledge to harvest.

The Artequakt system deals with three main problems:

  • Many Information Extraction (IE) systems rely on predefined templates and pattern-based extraction rules, or on machine learning techniques, to identify and extract entities within text documents. Ontologies can provide domain knowledge in the form of concepts and relationships. Linking ontologies to IE systems could provide richer guidance about what information to extract, the types of relationships to look for, and how to present the extracted information.
  • Many IE systems can recognise entities within documents (e.g. 'Renoir' is a 'Person', '25 Feb 1841' is a 'Date'). However, such information is incomplete and of little value without also acquiring the relations between these entities (e.g. 'Renoir' was born on '25 Feb 1841'). Extracting such relations automatically is difficult, but it is crucial for acquiring complete knowledge fragments and for ontology population (building the knowledge base).
  • When documents are analysed and information is extracted from them, duplicated and contradictory information will inevitably be extracted as well. Handling such information is a challenge for automatic extraction and ontology population approaches.

Towards a Solution

The Artequakt project aims to implement a system that searches the Web and extracts knowledge about artists, based on an ontology describing that domain, and stores it in a knowledge base to be used for automatically producing tailored biographies of artists.
The figure above illustrates Artequakt's architecture, which comprises three key areas. The first concerns the knowledge extraction tools, which extract factual information items from documents and pass them to the ontology server. The second key area is information management and storage: the information is stored by the ontology server and consolidated into a knowledge base that can be queried via an inference engine. The final area is narrative generation. The Artequakt server takes requests from a reader via a simple Web interface. Each request specifies an artist and the style of biography to be generated (chronology, summary, fact sheet, etc.). The server uses story templates to render a narrative from the information stored in the knowledge base, using a combination of original text fragments and natural language generation.
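To make the three areas concrete, here is a minimal sketch of how extracted facts might flow into a store and then into a template-driven biography. It is not Artequakt's implementation: the `Fact`, `KnowledgeBase`, `STORY_TEMPLATES` and `render` names, the relation names and the template contents are all hypothetical, and a Python dictionary stands in for the ontology server, knowledge base and inference engine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One knowledge fragment: subject, ontology relation, value and its source."""
    subject: str
    relation: str
    value: str
    source: str = ""


class KnowledgeBase:
    """Hypothetical in-memory stand-in for the ontology server and knowledge base."""

    def __init__(self):
        self._facts = defaultdict(list)

    def add(self, fact):
        # Index facts by subject; duplicates are kept for a later consolidation step.
        self._facts[fact.subject].append(fact)

    def query(self, subject, relation=None):
        # Return all facts about `subject`, optionally restricted to one relation.
        return [f for f in self._facts.get(subject, [])
                if relation is None or f.relation == relation]


# Hypothetical story templates: each biography style lists the relations it needs, in order.
STORY_TEMPLATES = {
    "fact sheet": ["date_of_birth", "place_of_birth", "date_of_death"],
    "summary": ["date_of_birth", "place_of_birth"],
}

LABELS = {"date_of_birth": "Born", "place_of_birth": "Birthplace", "date_of_death": "Died"}


def render(kb, artist, style):
    """Fill the template for `style` with facts retrieved from the knowledge base."""
    lines = [artist]
    for relation in STORY_TEMPLATES[style]:
        facts = kb.query(artist, relation)
        if facts:  # skip template slots the knowledge base cannot fill
            lines.append(f"{LABELS[relation]}: {facts[0].value}")
    return "\n".join(lines)


kb = KnowledgeBase()
kb.add(Fact("Renoir", "date_of_birth", "25 Feb 1841", source="doc-1"))
print(render(kb, "Renoir", "fact sheet"))
# Renoir
# Born: 25 Feb 1841
```

Keeping extraction, storage and rendering behind separate interfaces mirrors the three-area split described above: a template only asks the store for relations, so it does not depend on how or where the facts were extracted.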

The first stage of this project was to develop an ontology for the domain of artists and paintings. Most of this ontology was constructed from selected sections of the CIDOC Conceptual Reference Model (CRM) ontology. The ontology tells the extraction tool what type of knowledge to search for and extract. An information extraction tool was then developed and applied to populate the ontology automatically with information extracted from online documents. The tool uses the ontology, together with a general-purpose lexical database (WordNet) and an entity recogniser (GATE), as guidance for identifying knowledge fragments that consist not just of entities but also of the relationships between them. Automatic term expansion widens the scope of the text analysis to cover syntactic patterns that only imprecisely match our definitions.
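The idea of ontology-guided relation extraction with term expansion can be illustrated with a small sketch. Several assumptions apply: the relation table is a hypothetical fragment of an artist ontology (not the CIDOC CRM), a hand-written synonym table stands in for WordNet-based expansion, and entity types are supplied directly rather than produced by GATE. The ontology's domain and range constraints decide which entity pairs may fill each relation.

```python
import re

# Hypothetical ontology fragment: relation -> (domain class, range class, seed trigger terms).
ONTOLOGY_RELATIONS = {
    "date_of_birth": ("Person", "Date", {"born"}),
    "place_of_birth": ("Person", "Place", {"born"}),
    "date_of_death": ("Person", "Date", {"died"}),
}

# Hand-written stand-in for WordNet-based term expansion (an assumption, not WordNet itself).
SYNONYMS = {
    "born": {"birth"},
    "died": {"death", "perished"},
}


def expand(terms):
    """Add the synonyms of each seed term to widen the set of trigger words."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def extract(sentence, entities):
    """Return (subject, relation, value) triples licensed by the ontology.

    `entities` is a list of (text, type) pairs, standing in for the output
    of an entity recogniser such as GATE.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    triples = []
    for relation, (domain, range_, seeds) in ONTOLOGY_RELATIONS.items():
        if not words & expand(seeds):
            continue  # no trigger term (or expanded synonym) for this relation
        subjects = [text for text, etype in entities if etype == domain]
        values = [text for text, etype in entities if etype == range_]
        # The domain/range types decide which entity pairs can fill the relation.
        for subject in subjects:
            for value in values:
                triples.append((subject, relation, value))
    return triples


print(extract("Renoir was born in Limoges on 25 Feb 1841.",
              [("Renoir", "Person"), ("Limoges", "Place"), ("25 Feb 1841", "Date")]))
# [('Renoir', 'date_of_birth', '25 Feb 1841'), ('Renoir', 'place_of_birth', 'Limoges')]
```

The single trigger "born" yields two different relations, and the range type (Date or Place) tells them apart. Term expansion is what lets a sentence such as "The birth of Renoir took place in Limoges." match even though it never uses the word "born". Pairing every co-occurring subject and value is deliberately naive; a real system would also check the syntactic structure of the sentence.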

The extracted information is stored in a knowledge base and analysed for duplications and inconsistencies. A variety of heuristics, knowledge-comparison and term-expansion methods were used for this purpose. These included simple geographical relations from WordNet, used to consolidate place information such as places of birth or death. Temporal information was also consolidated with respect to precision and consistency.
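The following sketch shows one way the two kinds of consolidation could work; it is an illustration, not the heuristics Artequakt actually used. Dates are modelled as partial `(year, month, day)` tuples, and the preferred date is the one consistent with the most other extractions, with ties broken by precision. The `PART_OF` table is a hypothetical, hand-written substitute for WordNet's geographical relations; a more specific place replaces any region that contains it.

```python
from typing import Optional, Tuple

# A partial date: (year, month, day), with None for an unknown field.
PartialDate = Tuple[int, Optional[int], Optional[int]]


def dates_consistent(a: PartialDate, b: PartialDate) -> bool:
    """Two partial dates agree if every field known in both is equal."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def precision(d: PartialDate) -> int:
    """Number of known fields: a year alone is 1, a full date is 3."""
    return sum(field is not None for field in d)


def consolidate_dates(dates):
    """Pick the best-supported, most precise date and list the dates that contradict it."""
    def score(d):
        support = sum(dates_consistent(d, other) for other in dates)
        return (support, precision(d))

    best = max(dates, key=score)
    conflicts = [d for d in dates if not dates_consistent(d, best)]
    return best, conflicts


# Hypothetical stand-in for WordNet geographical (part-of) relations; must be acyclic.
PART_OF = {"Limoges": "France", "France": "Europe"}


def ancestors(place):
    """All regions that (transitively) contain `place`."""
    found = []
    while place in PART_OF:
        place = PART_OF[place]
        found.append(place)
    return found


def consolidate_places(places):
    """Drop any place that merely generalises another extracted place."""
    unique = set(places)
    covered = {region for p in unique for region in ancestors(p)}
    return sorted(unique - covered)


print(consolidate_dates([(1841, None, None), (1841, 2, 25), (1841, 2, None), (1842, None, None)]))
# ((1841, 2, 25), [(1842, None, None)])
print(consolidate_places(["France", "Limoges", "Limoges"]))
# ['Limoges']
```

Scoring by support before precision means that a single precise but wrong date does not override several vaguer dates that agree with each other. If `consolidate_places` returns more than one place, the extractions genuinely disagree and would need another heuristic, such as counting how many sources report each place.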

Narrative construction tools were developed that query the knowledge base through an ontology server, search for and retrieve relevant facts or textual paragraphs, and generate a specific biography. The automatic generation of tailored biographies focuses on two areas. The first is providing biographies for artists about whom only sparse information is available, distributed across the Web. This may mean constructing text from the basic factual information gathered, or combining text from a number of sources with differing interests in the artist. The second is providing biographies tailored to the particular interests and requirements of a given reader. These might range from rough stereotyping, such as "A biography suitable for a child", to specific reader interests, such as "I'm interested in the artist's use of colour in their oil paintings".
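The two areas of focus can be sketched together: rank original text fragments by how well their topics match the reader's interests, and fall back to generating a sentence from bare facts when no suitable fragment exists. This is a hypothetical illustration only. The `Fragment`, `ReaderProfile` and `tailor` names, the topic-overlap ranking and the one-sentence generator are assumptions rather than Artequakt's narrative engine, and the fragment texts below are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """An original text fragment annotated with the topics it covers."""
    text: str
    topics: set


@dataclass
class ReaderProfile:
    """Hypothetical reader model: a set of interests and a length budget."""
    interests: set
    max_fragments: int = 3


def generate_sentence(artist, facts):
    """Tiny natural-language-generation fallback built from bare facts."""
    details = []
    if "date_of_birth" in facts:
        details.append(f"on {facts['date_of_birth']}")
    if "place_of_birth" in facts:
        details.append(f"in {facts['place_of_birth']}")
    if not details:
        return ""
    return f"{artist} was born {' '.join(details)}."


def tailor(artist, fragments, facts, reader):
    """Prefer fragments matching the reader's interests; otherwise generate text."""
    # Rank matching fragments by topic overlap (sorted is stable, so ties keep input order).
    ranked = sorted(
        (f for f in fragments if f.topics & reader.interests),
        key=lambda f: len(f.topics & reader.interests),
        reverse=True,
    )
    chosen = [f.text for f in ranked[: reader.max_fragments]]
    if not chosen:  # sparse information: build text from the facts instead
        sentence = generate_sentence(artist, facts)
        chosen = [sentence] if sentence else []
    return " ".join(chosen)


fragments = [
    Fragment("[fragment on colour]", {"colour", "oil painting"}),
    Fragment("[fragment on early life]", {"childhood"}),
]
facts = {"date_of_birth": "25 Feb 1841", "place_of_birth": "Limoges"}
print(tailor("Renoir", fragments, facts, ReaderProfile({"colour"})))
# [fragment on colour]
print(tailor("Renoir", fragments, facts, ReaderProfile({"sculpture"})))
# Renoir was born on 25 Feb 1841 in Limoges.
```

A stereotype such as "a biography suitable for a child" could be expressed in the same way, as a preset `ReaderProfile` with particular interests and a smaller `max_fragments`.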


Take a Guided Tour

Here is a short video describing the Artequakt system. It's 1.23 minutes long with sound.

Further Reading

Key document:

Harith Alani, Sanghee Kim, David E. Millard, Mark J. Weal, Wendy Hall, Paul H. Lewis and Nigel R. Shadbolt (2003). Automatic Ontology-Based Knowledge Extraction from Web Documents. IEEE Intelligent Systems, 18(1), January/February, pp. 14-21.

Other documents:

Sanghee Kim, Harith Alani, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt and Mark Weal (2002). Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web. In Proceedings of the Semantic Authoring, Annotation and Knowledge Markup Workshop at the 15th European Conference on Artificial Intelligence, Lyon, France.

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)