AKTors.org
T-Rex from The University of Sheffield

 

T-Rex: Trainable Relation Extraction Framework


T-Rex fact-file

Owner  :  The University of Sheffield
Researchers  :  Mr José Iria
Description  :  http://tyne.shef.ac.uk/t-rex/
Builds on  :  Jena (RDF framework); SVMLight (SVM machine learning algorithm implementation); Fastutil (primitive-type data structures implementation)
Used by  :  AKTiveMedia; X-Search
Addresses challenges  :  Knowledge Retrieval

What's the Problem?

The explosive growth in the use of the computer as a communication device has created a need for systems that help people cope with the sheer volume of information available. It is well known that the Internet contains vast amounts of unstructured documents, but the same is true of large organizations such as publishing companies, government departments, and aircraft and car manufacturers. In many application domains, the utility of the available textual information could be increased significantly by automated methods that map parts of the unstructured text into a structured representation. This process is called Information Extraction (IE).

Within IE, the task of Entity Extraction is essentially a classification problem: given a piece of text in a document, decide whether it belongs to some entity class. The task of Relation Extraction (REX), also known as event extraction or template filling, additionally aims to establish relations between the classified entities. The top performer in the 2002 DARPA ACE evaluation achieved entity extraction precision and recall scores of about 80%, but binary relation extraction scores of only roughly 60%. A system whose suggestions are wrong about four times out of ten is hardly acceptable in real-world applications. Relation extraction therefore remains a difficult open research problem, with important applications in diverse fields such as Knowledge Management and Web Mining.
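To make the precision and recall figures concrete, the following is a minimal sketch of how such scores could be computed for binary relation extraction. The relation triples are toy data taken from the fact-file above, the relation names `builds_on` and `uses` and the function `precision_recall` are illustrative assumptions, and the exact-match scoring shown here is a simplification rather than the official ACE scoring method.

```python
from typing import Set, Tuple

# A binary relation as (subject entity, relation type, object entity).
# The relation type names used below are hypothetical labels for illustration.
Relation = Tuple[str, str, str]


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Score predicted relations against a gold standard by exact match."""
    correct = len(predicted & gold)
    # Precision: what fraction of the system's suggestions are right.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: what fraction of the true relations the system found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy gold standard built from the fact-file entries.
gold = {
    ("T-Rex", "builds_on", "Jena"),
    ("T-Rex", "builds_on", "SVMLight"),
    ("T-Rex", "builds_on", "Fastutil"),
    ("AKTiveMedia", "uses", "T-Rex"),
    ("X-Search", "uses", "T-Rex"),
}

# A hypothetical system output: three correct relations and two errors.
predicted = {
    ("T-Rex", "builds_on", "Jena"),
    ("T-Rex", "builds_on", "SVMLight"),
    ("AKTiveMedia", "uses", "T-Rex"),
    ("T-Rex", "uses", "X-Search"),      # wrong direction
    ("Jena", "builds_on", "Fastutil"),  # relation not in the gold standard
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With three of five suggestions correct and three of five gold relations found, both scores come out at 60%, the level the paragraph above describes as hardly acceptable in practice.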

Towards a Solution

The Trainable Relation Extraction (T-Rex) framework has been developed as a testbed for experimenting with several relation extraction algorithms. It promotes a divide-and-conquer approach, delimiting subproblems that can be worked on separately in order to improve the overall system, and is general enough to support a variety of IE algorithms. As a first test, an entity extraction algorithm based on support vector machines was developed quickly using the framework. The algorithm achieves state-of-the-art results on typical corpora for the task.
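The divide-and-conquer idea can be illustrated with a small sketch in which each subproblem sits behind its own interface, so that one stage can be improved or replaced without touching the others. This is not the T-Rex API: every name here (`Entity`, `extract`, `gazetteer_entities`, `pair_software`) is hypothetical, and the lookup table merely stands in for a trained classifier such as an SVM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str


Relation = Tuple[Entity, str, Entity]

# Each subproblem is a plain function type, so implementations can be
# developed and swapped independently of one another.
Tokenizer = Callable[[str], List[str]]
EntityClassifier = Callable[[List[str]], List[Entity]]
RelationClassifier = Callable[[List[Entity]], List[Relation]]


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def gazetteer_entities(tokens: List[str]) -> List[Entity]:
    # Stand-in for a trained entity classifier: a hypothetical lookup table.
    lexicon = {"T-Rex": "Software", "Jena": "Software", "Sheffield": "Organization"}
    return [Entity(token, lexicon[token]) for token in tokens if token in lexicon]


def pair_software(entities: List[Entity]) -> List[Relation]:
    # Stand-in for a trained relation classifier: link every pair of
    # Software entities, in order of appearance.
    software = [e for e in entities if e.label == "Software"]
    return [
        (first, "related_to", second)
        for i, first in enumerate(software)
        for second in software[i + 1:]
    ]


def extract(
    text: str,
    tokenize: Tokenizer,
    find_entities: EntityClassifier,
    find_relations: RelationClassifier,
) -> Tuple[List[Entity], List[Relation]]:
    """Run the stages in sequence; each stage only sees the previous stage's output."""
    entities = find_entities(tokenize(text))
    return entities, find_relations(entities)


entities, relations = extract(
    "T-Rex builds on Jena", whitespace_tokenizer, gazetteer_entities, pair_software
)
print(entities)
print(relations)
```

Because `extract` depends only on the three function types, replacing `gazetteer_entities` with a learned model would leave the tokenizer and the relation stage unchanged, which is the kind of separation the framework description points to.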