GATE - General Architecture for Text Engineering from The University of Sheffield


GATE - General Architecture for Text Engineering fact-file

Owner  :  The University of Sheffield
Researchers (listed alphabetically)  :  Dr Kalina Bontcheva, Dr Hamish Cunningham, Dr Diana Maynard, Mr Valentin Tablan
Description  :  http://gate.ac.uk/
Demonstration  :  http://gate.ac.uk/annie/index.jsp
Screencam  :  http://gate.ac.uk/demos/corpus.avi
Builds on  :  eXtensible Markup Language, Java, Hyper Text Markup Language, PostgreSQL, Oracle
Used by  :  ArtEquAkt, Amilcare, ANNIE - Open Source Information Extraction, NMARKUP
Addresses challenges  :  Knowledge Retrieval, Knowledge Acquisition

What's the Problem?

Gartner reported recently that, for at least the next decade, more than 95% of human-to-computer information input will involve textual language. It also reported that by 2012 taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications. The web revolution has been based largely on human language materials, and in the shift to the next-generation, knowledge-based web, human language will remain key. This development has posed new challenges and demonstrated the need for practical, robust, and reusable language technology.

Software Architecture for Language Engineering (LE), or SALE, is the computational infrastructure underlying language technologies and the scientific experiments that lead to them. SALE is a key enabler for computation with human language to penetrate many areas of our information-based society, such as knowledge technologies, the Semantic Web, e-science, digital libraries, and cultural heritage.

The need for SALE comes from the fact that each language processing experiment or application has to deal with low-level tasks such as data storage, data visualisation, locating and loading of resources, and execution of processes, in addition to the data structures and algorithms specific to the work at hand. Infrastructure removes the overhead of these low-level tasks, and also promotes predictability of performance by providing automatic measurement tools to run on test cases that exemplify as closely as possible the eventual execution environment of the software. Another important role of SALE is to reduce integration overheads by providing standard mechanisms for LE components to communicate data about language, and by using open standards such as Java and XML as the underlying platform. Last but not least, SALE infrastructures can provide integrated LE components and the tools necessary for their customisation and extension.
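
One way to picture "standard mechanisms for LE components to communicate data about language" is a shared document to which every component adds typed, offset-based annotations, so that a later component depends only on the annotations and not on the component that produced them. The Java sketch below is illustrative only and rests on that assumption: the names Span, TextDocument, LanguageComponent, Tokeniser, CapitalisedTagger and Pipeline are hypothetical and are not GATE's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: every type here is hypothetical, not part of GATE.

/** A typed span of character offsets over the document text. */
final class Span {
    final String type;
    final int start;
    final int end;

    Span(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/** Shared data structure: the raw text plus the spans added so far. */
final class TextDocument {
    final String text;
    private final List<Span> spans = new ArrayList<>();

    TextDocument(String text) { this.text = text; }

    void add(String type, int start, int end) { spans.add(new Span(type, start, end)); }

    /** Returns a fresh list, so callers may add spans while iterating over it. */
    List<Span> ofType(String type) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.type.equals(type)) out.add(s);
        }
        return out;
    }

    String covered(Span s) { return text.substring(s.start, s.end); }
}

/** The one interface every processing component implements. */
interface LanguageComponent {
    void process(TextDocument doc);
}

/** Marks each run of letters or digits as a "Token". */
final class Tokeniser implements LanguageComponent {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    @Override public void process(TextDocument doc) {
        Matcher m = WORD.matcher(doc.text);
        while (m.find()) doc.add("Token", m.start(), m.end());
    }
}

/** Depends only on "Token" spans, not on whichever component produced them. */
final class CapitalisedTagger implements LanguageComponent {
    @Override public void process(TextDocument doc) {
        for (Span t : doc.ofType("Token")) {
            if (Character.isUpperCase(doc.text.charAt(t.start))) {
                doc.add("Capitalised", t.start, t.end);
            }
        }
    }
}

/** Runs components in order over one shared document. */
final class Pipeline {
    static TextDocument run(String text, List<LanguageComponent> components) {
        TextDocument doc = new TextDocument(text);
        for (LanguageComponent c : components) c.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        TextDocument doc = run("GATE was built in Sheffield",
                Arrays.asList(new Tokeniser(), new CapitalisedTagger()));
        for (Span s : doc.ofType("Capitalised")) {
            System.out.println(s.type + " [" + s.start + "," + s.end + "): " + doc.covered(s));
        }
    }
}
```

Because the tagger reads only "Token" spans, the tokeniser could be swapped for a different one without touching the tagger. That is the kind of reduced integration overhead the paragraph above describes; storage, visualisation and resource loading are left out of the sketch entirely.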

Towards a Solution

GATE is open source Java software released under the GNU library licence. It is a stable, robust, and scalable infrastructure that lets users build and customise language processing components, while GATE itself handles mundane tasks such as data storage, format analysis and data visualisation. The system is bundled with components for language analysis, and is in use for Information Extraction (IE), Information Retrieval (IR), Natural Language Generation, summarisation, dialogue, Semantic Web, Knowledge Technologies and Digital Libraries applications. GATE-based systems have taken part in all the major quantitative evaluation programmes for Natural Language Processing since 1995. GATE has been downloaded by thousands of sites worldwide, with active users in universities and companies alike, e.g., UCL, UMIST, Karlsruhe, Vassar College, Perseus Digital Library, Tufts University, Pearson Education PLC, UK, Merck KGaA, Canon Europe, BBN Technologies, Knight Ridder.

A team of international reviewers who produced a report for the EPSRC, IEE and BCS (http://www.iee.org/Policy/CSreport/) specifically cited GATE for its wide usage; this was the only mention of Sheffield Computer Science in that report. DARPA used GATE in 1998 as the platform for its major end-of-project (TIPSTER) demonstration because of its superiority over the contracted US software.

GATE is currently used by a large number of research projects around the world for:

  • corpus annotation and processing: American National Corpus at Vassar College, US; PERSEUS Digital Library at Tufts University, US; EMILLE - corpus of South Asian languages, at University of Lancaster, University of Sheffield.
  • medical applications: Parallel IE for bio-medical text mining at Merck KGaA, Darmstadt; Medline Analysis at Institute for Medical Informatics and Biometry, University of Rostock, Germany.
  • e-science projects: Multiflora II - biodiversity, at University of Manchester, University of Sheffield; MiAKT - medical informatics, at University of Southampton, University of Sheffield, Open University, Oxford University, Guy's Hospital, King's College London; CLEF - clinical e-science framework, at University of Manchester, University of Cambridge, University of Sheffield, University College London, Royal Marsden NHS Trust, University of Brighton.
  • knowledge technologies: AKT at University of Aberdeen, University of Edinburgh, Open University, University of Sheffield, University of Southampton.
  • GRID applications: MyGRID at University of Manchester, University of Newcastle, University of Nottingham, University of Sheffield, University of Southampton, IT Innovation Centre, European Bioinformatics Institute.
  • a development environment and reusable LE tools for developing new language processing components: Document Summarisation at Imperial College, London; AMITIES - dialogue processing, at University of Sheffield, CNRS-LIMSI, GE Service Centre GMBH, VECSYS, VIEL and CIE, State University of New York, Duke University, GE Research and Development.

There are many other users of GATE not reported here; some more details may be found at http://gate.ac.uk/news.html.

The picture below shows GATE's development environment with its Information Extraction components loaded and their results demonstrated on the document shown.

Take a Guided Tour

Introduction to GATE (movie)

Try a Demonstration

Online demonstration of GATE's IE system

Download GATE

  • Semantic indexing of multimedia material

  • Automatic extraction of health and safety information from company reports (Health and Safety Executive/Sheffield University)
  • Extraction of commodity events from news (Master Foods NV)
  • Automatic annotation and ontology population for the Semantic Web (Ontotext, Sirma AI Ltd.)

Further Reading

Key document:

H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia, July 2002.

Other relevant documents:

Tutorial, Publications list

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)