GATE - General Architecture for Text Engineering from The University of Sheffield
GATE - General Architecture for Text Engineering fact-file
What's the Problem?
Gartner reported recently that for at least the next decade more than 95% of human-to-computer information input will involve textual language. They also report that by 2012 taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications. The web revolution has been based largely on human language materials, and in the shift to the next-generation, knowledge-based web, human language will remain key. This development has posed new challenges and demonstrated the need for practical, robust, and reusable language technology.
Software Architecture for Language Engineering (SALE) is the computational infrastructure underlying language technologies and the scientific experiments that lead to them; Language Engineering is abbreviated LE below. SALE is a key enabler for computation with human language to penetrate many areas of our information-based society, such as knowledge technologies, the Semantic Web, e-science, digital libraries, and cultural heritage.
The need for SALE comes from the fact that each language processing experiment or application has to deal with low-level tasks such as data storage, data visualisation, locating and loading of resources, and execution of processes, in addition to the data structures and algorithms specific to the work at hand. Infrastructure removes the overhead of these low-level tasks, and also promotes predictability of performance by providing automatic measurement tools to run on test cases that exemplify as closely as possible the eventual execution environment of the software. Another important role of SALE is to reduce integration overheads by providing standard mechanisms for LE components to communicate data about language, and by using open standards such as Java and XML as the underlying platform. Last but not least, SALE infrastructures can provide integrated LE components and the tools necessary for their customisation and extension.
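The idea of components communicating data about language through a standard mechanism can be illustrated with a minimal sketch. It is written in Python for brevity and does not use GATE's own Java API; every name in it (`Annotation`, `Document`, `tokenise`, `tag_capitalised`, `run_pipeline`) is hypothetical. It assumes a stand-off annotation model, in which components never modify the text and only add typed spans that later components can read.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: a typed character span plus free-form features."""
    start: int
    end: int
    kind: str
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """The text plus every annotation that components have added to it."""
    text: str
    annotations: list = field(default_factory=list)

    def of_kind(self, kind):
        # Return a new list, so callers may add annotations while iterating.
        return [a for a in self.annotations if a.kind == kind]

    def span(self, ann):
        return self.text[ann.start:ann.end]


def tokenise(doc):
    """Hypothetical component 1: mark whitespace-separated tokens."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotations.append(Annotation(start, end, "Token"))
        pos = end


def tag_capitalised(doc):
    """Hypothetical component 2: reads Token annotations, adds Capitalised ones."""
    for tok in doc.of_kind("Token"):
        if doc.span(tok)[0].isupper():
            doc.annotations.append(
                Annotation(tok.start, tok.end, "Capitalised", {"source": "Token"})
            )


def run_pipeline(text, components):
    """The 'infrastructure': create the shared document and run each component."""
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline("GATE was built in Sheffield", [tokenise, tag_capitalised])
capitalised = [doc.span(a) for a in doc.of_kind("Capitalised")]
```

In this sketch the second component depends only on the `Token` annotations, not on how the tokeniser produced them, so either component could be replaced independently. That is the sense in which a shared data format reduces integration overhead.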
Towards a Solution
GATE is open source Java software under the GNU library licence. It is a stable, robust, and scalable infrastructure that allows users to build and customise language processing components, while GATE itself handles mundane tasks such as data storage, format analysis and data visualisation. The system is bundled with components for language analysis, and is in use for Information Extraction (IE), Information Retrieval (IR), Natural Language Generation, summarisation, dialogue, Semantic Web, Knowledge Technologies and Digital Libraries applications. GATE-based systems have taken part in all the major quantitative evaluation programmes for Natural Language Processing since 1995. It has been downloaded by thousands of sites worldwide, with active users from universities and companies alike, e.g., UCL, UMIST, Karlsruhe, Vassar College, Perseus Digital Library, Tufts University, Pearson Education PLC, UK, Merck KGaA, Canon Europe, BBN Technologies, Knight Ridder.
GATE was specifically cited for its wide usage by a team of international reviewers who produced a document for the EPSRC, IEE and BCS (http://www.iee.org/Policy/CSreport/); it was the only mention of Sheffield Computer Science in that document. In 1998, DARPA used GATE as the platform for its major end-of-project (TIPSTER) demonstration because of its superiority over the contracted US software.
GATE is currently used by a large number of research projects around the world for:
There are many other users of GATE not reported here; some more details may be found at http://gate.ac.uk/news.html.
The picture below shows GATE's development environment with its Information Extraction components loaded and their results demonstrated on the document shown.
Take a Guided Tour
Introduction to GATE (movie)
Try a Demonstration
H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia, July 2002.
Other relevant documents: