AKTors.org
ClassAKT from The University of Southampton


ClassAKT fact-file

Owner  :  The University of Southampton
Demonstration  :  http://robin.ecs.soton.ac.uk:8000/classifier/
Builds on  :  Java, Weka
Addresses challenges  :  Knowledge Acquisition

What's the Problem?

A feature of ontologies is that they may have some sort of classification scheme built into them. For example, an ontology which describes Computer Science university departments may, as part of its structure, classify papers according to key topics (for example, Logic, Databases, Formal Methods, etc.). Much of this classification may be created manually. However, this process is time-consuming and expensive, so there is a need for automatic methods of classifying documents against an ontology.

Machine Learning techniques have long been a useful tool in the field of Text Classification. The objective of ClassAKT is to adapt this technology to provide generic classification tools and services for the Semantic Web.

Towards a Solution

ClassAKT is a set of utilities and services to solve two main tasks:

  • Classifier Creation. Using Dome and Perl, documents and their classifications are obtained from the Web. Raw text is extracted and a classifier is created. There are many methods for creating a classifier; ClassAKT currently uses Naive Bayes (which requires feature extraction as a preliminary step).
  • Classifier Service. A web service has been created which accepts a URL to a (PDF) document and returns its classification in RDF format.
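
The classifier creation step can be illustrated with a minimal sketch of multinomial Naive Bayes over bag-of-words features. This is not ClassAKT's code: it is written in plain Java rather than with Weka so that it stays self-contained, and the class name NaiveBayesSketch, the tokenizer, the category labels and the tiny training texts are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial Naive Bayes; a sketch, not the ClassAKT implementation. */
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTerms = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Feature extraction (assumed): lower-case the text and split on non-letters. */
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    /** Adds one labelled document to the per-label term counts. */
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : tokenize(text)) {
            if (term.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    /** Returns the label with the highest log-posterior, or null if nothing was trained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // log prior: fraction of training documents carrying this label
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String term : tokenize(text)) {
                if (term.isEmpty()) continue;
                int c = counts.getOrDefault(term, 0);
                // add-one (Laplace) smoothing so unseen terms do not zero the score
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Toy training data, invented for this example
        nb.train("H. Information Systems", "relational database query index transaction");
        nb.train("B. Hardware", "circuit memory processor logic gate");
        System.out.println(nb.classify("query optimisation in a relational database"));
    }
}
```

In a real pipeline the training texts would be the raw text extracted from the harvested documents, and a library such as Weka would normally supply both the feature extraction and the classifier.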



ClassAKT Architecture

The current implementation makes use of an ontology derived from the ACM Computing Classification System (ACM) of publications in Computing. The classifier is constructed in such a way as to classify documents according to the hierarchy of the ACM (for example, if a document is initially classified as B. Hardware, the classifier will then attempt to extend the classification by looking at the sub-classifications of Hardware: B.0, B.1, B.2 and so on).
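
The top-down behaviour described above can be sketched as a walk down a category tree, where each node asks its own classifier which child (if any) the document belongs to. This is a hedged illustration only: the names HierarchySketch, Node and chooser are hypothetical, and the keyword tests in main are stand-ins for trained classifiers such as Naive Bayes models.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative top-down hierarchical classification; not the ClassAKT implementation. */
class HierarchySketch {
    /** One category node; 'chooser' picks a child code for a document, or null to stop. */
    static class Node {
        final String code;
        final Map<String, Node> children = new LinkedHashMap<>();
        Function<String, String> chooser = text -> null; // default: no refinement
        Node(String code) { this.code = code; }
        Node add(Node child) { children.put(child.code, child); return this; }
    }

    /** Descends from the root, recording each category chosen along the way. */
    static List<String> classify(Node root, String text) {
        List<String> path = new ArrayList<>();
        Node current = root;
        while (true) {
            String next = current.chooser.apply(text);
            Node child = (next == null) ? null : current.children.get(next);
            if (child == null) break; // no further sub-classification applies
            path.add(child.code);
            current = child;
        }
        return path;
    }

    public static void main(String[] args) {
        Node hardware = new Node("B")
                .add(new Node("B.0")).add(new Node("B.1")).add(new Node("B.2"));
        Node root = new Node("ACM").add(hardware);
        // Keyword rules below are placeholders for trained per-node classifiers
        root.chooser = text -> text.contains("circuit") ? "B" : null;
        hardware.chooser = text -> text.contains("control") ? "B.1" : "B.0";
        System.out.println(classify(root, "a control circuit design")); // prints [B, B.1]
    }
}
```

Keeping one classifier per node means each decision only has to separate sibling categories, at the cost that an error at the top level cannot be corrected further down.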

Try a Demonstration

A ClassAKT prototype is currently available within the Southampton firewall and may be accessed here.

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)