AKTors.org
eServices from The University of Southampton


eServices fact-file

Owner  :  The University of Southampton
Description  :  http://opcit.eprints.org/eservices/
Addresses challenges  :  Knowledge Acquisition, Knowledge Publishing

What's the Problem?

The citations between papers form fundamental links that enable scholars to locate other papers that are semantically related, perhaps providing background information, contrary theories, or supporting evidence. Traditionally, however, following these links has been difficult and tedious, because the related papers themselves are hard to locate.

With the large number of research publications now available over the Web, this problem can be resolved, permitting scholars to browse effortlessly through an enormous collection of interlinked knowledge. This presents researchers with a new paradigm for exploring their research field.

The OpCit project explores the technological issues of achieving this goal and implements a system to demonstrate the potential of the approach.

Towards a Solution

The e-Services framework was initially constructed for the OpCit project with the purpose of providing advanced services over literature data. The OpCit project explored how extensive citation linking could be added to a large collection of research papers in the physics discipline (as well as some citation-based services with citebase). This provides scholars with a unique method of efficiently browsing related literature. The e-Services framework extends this functionality by enabling researchers to further understand the relationships between literature, by, for example, requesting the significant papers or a visualisation of their relationships.

Like citebase, the e-Services framework uses the OAI interface to contact various e-print services and download the literature metadata. It then provides advanced services, such as simple visualisations (e.g. number of e-print deposits each year), co-citation visualisations (which use the co-citedness of papers as a proximity measure when plotting papers on a graph), and knowledge services (e.g. most significant papers).
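To make the idea of co-citedness concrete, the following is a minimal Python sketch rather than the framework's own code (which is written in Perl). It assumes the citation data is available as a mapping from each citing paper to the papers it cites; the function name `cocitation_counts` and the toy identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count how often each pair of papers is cited together.

    `references` maps a citing paper's id to the ids of the papers it cites.
    Two papers are co-cited once for every paper that cites both of them.
    """
    counts = Counter()
    for cited in references.values():
        # Every unordered pair in one reference list is co-cited once.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: three citing papers and their reference lists.
references = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
}
counts = cocitation_counts(references)
print(counts[("a", "b")])  # 2: both p1 and p2 cite a and b
```

A higher count for a pair can then be treated as a smaller distance between the two papers when they are laid out on a graph.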

The e-Services framework was populated with the entire collection of papers in the arXiv archive (currently over 200,000). This provided a large base on which to test the service and, importantly, to create large and detailed co-citation maps. However, this size also caused several problems. Firstly, the large dataset slowed computation significantly. Secondly, due to erroneous or missing citations, some of the co-citation visualisations failed to portray a convincing or useful pattern.

The AKT project plans to use this technology to provide visualisation services over research papers in the UK Computer Science field, in particular, for the planned CS AKTive Portal.

The technology

The e-Services software runs as a collection of Perl scripts and a MySQL database. It is accessed through a Web interface (via CGI). The knowledge services are presented as text (and links), the simple visualisations as GIF images and SVG documents, and the co-citation visualisation as interactive SVG graphs. Figure 1 illustrates the general architecture of the e-Services framework.

Figure 1: e-Services architecture

Try a Demonstration

An on-line demonstration of the e-Services framework is not available. However, follow the link to enter the OpCit citebase system to explore citation linking. Use the provided search engine to locate papers, and then seamlessly follow the provided reference links.

Take a Guided Tour

An AVI file (2.5MB) demonstrating the e-Services framework is available. Alternatively, various screenshots are provided below.

Screenshots

The e-Services administrator decides which e-print archives the e-Services framework contacts and collects metadata from. When the metadata has been collected, the archive is listed and users can select it to explore it further. In Figure 2, the user selects one of the known archives to explore.
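As an illustration of the metadata collection step, here is a hedged Python sketch of building a harvesting request and reading records from a response. It assumes the archive speaks OAI-PMH 2.0 with Dublin Core (`oai_dc`) metadata; the protocol version actually used by the framework is not stated here, and the base URL, sample record, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces assumed from OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for one archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, title and date from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        records.append({
            "id": record.findtext(OAI_NS + "header/" + OAI_NS + "identifier"),
            "title": record.findtext(".//" + DC_NS + "title"),
            "date": record.findtext(".//" + DC_NS + "date"),
        })
    return records


# Hypothetical response containing a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:date>2001-05-14</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
records = parse_records(SAMPLE)
print(url)
print(records)
```

The sketch parses a response held in a string so that it runs without network access; a real harvester would fetch the URL and also handle paging of large result sets.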

Figure 2: Select an archive to explore

Once an archive has been selected, the user can select general (overview) services about it (Figure 3). These services provide answers and graphs on the entire archive, rather than just a single instance.

Figure 3: Overview Facilities available

Figure 4 illustrates a graph of the publications (deposited) each year for the archive.

Figure 4: Publications per year

Figure 5 illustrates a graph of the highest publishing researchers in the archive.

Figure 5: Highest publishing researchers

The significant papers (based on citation impact) have been computed for the current archive and presented as a list of linked papers (Figure 6). By selecting a paper, further information on the paper can be retrieved (e.g. author, abstract, co-citations).
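One simple reading of "citation impact" is the number of papers that cite a given paper. The Python sketch below ranks papers on that basis; it is an assumption for illustration, not necessarily the measure the framework computes, and the function name `most_cited` and the toy data are hypothetical.

```python
from collections import Counter


def most_cited(references, top_n=3):
    """Rank papers by the number of distinct papers that cite them.

    `references` maps a citing paper's id to the ids of the papers it cites.
    """
    counts = Counter()
    for cited in references.values():
        # A citing paper counts once per cited paper, even if listed twice.
        counts.update(set(cited))
    # Highest count first; ties are broken by id so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy data.
references = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b"],
}
ranking = most_cited(references)
print(ranking)  # [('b', 3), ('a', 2), ('c', 1)]
```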

Figure 6: Significant papers

The green-coloured menu on the left of the screen provides links to all instances of a particular type. In Figure 7, all instances of literature in the archive are presented.

Figure 7: List of literature

When a particular literature instance is selected, further information on that paper is presented (Figure 8).

Figure 8: Services available for a paper

When a particular researcher instance is selected, further information on that researcher is presented (Figure 9).

Figure 9: Services available for a researcher

When the collaborators for the researcher listed in Figure 9 are requested, five peers are suggested (Figure 10).

Figure 10: Collaborators

Figure 11 illustrates a small view of a co-citation map for the current archive. Each node represents a paper. An arc between two nodes indicates that these two papers are highly co-cited. A node can be clicked on to enable the user to find out more about a particular paper (e.g. Figure 8).

Various options are available for presenting these graphs. The refinement level determines the number of iterations of the co-citation algorithm. A higher level results in a more "tree-like" (and therefore more effective) graph. The threshold level determines which co-cited papers are included. A level of 10 means that only those papers that have been co-cited at least 10 times are included in the computation. A lower level results in more nodes on the graph, at the expense of greater computational overhead. The graph can be viewed either inside the e-Services interface or full-screen in its own browser window. Finally, papers that have recently been highly cited can be marked in red, to help researchers spot active research areas.
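The effect of the threshold level can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework's implementation: co-citation counts are assumed to be available as a dictionary keyed by paper pairs, the names `edges_above_threshold` and `nodes_of` are hypothetical, and the refinement level is not modelled because the algorithm behind it is not described here.

```python
def edges_above_threshold(cocitations, threshold):
    """Keep only pairs co-cited at least `threshold` times."""
    return {pair: n for pair, n in cocitations.items() if n >= threshold}


def nodes_of(edges):
    """Collect the papers that appear in at least one retained pair."""
    return {paper for pair in edges for paper in pair}


# Hypothetical co-citation counts for four pairs of papers.
cocitations = {
    ("a", "b"): 12,
    ("a", "c"): 10,
    ("b", "d"): 4,
    ("c", "e"): 1,
}

strict = edges_above_threshold(cocitations, 10)
loose = edges_above_threshold(cocitations, 4)
print(sorted(nodes_of(strict)))  # ['a', 'b', 'c']
print(sorted(nodes_of(loose)))   # ['a', 'b', 'c', 'd']
```

Lowering the threshold from 10 to 4 admits one more pair and one more node, which is the trade-off described above: a fuller map at the cost of more data to process.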

Figure 11: Co-Citation (small view)

A full-screen co-citation map is presented in Figure 12.

Figure 12: Co-citation (large view)

A full-screen co-citation map of the entire arXiv collection is illustrated in Figure 13. Unfortunately, due to the imperfect citation data, as well as the considerable computing power required to produce highly refined graphs, the map is less than perfect.

Figure 13: Co-citation (large view) - all of arXiv

Figure 14 presents a more refined, lower-threshold (full-screen) co-citation map of the entire arXiv collection.

Figure 14: Co-citation (large view) - all of arXiv

When a co-citation map is produced by selecting the co-citation icon from a literature instance page (e.g. Figure 8), then the resulting map provides a beacon to display where in the map the literature instance is located (Figure 15).

Figure 15: Co-citation (context beacon)

As part of the simple visualisation services, a citation network can be produced (Figure 16). This interactive SVG map presents cited and citing articles. The user can specify the depth to which these articles are presented.
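The role of the depth setting can be illustrated with a breadth-first traversal that stops after a given number of steps. The Python sketch below assumes that both cited and citing articles are followed at every step, which may differ from what the framework does; the function name `citation_neighbourhood` and the toy data are hypothetical.

```python
from collections import deque


def citation_neighbourhood(cites, start, depth):
    """Return papers within `depth` citation steps of `start`.

    `cites` maps a paper's id to the ids it cites. Links are followed in
    both directions (cited and citing). The result maps each paper found
    to its distance from `start`.
    """
    # Reverse index: for each paper, the papers that cite it.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if distance[paper] == depth:
            continue  # do not expand beyond the requested depth
        neighbours = set(cites.get(paper, ())) | cited_by.get(paper, set())
        for other in sorted(neighbours):
            if other not in distance:
                distance[other] = distance[paper] + 1
                queue.append(other)
    return distance


# Hypothetical toy data: p cites a and b, q cites p, r cites q, a cites z.
cites = {
    "p": ["a", "b"],
    "q": ["p"],
    "r": ["q"],
    "a": ["z"],
}
print(citation_neighbourhood(cites, "p", 1))
print(citation_neighbourhood(cites, "p", 2))
```

With a depth of 1 only the direct neighbours of `p` are returned; a depth of 2 also brings in `z` and `r`.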

Figure 16: Citation Network

A further citation network example is presented in Figure 17.

Figure 17: Citation Network

For further information please contact Simon Kampa or the OpCit or AKT projects.

Further Reading

Key document:

Other relevant documents:

Semantic representation

View in the AKT Triplestore Browser or as RDF.

Also available in DOAP RDF (Description Of A Project)