


CROSI (Ontology) Mapping System (CMS)

CMS (CROSI Mapping System) is a structure-matching system that capitalizes on the rich semantics of the OWL constructs found in source ontologies and on a modular architecture that allows the system to consult external linguistic resources. It operationalises the modular architecture we developed in this project and employs a multi-strategy system comprising four modules.

  Ontology features used for mapping

In CMS, different features of the input data are generated and selected to trigger different kinds of feature matchers. Hence, the first step when deploying CMS was to extract characteristics that can be used to identify similar entities from different ontologies. We summarize the characteristics we extracted in Table 4.1.
Table 4.1: Features extracted for ontology mapping.
There are several points that need further explanation. First, in many cases, identifying corresponding instances is considered easier than identifying corresponding classes, because instances are expected to have more grounded variables. Corresponding instances provide a ground on which the number of candidate mapping classes can be narrowed down to a few (as we discovered in our past work with the IF-Map instance-based system [1]). Second, in the case of complement classes, let $c_s$ be a class from the source ontology and $c_t$ a class from the target ontology. If $\mathrm{sim}(c_s, c_t) = a$ and $d = \neg c_t$, we can safely conclude that $\mathrm{sim}(d, c_s) = 1 - a$, where sim/2 is the similarity function and the real number $a$ gives the confidence value.
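The complement rule can be written as a one-line function. This is only a minimal sketch: the function name and the range check are assumptions made for illustration, not part of CMS.

```python
def complement_similarity(a: float) -> float:
    """Given sim(c_s, c_t) = a, return the similarity between c_s and
    the complement of c_t, following the rule sim(d, c_s) = 1 - a."""
    # Assumption: similarity values are confidence values in [0, 1].
    if not 0.0 <= a <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return 1.0 - a
```

For instance, a source class that matches a target class with confidence 0.8 would match the complement of that target class with confidence of about 0.2.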


  CMS GUI

The resultant similarity values are then compiled by multiple similarity aggregators running in parallel or in sequence. The overall similarity is then evaluated to initiate iterations that backtrack to different stages. We include a screenshot of the Web-based interface of CMS in Figure 1.
Figure 1: The Web-based Interface of CMS.
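To make the aggregation step concrete, the following sketch shows two possible aggregators and a threshold test that could decide whether another iteration is needed. The aggregator functions, the weights, the matcher names and the threshold are all assumptions chosen for illustration; the text does not specify which aggregators CMS uses.

```python
from typing import Dict

Scores = Dict[str, float]  # matcher name -> similarity in [0, 1]


def weighted_average(scores: Scores, weights: Dict[str, float]) -> float:
    """Combine matcher scores by weight; matchers without a weight are ignored."""
    total = sum(weights.get(name, 0.0) for name in scores)
    if total == 0:
        return 0.0
    weighted = sum(score * weights.get(name, 0.0) for name, score in scores.items())
    return weighted / total


def maximum(scores: Scores) -> float:
    """Optimistic aggregator: keep the score of the most confident matcher."""
    return max(scores.values(), default=0.0)


def needs_another_pass(overall: float, threshold: float = 0.7) -> bool:
    """Assumed stopping rule: iterate again while the overall score is too low."""
    return overall < threshold


# Hypothetical scores produced by two matchers for one candidate pair.
scores = {"name": 0.9, "structure": 0.6}
overall = weighted_average(scores, {"name": 2.0, "structure": 1.0})
```

Running several such aggregators side by side corresponds to the parallel arrangement; feeding the output of one into the next corresponds to the sequential one.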


  CMS specific techniques

To fit the requirements of different applications, CMS implements a series of mapping techniques, which are regarded as independent components that make up CMS.

  Name matchers

Ranging from purely syntactic approaches to more semantically enriched ones, name matchers are categorised as: string (tokenised) distance, thesaurus, and WordNet hierarchical distance. Levenshtein distance is the simplest implementation of string distance. More sophisticated ones are the Monge-Elkan distance, which optimizes edit-distance functions with well-tuned editing costs, and the Jaro metric and its variants, which compute an accumulated similarity of two strings s and t from the order and number of characters they have in common.
In CMS a thesaurus comes into play in two forms: WordNet and a predefined corpus, implemented as WNNameMatcher and CorpusNameMatcher, respectively. To facilitate the use of WordNet, we assume that local names of classes are either nouns or noun phrases, while local names of properties are phrases starting with verbs followed by either nouns or adjectives. Elements in the retrieved synsets are then compared against each other using either exact string matching or one of the string-distance algorithms discussed above.
WordNet arranges its entries in hierarchical structures. Hence, the similarity between names can be computed as follows: let $w_i$ and $w_j$ be the WordNet entries corresponding to $name_i$ and $name_j$, $w$ be the least common hypernym of $w_i$ and $w_j$, $r$ be the root of the underlying WordNet hierarchy, and $h_i$, $h_j$, $h$ be the distances between $w_i$ and $r$, $w_j$ and $r$, and $w$ and $r$, respectively. The similarity between $w_i$ and $w_j$ is then approximated as $\frac{2h}{h_i + h_j}$.
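The two ends of this spectrum can be sketched in a few lines. The first part computes a Levenshtein distance and turns it into a similarity; the normalisation by the longer string's length is an assumption. The second part evaluates the hierarchical formula on a small hand-made hypernym tree that merely stands in for WordNet; the tree, the helper names, and the single-parent simplification are all hypothetical.

```python
from typing import Dict, List, Optional


def levenshtein(s: str, t: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete a character of s
                            curr[j - 1] + 1,      # insert a character of t
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def string_similarity(s: str, t: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


# Toy hypernym tree (child -> parent); a stand-in for a WordNet hierarchy.
TOY_HYPERNYMS: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal", "car": "vehicle",
}


def path_to_root(word: str) -> List[str]:
    """Return the chain [word, parent, ..., root]."""
    path = [word]
    while TOY_HYPERNYMS[path[-1]] is not None:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def hierarchy_similarity(wi: str, wj: str) -> float:
    """2h / (h_i + h_j), with depths counted as edges from the root."""
    path_i, path_j = path_to_root(wi), path_to_root(wj)
    common = next(w for w in path_i if w in path_j)  # least common hypernym
    h_i, h_j = len(path_i) - 1, len(path_j) - 1
    h = len(path_to_root(common)) - 1
    if h_i + h_j == 0:  # both entries are the root itself
        return 1.0
    return 2 * h / (h_i + h_j)
```

In this toy tree, "dog" and "cat" share the hypernym "animal" at depth 1 and each sit at depth 2, so their similarity is 2×1 / (2 + 2) = 0.5, whereas "dog" and "car" only meet at the root and score 0.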

  Semantic matchers

In CMS, a semantic flavour is added in two different ways: structure-aware and intension-aware matchers. Structure-awareness refers to the capability of traversing class hierarchies and accumulating similarities along the sub-class (sub-property) relationships. Let $c$ and $d$ be two classes from the source and target ontologies, and let $c_i$ and $d_i$ be their direct parents in the respective ontologies. The similarity between $c$ and $d$ is recursively defined as $\mathrm{sim}(c, d) = a \cdot \mathrm{sim}_{local}(c, d) + b \cdot \mathrm{sim}(c_i, d_i)$, where $a$ and $b$ are arbitrary weights and $\mathrm{sim}_{local}$/2 gives the local similarity of $c$ and $d$, which can be computed using one or a combination of the techniques discussed above.
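The recursive definition can be sketched as follows. The sketch assumes that every class has at most one direct parent and that the recursion stops, returning only the local similarity, once either class has no parent; neither assumption is stated in the text. The hierarchies, the weights and the exact-name local matcher are hypothetical.

```python
from typing import Callable, Dict, Optional

Hierarchy = Dict[str, Optional[str]]  # class -> direct parent (None at the root)


def structural_similarity(c: str, d: str,
                          source: Hierarchy, target: Hierarchy,
                          sim_local: Callable[[str, str], float],
                          a: float = 0.7, b: float = 0.3) -> float:
    """sim(c, d) = a * sim_local(c, d) + b * sim(parent(c), parent(d))."""
    local = sim_local(c, d)
    parent_c, parent_d = source[c], target[d]
    # Assumed base case: without a parent on either side, use the local score.
    if parent_c is None or parent_d is None:
        return local
    return a * local + b * structural_similarity(
        parent_c, parent_d, source, target, sim_local, a, b)


def exact_name(x: str, y: str) -> float:
    """Simplest possible local matcher: case-insensitive name equality."""
    return 1.0 if x.lower() == y.lower() else 0.0


# Two toy hierarchies that agree on the leaf and the root but not in between.
source = {"Thing": None, "Publication": "Thing", "Article": "Publication"}
target = {"Thing": None, "Document": "Thing", "Article": "Document"}

score = structural_similarity("Article", "Article", source, target, exact_name)
```

With these inputs the mismatching parents ("Publication" versus "Document") pull the score of the two "Article" classes below 1, which is the intended effect of accumulating similarity along the hierarchy.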
Intension-awareness takes into account the definitions of classes. A class $c$ is regarded as a tuple $\langle S, P \rangle$, where $S$ is a set of classes of which $c$ is a subclass and $P$ is a set of properties having $c$ as domain and other classes or concrete data types as range. Hence, finding the semantic similarity between $c = \langle S_c, P_c \rangle$ and $d = \langle S_d, P_d \rangle$ amounts to finding the similarity between $S_c$ and $S_d$ as well as between $P_c$ and $P_d$, i.e. $\mathrm{sim}(c, d) = a \cdot \mathrm{sim}(S_c, S_d) + b \cdot \mathrm{sim}_{property}(P_c, P_d)$, where $a$ and $b$ are arbitrary weights and $\mathrm{sim}_{property}$/2 computes the property similarity. More specifically, we differentiate the following situations:
The first situation contributes the most to the similarity of $c$ and $d$. We regard classes with matching names and exactly matching properties, i.e., properties with the same name, domain and range, as semantically equivalent classes.
In many cases, a match between $DP_c$ and $DP_d$ ($FP_c$ and $FP_d$, respectively) can only be concluded after traversing several levels up or down the class hierarchy. Although not as strong as an exact match of property domains and ranges, matching classes of $DP_c$ ($FP_c$) to remote ancestors or descendants of classes of $DP_d$ ($FP_d$) provides a hint on how close the different properties are, and thus how similar the two concepts $c$ and $d$ are. This idea is implemented in CMS as a ClassDefPlusMatcher method.
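The weighted combination of superclass and property similarity can be illustrated with a small sketch. It assumes that both set similarities are plain Jaccard overlaps and that a property is identified by its name and range; CMS may compute them differently, and the sketch does not attempt the hierarchy traversal performed by ClassDefPlusMatcher. All class and property names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ClassDef:
    """A class seen as <S, P>: its superclasses and the properties it is domain of."""
    superclasses: FrozenSet[str]
    properties: FrozenSet[Tuple[str, str]]  # (property name, range)


def jaccard(x: frozenset, y: frozenset) -> float:
    """Assumed set similarity: |x & y| / |x | y|, or 0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def intension_similarity(c: ClassDef, d: ClassDef,
                         a: float = 0.5, b: float = 0.5) -> float:
    """sim(c, d) = a * sim(S_c, S_d) + b * sim_property(P_c, P_d)."""
    return (a * jaccard(c.superclasses, d.superclasses)
            + b * jaccard(c.properties, d.properties))


paper = ClassDef(frozenset({"Publication"}),
                 frozenset({("hasAuthor", "Person"), ("hasTitle", "string")}))
article = ClassDef(frozenset({"Publication"}),
                   frozenset({("hasAuthor", "Person"), ("hasYear", "int")}))

score = intension_similarity(paper, article)
```

Here the two classes share their only superclass but just one of three distinct properties, so the score is 0.5×1 + 0.5×(1/3), roughly 0.67.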

  External matchers

The most distinctive feature of CMS is its capability of combining ontology and database schema matching systems. Existing matching systems are wrapped to provide a uniform interface to the other modules of CMS. In the current implementation, the FOAM alignment framework (FOAM hereinafter) and the INRIA alignment API (INRIA hereinafter) are invoked as external sources from which matching candidates are drawn. The reason for using FOAM and INRIA is twofold: 1) both systems are programmed in Java, making the integration with CMS straightforward; 2) as illustrated in Table 4.2, although based on similar algorithms, FOAM and INRIA produce results that are disparate enough to make aggregation meaningful. The integration of other ontology and database schema matching systems is forthcoming.

Table 4.2: Variant results of different mapping systems.
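The wrapping idea can be sketched as a common interface that every external matcher is adapted to, plus a step that pools the candidates they return. Everything below is hypothetical: the interface, the stub class, the file names and the scores are invented for illustration, and no real FOAM or INRIA API is called.

```python
from typing import Dict, Iterable, Protocol, Tuple

Pair = Tuple[str, str]       # (source entity, target entity)
Mapping = Dict[Pair, float]  # candidate pair -> similarity score


class ExternalMatcher(Protocol):
    """Assumed common interface that each wrapped system exposes."""
    name: str

    def match(self, source: str, target: str) -> Mapping: ...


class StubMatcher:
    """Stand-in wrapper returning canned results instead of calling a real tool."""

    def __init__(self, name: str, canned: Mapping) -> None:
        self.name = name
        self._canned = canned

    def match(self, source: str, target: str) -> Mapping:
        return dict(self._canned)


def collect_candidates(matchers: Iterable[ExternalMatcher],
                       source: str, target: str) -> Dict[Pair, Dict[str, float]]:
    """Pool candidates, remembering which matcher proposed each score."""
    pooled: Dict[Pair, Dict[str, float]] = {}
    for matcher in matchers:
        for pair, score in matcher.match(source, target).items():
            pooled.setdefault(pair, {})[matcher.name] = score
    return pooled


# Invented results from two wrapped systems that only partly agree.
wrapper_a = StubMatcher("wrapper_a", {("Paper", "Article"): 0.8})
wrapper_b = StubMatcher("wrapper_b", {("Paper", "Article"): 0.6,
                                      ("Author", "Writer"): 0.7})

pooled = collect_candidates([wrapper_a, wrapper_b], "source.owl", "target.owl")
```

Keeping the per-matcher scores separate, rather than merging them immediately, leaves the choice of aggregation to a later stage, which matters precisely when the wrapped systems disagree.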




Bibliography

[1] Y. Kalfoglou, H. Alani, M. Schorlemmer, and C. Walton. On the Emergent Semantic Web and overlooked issues. In Proceedings of the 3rd International Semantic Web Conference (ISWC'04), LNCS 3298, Hiroshima, Japan, pages 576-591, Nov. 2004.



This material was prepared under the CROSI project. Copyright remains with the authors. Parts or the whole of this text have been published in conferences, workshops and other knowledge disseminating events.
CROSI presents this information online merely for sake of information dissemination.
This material should not be copy-pasted without acknowledging its origins.
Please contact the authors for information on how to use or reference this material.