COCKATOO from University of Aberdeen
What's the Problem?
One school of thought in Knowledge Acquisition says that you should not ask domain experts to provide rules or knowledge directly, but simply ask for tasks which they have worked. In a recent project, we carried out many task-orientated interviews with experts from the oil industry and had them describe the rock structures which they had investigated. (Rock structures are composed of layers of rock; each layer is specified by its rock type, hardness and thickness.) It soon became clear that there was a definite structure to the information they were providing, and hence that it should be possible to develop a KA tool to capture this information from the domain expert directly. This regularity seems to be characteristic of a whole range of domains, particularly if one focuses on collecting task-worked information from the domain experts.
Towards a Solution
COCKATOO captures the entities to be described in the data-set as elements of an extended BNF (EBNF) grammar, in which the terminals can be expressed as a set of optional literals.
Constraints are used to ensure that variables fall within a pre-defined range, and relationships between variables can also be expressed as constraints. For example, constraints can be used to ensure that adjacent rock strata have different names.
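The two kinds of constraint mentioned above can be illustrated with a minimal Python sketch. This is not COCKATOO's constraint language; the `Layer` class, its field names, the hardness values and the thickness bounds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """One rock layer. Field names are illustrative, not COCKATOO's."""
    rock_type: str
    hardness: str
    thickness: float


def check_layers(layers: List[Layer], min_thickness: float,
                 max_thickness: float) -> List[str]:
    """Return human-readable constraint violations (empty list if none)."""
    problems = []
    for i, layer in enumerate(layers):
        # Range constraint: the variable must fall within a pre-defined range.
        if not (min_thickness <= layer.thickness <= max_thickness):
            problems.append(
                f"layer {i}: thickness {layer.thickness} outside "
                f"[{min_thickness}, {max_thickness}]"
            )
        # Relational constraint: adjacent strata must have different names.
        if i > 0 and layers[i - 1].rock_type == layer.rock_type:
            problems.append(
                f"layer {i}: same rock type as layer {i - 1} "
                f"({layer.rock_type})"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical data: the second layer breaks both constraints.
    layers = [Layer("shale", "soft", 10.0), Layer("shale", "hard", 500.0)]
    for problem in check_layers(layers, 0.5, 100.0):
        print(problem)
```

Checking constraints as the data is acquired, rather than afterwards, would let a tool reject an inconsistent layer at the moment the expert enters it.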
The COCKATOO approach makes a clear separation between the inference engine which acquires the data-sets and the KB which describes the knowledge to be acquired.
Grammar-driven Knowledge Elicitation
When eliciting knowledge, it is desirable to have a structured, declarative specification of the body of knowledge that needs to be acquired. This can be used as both the target of the knowledge acquisition process and the criterion by which the acquired knowledge is assessed. Formal grammars provide a means for specifying knowledge to be acquired, are structured and declarative, and are also widely understood by knowledge engineers and computer scientists. However, there is an important difference in the way that formal grammars are "traditionally" used, and the way that they have been applied here. Traditionally, grammars are used to solve the parsing problem; that is, to determine whether some given text conforms to some given formal grammar. For example, a C compiler must determine whether a given program consists entirely of legal C syntax. In grammar-driven knowledge elicitation, however, one attempts to acquire structured text such that it conforms to the given grammar.
We chose to represent EBNF grammars using a "LISPified" equivalent to the meta-notation of EBNF. This meta-language needs to:
We illustrate our ideas with a simplified example from the domain of petroleum geology, and, in particular, the acquisition of a case base of oil well drilling experiences. The knowledge captured in this way is used to support subsequent drill bit run modelling and optimisation; for example, to help choose the right drill bit for a given formation sequence (Preece et al., 2001). The EBNF grammar in figure 1 both describes and specifies a rock formation and its constituent lithologies (basic rock-types). The same grammar can be expressed in COCKATOO's syntax as shown in figure 2. (The correctness of the domain knowledge in our example has not been verified by a domain expert.)
Note that the non-terminal symbols lithology-depth and lithology-length have numeric values, and are more difficult to specify concisely with a grammar. We return to this issue in the following section. Note also that although our simple example illustrates only repetitions of 'one or more' (in this case, lithologies), COCKATOO also provides for repetitions of 'zero or more' with the keyword 'repeat*'.
COCKATOO grammars are interpreted top-down, left to right. Usually, a special parameter to the defgrammar macro (not described here for lack of space) informs COCKATOO which is the ‘top-most’ grammar clause. So, for example, in the grammar of figure 2, we would tell COCKATOO to start with the formation clause. The interpretation of this clause leads to the acquisition of a repetition of lithologies, each in turn consisting of a sequence of a rock, a lithology-depth, and an optional lithology-length. A rock, in turn, consists of a sequence of a rock-type and a rock-hardness. The acquisition of either of these two non-terminal symbols involves the capture of a decision from the user among a number of distinct options (e.g., shale, clay, chalk, granite or other). These options are presented on-screen to the user by COCKATOO, so that a choice can be made and recorded. COCKATOO is sensitive to the number of possible values available. If there are too many values to be listed (i.e., more than a configurable upper limit), then the upper and lower bounds of the symbol (internally, a constraint variable) are provided to the user as additional support. If these values are not available at acquisition time, then the user is dependent upon the guidance provided by the knowledge engineer in the form of comments and questions (White & Sleeman, 2001).
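The top-down, left-to-right interpretation described above can be sketched in Python. This is an illustration only: the dictionary encoding, the `acquire` function and the y/n prompts are assumptions, not COCKATOO's syntax or implementation, and the rock-hardness options are invented because the text does not list them. The clause structure follows the prose description of the formation grammar.

```python
from typing import Callable, Iterable

# Hypothetical encoding of the formation grammar (not COCKATOO's syntax).
GRAMMAR = {
    "formation": ("repeat+", "lithology"),
    "lithology": ("seq", "rock", "lithology-depth",
                  ("optional", "lithology-length")),
    "rock": ("seq", "rock-type", "rock-hardness"),
    "rock-type": ("one-of", "shale", "clay", "chalk", "granite", "other"),
    "rock-hardness": ("one-of", "soft", "medium", "hard"),  # assumed values
    "lithology-depth": ("number",),
    "lithology-length": ("number",),
}


def acquire(node, ask: Callable[[str], str], grammar=GRAMMAR, label=""):
    """Walk the grammar top-down, left to right, asking for each terminal."""
    if isinstance(node, str):
        # Non-terminal: expand its clause and record the result by name.
        return {node: acquire(grammar[node], ask, grammar, node)}
    kind, *args = node
    if kind == "seq":
        return [acquire(arg, ask, grammar, label) for arg in args]
    if kind == "one-of":
        choice = ask(f"{label}: choose one of {', '.join(args)}")
        if choice not in args:
            raise ValueError(f"{choice!r} is not a valid {label}")
        return choice
    if kind == "number":
        return float(ask(f"{label}: enter a number"))
    if kind == "optional":
        if ask(f"{label}: supply the optional item? (y/n)") == "y":
            return acquire(args[0], ask, grammar, label)
        return None
    if kind in ("repeat+", "repeat*"):
        items = []
        if kind == "repeat+":
            # 'One or more': the first item is mandatory.
            items.append(acquire(args[0], ask, grammar, label))
        while ask(f"{label}: add another? (y/n)") == "y":
            items.append(acquire(args[0], ask, grammar, label))
        return items
    raise ValueError(f"unknown clause kind: {kind}")


def scripted(answers: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for an interactive user: replays canned answers in order."""
    remaining = iter(answers)
    return lambda prompt: next(remaining)


if __name__ == "__main__":
    answers = ["shale", "hard", "1200", "n", "y",
               "chalk", "soft", "1350", "y", "40", "n"]
    print(acquire("formation", scripted(answers)))
```

Passing `input` as the `ask` argument would make the same walk interactive. The point of the sketch is that the grammar drives the questions, so the acquired data conforms to the grammar by construction.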
It is unrealistic to expect users to base their interaction with a knowledge acquisition tool on their understanding of an EBNF grammar. To help the user understand what information is required, and how it can be supplied, each clause of a grammar can be "decorated" with a question and/or a comment. A question should be a request for feedback which is directed at the user, such as "What is the rock-type?". A comment provides additional information, such as the meaning of particular terms, the exact format of the input, or other explanatory or "small-print" material. An example comment for the lithology clause might be "A lithology consists of a rock-type, a depth, an optional length, and a hardness".
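As a rough illustration of how a decorated clause might be turned into an on-screen prompt, consider the Python sketch below. The `Decoration` class, the `render_prompt` function, the `max_listed` limit and the fallback wording are hypothetical; only the question "What is the rock-type?" and the rock-type options come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Decoration:
    """Optional question and comment attached to a grammar clause."""
    question: Optional[str] = None
    comment: Optional[str] = None


def render_prompt(name: str, decoration: Decoration,
                  options: Sequence[str] = (), max_listed: int = 10,
                  bounds: Optional[Tuple[float, float]] = None) -> str:
    """Build the text shown to the user for one grammar clause."""
    # Fall back to a generic request when no question was supplied.
    lines = [decoration.question or f"Please supply a value for {name}."]
    if decoration.comment:
        lines.append(f"Note: {decoration.comment}")
    if options and len(options) <= max_listed:
        # Few enough values: list them all so the user can pick one.
        lines.append("Options: " + ", ".join(options))
    elif bounds is not None:
        # Too many values to list: show lower and upper bounds instead.
        lines.append(f"Range: {bounds[0]} to {bounds[1]}")
    return "\n".join(lines)


if __name__ == "__main__":
    rock_type = Decoration(
        question="What is the rock-type?",
        comment="Pick 'other' if none applies.",  # made-up comment
    )
    print(render_prompt("rock-type", rock_type,
                        ["shale", "clay", "chalk", "granite", "other"]))
```

Keeping questions and comments as data attached to clauses, rather than hard-coding them in the acquisition loop, matches the separation between the grammar and the engine that interprets it.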
Methodology for Developing KBs with COCKATOO
COCKATOO already provides mechanisms for specifying the required knowledge at a "high level". That is, during (grammar) development, the knowledge engineer can concentrate on the nature of the knowledge to be acquired, rather than the program that acquires it. Grammar development using COCKATOO is a (cyclic) refinement process, which includes the following chronological stages.
Try a Demonstration