The QALL-ME Framework is an architecture skeleton for multilingual question answering (QA) systems that answer questions with the help of structured answer data sources from freely specifiable domains. The language barrier is crossed with the help of a domain ontology. Provisions are made to easily anchor questions in space and time. Finding the mapping between question and answer – which is the major step in all QA systems – is done using a novel approach which is based on textual entailment recognizers. The framework is based on a Service Oriented Architecture (SOA) which is realized using web services.
The QALL-ME Framework is free software and comes with a set of demo components which illustrate the potential of the approach and which help new developers to get started with the framework. The framework seeks to be compliant with standards as far as possible in order to enhance interoperability and ease of use.
If you feel that certain points of the implementations in the framework do not suit your needs, then you can always go ahead and build a custom implementation which is still based on the QALL-ME Framework idea. We believe that the general ideas behind the framework are a crucial aspect of QALL-ME which can be realized in lots of different ways – the QALL-ME Framework is just one way! In particular it is a way of providing a basic realization in the form of an architectural skeleton as described above. The aim of this particular way is to get you started with your own QA system relatively fast.
The following sections provide a closer look at the ideas and main features of the QALL-ME Framework as listed above. If you are more interested in the technical details of how the framework works, then you might also directly head over to section 2.3: “System Architecture”; please note, however, that this section assumes a basic knowledge of the ideas behind the QALL-ME Framework which is only explicitly provided here.
The QALL-ME Framework provides a skeleton for multilingual QA systems. Such QA systems support questions and answers in several different natural languages such as German, Italian or English. The key point is that answers can be in a language other than that of the question, i.e., answers can be retrieved crosslingually. Imagine, for example, a German tourist in Italy asking a question in German about accommodations with certain facilities in the region. A QA system based on the QALL-ME Framework could return suitable answers retrieved from Italian data even though the language of the inquiry differs from that of the available data.
The question (source) and answer (target) languages of a QA system based on the QALL-ME Framework don’t need to be the same. It is entirely possible that the system supports more source languages than target languages or the other way round. The sets of supported source and target languages need not overlap at all if a scenario requires this.
The set of supported target languages usually corresponds to the main languages of the locations for which answer data is available. The target language for a given question can therefore usually be inferred from the spatial context of the question; see also section 2.1.4: “Spatiotemporal Anchoring of Questions”.
See the upcoming section “Use of a Domain Ontology” to get an idea of how this crosslingual functionality is realized.
QA systems built with the QALL-ME Framework query structured data sources for suitable answers. Such data sources are usually databases of some kind but may also be simple XML documents with a certain structure. In any case, the data structures used have to be accessible via predefined RDF interfaces; in the simplest case the data is already available in the correct RDF schema[1]. The concrete RDF schema used to represent answer data is specified by the domain ontology. This implies that the answer data sources are always bound to a certain domain which, however, can be freely specified, as described in the next section.
In order to use the QALL-ME Framework, an ontology describing the domain on which the built QA system operates has to be provided: it contains concept descriptions for the target domain and descriptions of possible relations between these concepts. The ontology is then used in two ways: firstly, it is used to provide a schema to represent the structured answer data (cf. previous section). In other words, the answer data is described as instances of the ontology concepts using the vocabulary that the ontology provides. Secondly, the ontology is indirectly used to cross the language barrier in multilingual QA. This second way of using the ontology is actually just a nice side effect of the first way of ontology usage: by describing the answer data with the ontology vocabulary, we have a representation of the data which is independent of the original language of the data. We now only need to create a mapping from the question to a query which uses the ontology vocabulary, too; that query can then be applied to the answer data. The upcoming section “Recognizing Textual Entailment for QA” illustrates how this mapping is achieved in the QALL-ME Framework using textual entailment recognizers.
By default the QALL-ME Framework anchors questions in space and time. This means that all questions always have a spatial and temporal context. This should be natural: one can always use deictic expressions such as “here” or “tomorrow” in a question. Therefore, a question posed at eight o’clock in Berlin may potentially mean something completely different than the same question posed at five o’clock in Amsterdam; for example: “Where can I see a nice action movie nearby in about half an hour?”.
In multilingual settings (cf. section 2.1.1) the spatial anchoring is usually used to find the target language. The temporal anchor is also a crucial part in the QALL-ME Framework; more information about the usage of spatial and temporal anchoring of questions can be found in section 2.3: “System Architecture” with the description of the QA planner and the TimeAnnotator components.
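To make the role of the two anchors more concrete, here is a minimal Python sketch. The `QuestionContext` class, the `LANGUAGE_BY_COUNTRY` table and both helper functions are hypothetical names invented for this illustration; they are not part of the QALL-ME Framework API, and the handful of hard-coded expressions only hints at what a real component such as the TimeAnnotator would have to cover.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical lookup from the country of the spatial anchor to a target language.
LANGUAGE_BY_COUNTRY = {"DE": "de", "IT": "it", "NL": "nl"}


@dataclass(frozen=True)
class QuestionContext:
    """Spatial and temporal anchor attached to a question (illustrative only)."""
    country: str        # ISO 3166-1 alpha-2 code of the place where the question is asked
    city: str
    asked_at: datetime  # moment at which the question is asked


def target_language(ctx: QuestionContext, default: str = "en") -> str:
    """Infer the target language from the spatial anchor."""
    return LANGUAGE_BY_COUNTRY.get(ctx.country, default)


def resolve_relative_time(expression: str, ctx: QuestionContext) -> datetime:
    """Turn a few hard-coded deictic expressions into absolute times."""
    offsets = {
        "now": timedelta(0),
        "in about half an hour": timedelta(minutes=30),
        "tomorrow": timedelta(days=1),
    }
    return ctx.asked_at + offsets[expression]


# The same question asked in two different contexts (dates chosen arbitrarily,
# evening hours assumed).
berlin = QuestionContext("DE", "Berlin", datetime(2024, 5, 1, 20, 0))
amsterdam = QuestionContext("NL", "Amsterdam", datetime(2024, 5, 1, 17, 0))

for ctx in (berlin, amsterdam):
    print(ctx.city, target_language(ctx),
          resolve_relative_time("in about half an hour", ctx))
```

The same deictic expression resolves to a different absolute time and a different target language in each context, which is why both anchors travel with every question.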
Traditional QA approaches for structured answer data often performed a deep analysis of the question in order to obtain a logical form, which was then translated into a query that can be applied to the answer data. For various reasons[2] this approach has been found to be inadequate or even infeasible. In the QALL-ME Framework we address the mapping between natural language question and database query differently, using RTE (Recognizing Textual Entailment) components. With RTE components, the problems of finding logical representations of questions and then mapping these to database queries are bypassed through semantic inference at the textual level.
An RTE component can recognize whether some text T entails a hypothesis H, i.e., whether the meaning of H can be fully derived from the meaning of T. In the context of the QALL-ME Framework H is always more or less a minimal form of a question about some topic and T is more or less the question which has to be answered. Through RTE we can now find out which minimal questions are contained in a given question. The set of minimal questions has to be defined in advance, i.e., each (minimal) question that shall be answerable has to be in the set. Through RTE we can then handle all reformulations of these minimal questions. One crucial part is still missing on our way to the answer: the step from a minimal question to the appropriate answer. In principle that’s easy: we simply attach the correct answer to each minimal question and return that answer if the minimal question is entailed by the user question.
As you may have guessed already, in practice the process described above is a bit more complicated. Instead of using minimal natural language questions we use patterns of minimal natural language questions. To get these patterns we replace certain entities in the minimal questions with placeholders. Furthermore, these patterns are not mapped directly to answers but rather to database query patterns containing the same placeholders as the question patterns. The question processing then works as follows: we first replace all entities in the user question with placeholders. Next the RTE component tells us which of the (minimal) question patterns in our set is entailed by the question pattern we have just created from the input question. In the database query pattern that corresponds to the entailed question pattern we replace the placeholders with the entities that we removed from the original question. Now we have a complete database query which can easily be applied to the answer data.
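The processing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names are hypothetical, entity annotation is assumed to have happened already (the entities are passed in as a dictionary), and the entailment recognizer is injected as a plain function. The `fake_entails` stand-in merely looks up hand-listed pairs and is in no way a real RTE component.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mapping = Tuple[str, str]  # (question pattern, query pattern)


def abstract_entities(question: str, entities: Dict[str, str]) -> str:
    """Replace each annotated entity with its placeholder, e.g. Dreamgirls -> [MOVIE]."""
    for placeholder, surface in entities.items():
        question = question.replace(surface, placeholder)
    return question


def instantiate(pattern: str, entities: Dict[str, str]) -> str:
    """Put the original entities back in place of the placeholders."""
    for placeholder, surface in entities.items():
        pattern = pattern.replace(placeholder, surface)
    return pattern


def generate_query(question: str,
                   entities: Dict[str, str],
                   mappings: List[Mapping],
                   entails: Callable[[str, str], bool]) -> Optional[str]:
    """Return the query of the first mapping whose question pattern is entailed."""
    text = abstract_entities(question, entities)        # T
    for question_pattern, query_pattern in mappings:    # each candidate H
        if entails(text, question_pattern):
            return instantiate(query_pattern, entities)
    return None  # no minimal question pattern is entailed


MAPPINGS: List[Mapping] = [
    ("Who is the director of the movie [MOVIE]?",
     'SELECT ?directorName WHERE { ?movie qmo:name "[MOVIE]" . '
     '?movie qmo:hasDirector ?person . ?person qmo:name ?directorName . }'),
    ("Where can I see the movie [MOVIE]?",
     'SELECT ?cinemaName WHERE { ?movie qmo:name "[MOVIE]" . '
     '?cinema qmo:showsMovie ?movie . ?cinema qmo:name ?cinemaName . }'),
]

# Stand-in for an RTE component: a hand-written list of (T, H) pairs.
KNOWN_ENTAILMENTS = {
    ("Which cinemas show the movie [MOVIE] tonight?",
     "Where can I see the movie [MOVIE]?"),
}


def fake_entails(text: str, hypothesis: str) -> bool:
    return text == hypothesis or (text, hypothesis) in KNOWN_ENTAILMENTS


query = generate_query("Which cinemas show the movie Dreamgirls tonight?",
                       {"[MOVIE]": "Dreamgirls"}, MAPPINGS, fake_entails)
print(query)
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular RTE implementation; a language-specific recognizer could be substituted without touching the rest of the code.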
As the query generation step using RTE is the central part of the QALL-ME Framework, we should take the time to go through an example of this step. Our imaginary sample QA system supports English input questions, i.e., before it can answer any questions, there has to be an RTE component for English. Furthermore, a set of minimal question pattern to query pattern mappings has to be created for all English question types that shall be answerable. Let’s assume we have the following mappings:
Question Pattern | Query Pattern |
---|---|
Who is the director of the movie [MOVIE]? | SELECT ?directorName WHERE { ?movie qmo:name "[MOVIE]" . ?movie qmo:hasDirector ?person . ?person qmo:name ?directorName . } |
Who wrote the screenplay for the movie [MOVIE]? | SELECT ?writerName WHERE { ?movie qmo:name "[MOVIE]" . ?movie qmo:hasWriter ?person . ?person qmo:name ?writerName . } |
Where can I see the movie [MOVIE]? | SELECT ?cinemaName WHERE { ?movie qmo:name "[MOVIE]" . ?cinema qmo:showsMovie ?movie . ?cinema qmo:name ?cinemaName . } |
In which cinema in [CITY] can I see the movie [MOVIE]? | SELECT ?cinemaName WHERE { ?movie qmo:name "[MOVIE]" . ?cinema qmo:showsMovie ?movie . ?cinema qmo:isInCity "[CITY]" . ?cinema qmo:name ?cinemaName . } |
Which movies can I see in [CINEMA]? | SELECT ?movieName WHERE { ?cinema qmo:name "[CINEMA]" . ?cinema qmo:showsMovie ?movie . ?movie qmo:name ?movieName . } |
As the QALL-ME Framework assumes the answer data to be represented in RDF, it is only natural to use SPARQL as the database query language; see also the AnswerPool component in section 2.3.2: “System Components”. Note, however, that in the examples above the SPARQL queries are highly simplified and not well-formed. Additionally, the examples use vocabulary from an imaginary RDF schema.
Observe the placeholders like [MOVIE] or [CITY] in each part of the mappings: for each placeholder in the question pattern of every mapping there is a corresponding placeholder in the SPARQL query pattern.
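This correspondence between placeholders is easy to check mechanically when a set of mappings is authored by hand. The following Python sketch is one possible check; the function names and the assumption that placeholders are written as upper-case words in square brackets are taken from the examples above and are not prescribed by the framework.

```python
import re
from typing import Set

# Placeholders are assumed to look like [MOVIE] or [CITY].
PLACEHOLDER = re.compile(r"\[([A-Z]+)\]")


def placeholders(pattern: str) -> Set[str]:
    """Collect the names of all placeholders that occur in a pattern."""
    return set(PLACEHOLDER.findall(pattern))


def missing_in_query(question_pattern: str, query_pattern: str) -> Set[str]:
    """Placeholders of the question pattern that have no counterpart in the query pattern."""
    return placeholders(question_pattern) - placeholders(query_pattern)


question_pattern = "In which cinema in [CITY] can I see the movie [MOVIE]?"
good_query = ('SELECT ?cinemaName WHERE { ?movie qmo:name "[MOVIE]" . '
              '?cinema qmo:showsMovie ?movie . ?cinema qmo:isInCity "[CITY]" . '
              '?cinema qmo:name ?cinemaName . }')
# Deliberately broken variant that forgets the [CITY] constraint.
bad_query = ('SELECT ?cinemaName WHERE { ?movie qmo:name "[MOVIE]" . '
             '?cinema qmo:showsMovie ?movie . ?cinema qmo:name ?cinemaName . }')

print(missing_in_query(question_pattern, good_query))
print(missing_in_query(question_pattern, bad_query))
```

An empty result means that every entity abstracted from the question can be substituted back into the query; a non-empty result points at a mapping that would silently ignore part of the question.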
With the above mappings and the English RTE component we are now ready to answer the first question. Imagine the user has asked the following:
“Which cinemas show the movie Dreamgirls tonight?”
First, the entities in the question are replaced with placeholders, and the resulting pattern is passed as the text T to the RTE component:
“Which cinemas show the movie [MOVIE] tonight?” ([MOVIE] = “Dreamgirls”)
The RTE component also needs a hypothesis H: this hypothesis is set to the question pattern of the first of our pattern mappings, then to the second and so on until we find a question pattern (H) which is entailed by the pattern we have created from our input question (i.e., the text T). In our example we can stop with the third mapping of the above list: “Which cinemas show the movie [MOVIE] tonight?” (T) textually entails “Where can I see the movie [MOVIE]?” (H). We therefore know the SPARQL query pattern that will be used to find the answer to the original question:
“SELECT ?cinemaName WHERE {
?movie qmo:name "[MOVIE]" .
?cinema qmo:showsMovie ?movie .
?cinema qmo:name ?cinemaName . }
”
Finally, the placeholder in the query pattern is replaced with the entity that was removed from the original question, which yields the complete query:
“SELECT ?cinemaName WHERE {
?movie qmo:name "Dreamgirls" .
?cinema qmo:showsMovie ?movie .
?cinema qmo:name ?cinemaName . }
”
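To see what applying such a query to answer data amounts to, the three triple patterns of the final query can be matched against a handful of triples. The Python sketch below is a hand-rolled matcher for basic triple patterns and not a SPARQL engine; the sample data (identifiers such as `ex:m1` and the cinema names) is invented for illustration and uses the same imaginary `qmo:` vocabulary as the examples.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Invented sample answer data (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("ex:m1", "qmo:name", "Dreamgirls"),
    ("ex:m2", "qmo:name", "Another Movie"),
    ("ex:c1", "qmo:name", "Cinema A"),
    ("ex:c2", "qmo:name", "Cinema B"),
    ("ex:c1", "qmo:showsMovie", "ex:m1"),
    ("ex:c2", "qmo:showsMovie", "ex:m2"),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so that the pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if extended.setdefault(term, value) != value:
                return None                           # conflicts with an earlier binding
        elif term != value:                           # constant that does not match
            return None
    return extended


def solve(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Find all variable bindings that satisfy every pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# The three triple patterns of the final query from the example.
QUERY: List[Triple] = [
    ("?movie", "qmo:name", "Dreamgirls"),
    ("?cinema", "qmo:showsMovie", "?movie"),
    ("?cinema", "qmo:name", "?cinemaName"),
]

cinema_names = sorted({b["?cinemaName"] for b in solve(QUERY, TRIPLES)})
print(cinema_names)
```

With this sample data only one cinema shows the requested movie, so the selected variable `?cinemaName` has a single binding.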
The QALL-ME Framework is based on a Service Oriented Architecture (SOA). Such an architecture enables distributed computing and enforces loose coupling of system components, the so-called services. Every service can be seen as a “black box” with a specialized functionality which is accessible only via standardized interfaces. An orchestration of these services creates a so-called business process. In the QALL-ME Framework the business process is always a QA system; the service components are, for example, entity annotators, query generators or textual entailment recognizers. Orchestrations of different service implementations create different QA systems, and existing service implementations can easily be reused in different systems.
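The idea of black-box services behind fixed interfaces can be illustrated with a small Python sketch. It uses plain in-process objects rather than web services, and every name in it (`EntityAnnotator`, `QueryGenerator`, `QASystem` and the two toy implementations) is hypothetical; the actual service interfaces of the framework are not reproduced here.

```python
from typing import Dict, Iterable, Protocol


class EntityAnnotator(Protocol):
    """Interface of a service that finds entities in a question."""
    def annotate(self, question: str) -> Dict[str, str]: ...


class QueryGenerator(Protocol):
    """Interface of a service that turns a question into a query."""
    def generate(self, question: str, entities: Dict[str, str]) -> str: ...


class QASystem:
    """One possible orchestration: annotate entities, then generate a query."""

    def __init__(self, annotator: EntityAnnotator, generator: QueryGenerator) -> None:
        self.annotator = annotator
        self.generator = generator

    def query_for(self, question: str) -> str:
        entities = self.annotator.annotate(question)
        return self.generator.generate(question, entities)


class MovieListAnnotator:
    """Toy annotator that looks for known movie titles in the question."""

    def __init__(self, titles: Iterable[str]) -> None:
        self.titles = list(titles)

    def annotate(self, question: str) -> Dict[str, str]:
        for title in self.titles:
            if title in question:
                return {"[MOVIE]": title}
        return {}


class EchoGenerator:
    """Stand-in generator that only reports which entities it received."""

    def generate(self, question: str, entities: Dict[str, str]) -> str:
        return f"query for {sorted(entities.items())}"


system = QASystem(MovieListAnnotator(["Dreamgirls"]), EchoGenerator())
print(system.query_for("Who directed Dreamgirls?"))
```

Because `QASystem` depends only on the two interfaces, either implementation can be exchanged for another one without changing the orchestration, which mirrors the reuse of service implementations described above.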
The SOA of the QALL-ME Framework is realized using web service (WS) technology. In particular, the framework is built around WSs specified according to the WSDL 1.1 specification, which is published by the W3C. WSDL-based WS implementations can be written in any programming language; their intercommunication is realized entirely with XML messages according to the SOAP protocol. For each QA component in the framework there is a WSDL description. Implementations of these descriptions can be dynamically orchestrated with a special WS component that we call the QA planner. As the QA planner is a WS itself, it can easily be included in larger applications, for example in a website or a mobile application which makes use of a certain QA system.
See section 2.3: “System Architecture” for the actual architecture behind the QALL-ME Framework with all its WS components and their default orchestration.