
RE: OOQuery Language



Hi Falko, Don, and others ;o)

[...] 
> Wouldn't it be easier to do all this in just one environment: XML? I mean,
> why not transfer the data content into XML and use XPath (or another XML
> query lang when available) to define the content of the report?

In fact, this is exactly what we considered some time ago. However,
you usually have OO data modelled as UML diagrams, and the transition
between a pure OO (UMLish) view and XPath is somewhat awkward. It is more
natural (at the higher level) to represent data models as UML and to ask
queries in something like OQL, which can then, transparently to the user,
be converted to XPath expressions, or to a query in whatever other
'native' format the data is physically stored in. If your top-level model
is expressed in XML Schema, then XPath is fine as a query language.
However, to us it seems nicer to have a more abstract model of the OO
data, which internally can be stored in an OO database, an SQL-based
database, Shapefiles, XML-based storage, etc. For this "more abstract"
model we use UML, with an XMI representation (serialization) of the UML
for computations/reasoning (a sort of data schema).
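
Just to make that translation concrete (the collection, element and field
names here are made up), the same request at the two levels could look
like this, written out as Java strings:

// user-level query against the abstract (UML-derived) model, in OQL:
String oql = "select c from c in Cities where c.name = \"Berlin\"";

// what the query layer might translate it into, assuming an XML
// serialization where each city is a City element under Cities:
String xpath = "/Cities/City[name=\"Berlin\"]";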

[...]
> > Although I can understand the desire for a more OO query, my brief survey of
> > the genesis of OQL and other attempts at OO query languages convinced me that
> > the definition of a robust new OO query language is not a trivial task.
> > Although S.O.D.A. and other alternatives may look interesting, AFAIK they
> > don't have a complete, rigorous definition to base an implementation upon.
> > This is why I chose not to participate in the earlier request related to
> > implementing S.O.D.A. OQL may not be elegant, truly OO, or computationally
> > complete, but it's serviceable.
> Well, SODA is a totally new approach and it is work in progress. I agree.
> 
> Anyway, for me the most important question is not what the best OO query
> lang is (if such a thing exists, anyway ;) but: how to evolve ozone so that
> it meets the users' needs. And it seems that people want descriptive query
> langs today ;)
> 
> All possible query lang solutions are based on the same principles, so I
> assume that a generic kernel can be the back-end for all front-ends. Don, I
> understand that the development of such a kernel is out of your scope. But
> what we need now is not ready-to-go software but a list of requirements that
> such a kernel has to fulfill. Such a specification would be of real benefit
> for the ozone development and for your development, because it would clearly
> separate ozone-specific code from independent code. If I got this right,
> Mariusz has proposed such an architecture in his last mail. IMO his "driver"
> API is exactly the API of "my" query-kernel. What do you think?


Yep, I finally got it ;o)  I have been following the discussion for some
time now without fully understanding it, but it seems we were talking about
the same things using different language. I fully agree with Falko here,
and indeed his "query-kernel" is exactly what I was referring to as a
"modular interpreter" with the "multiple drivers" API.

I can describe what we have got so far. It is a really rudimentary
model, and we have simplified a lot of things, because we do not
currently need full OO power.
In our system we decided to model data as typed objects stored in typed
collections, and all collection managers have to register the type and
location of their respective collections with the resource broker. At that
stage we know which collections can be served by which collection manager
objects. So if you call a collection manager a driver which speaks a given
API, keep track (a registry) of all drivers, and allow a single driver to
serve more than a single collection, you have the picture:

     -----
    | OQL |
     -----
       |                                         "top-level model"
       V
 -----------------       -----------------
| Query Processor |<--->| Resource Broker |
 -----------------       -----------------
  |             | \
  |             |   ---------                   "uniform OO model"
  V             V             \
 ----------    ----------    ----------
| Driver a |  | Driver b |  | Driver c |
 ----------    ----------    ----------

 SQL store     Shapedata      XML data            "native" level
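
To make the registration step a bit more concrete, here is a rough Java
sketch of the kind of broker API I have in mind (all names are invented
for the example, the real thing is of course more involved):

import java.util.HashMap;
import java.util.Map;

// a collection manager ("driver") serves one or more typed collections
interface CollectionManager { }

class ResourceBroker {
    // element type of a collection -> the manager that serves it
    private final Map registry = new HashMap();

    // every collection manager registers the type (and, in the real
    // system, also the location) of its collection at startup
    public void register(Class elementType, CollectionManager manager) {
        registry.put(elementType, manager);
    }

    // the query processor asks which manager can serve a given collection
    public CollectionManager lookup(Class elementType) {
        return (CollectionManager) registry.get(elementType);
    }
}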


We do not support method matching yet, and we do not support joins at the
driver level yet. At the driver level we have just two methods:
1. Collection getData(Object o);
Returns a collection of objects such that, for all non-null fields of
the object o, the returned objects have the same values. This is the
equivalent of

select o
  from o in Collection
  where o.attr1 = value1 and o.attr2 = value2 and so on

2. Object getUniqueData(Object o);
The same as above, but forces a single result object.
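
In Java terms, the driver contract and its query-by-example usage could be
sketched roughly like this (the City class and its fields are just an
invented example):

import java.util.Collection;

// the two calls a driver has to implement
interface Driver {
    // all objects whose non-null fields equal the corresponding
    // fields of the template object o
    Collection getData(Object o);

    // same as getData, but the caller expects exactly one result
    Object getUniqueData(Object o);
}

// an invented element type for illustration
class City {
    String name;
    String country;
}

class Example {
    // equivalent of: select c from c in Cities where c.name = "Berlin"
    static Object findBerlin(Driver cityDriver) {
        City template = new City();
        template.name = "Berlin";   // constrained field
        // template.country stays null, i.e. unconstrained
        return cityDriver.getUniqueData(template);
    }
}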

Thus, "or" operator on the where clause is done by the query processing
module by multiple calls to the "drivers", and joins are done by
translating results from one subquery into a where clause of another
subquery.  Path expressions are supported on the fields level by nested
objects inside attributes, so if a field type is not String/Number,
recursive matching is used, but we do not support nested iterators (nested
queries in general are split by the query processor).  
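
Reusing the invented Driver and City types from the sketch above, the
splitting of an "or" done by the query processor would look roughly like
this (again only an illustration):

import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class OrSplitting {
    // equivalent of:
    //   select c from c in Cities
    //   where c.name = "Berlin" or c.country = "PL"
    static Collection evaluate(Driver cityDriver) {
        City byName = new City();
        byName.name = "Berlin";

        City byCountry = new City();
        byCountry.country = "PL";

        // two query-by-example calls, union of the results;
        // the driver itself never sees the "or"
        Set result = new LinkedHashSet();
        result.addAll(cityDriver.getData(byName));
        result.addAll(cityDriver.getData(byCountry));
        return result;
    }
}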

On the top level we support CORBA-based object traversal (by value),
and we are thinking of supporting XML and possibly Java Remote objects
(or by value). On the back end, as I said, we have SQL-based sources and
Arc/Info sources, and in fact with the current data and queries we mostly
use the getUniqueData call. For the uniform OO model we used Java objects
and reflection, but it was terribly slow, so we decided not to support
methods (like 'where o.method() = value') and moved to using Hashtables
instead, which sped up most of the queries by a factor of 1000.
Joins are not efficient, and more thinking is needed there. What we are
actually thinking of is moving a little away from OQL with its iterator
model, towards a streaming mechanism, something like:

header (single discrete unit)
select o where o.attr1 = $1 or o.attr2 = $2

body (stream)
value1_1 value2_1
value1_2 value2_2
 ...       ...


So we can make joins much more efficiently, and we could do partial
evaluation as one of the sources progresses. There is a person working on
an algebraic notation for this and some comprehension calculus; I am not
that much into it ;o) but we will report on it as soon as it is clearer.
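
A very rough Java sketch of that idea, again reusing the invented Driver
and City types from above (the real design, and the algebra behind it,
is still open):

import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;

class StreamingJoin {
    // header: select o where o.attr1 = $1 or o.attr2 = $2
    // body:   a stream of ($1, $2) pairs produced by another subquery
    static void evaluate(Driver cityDriver, Iterator parameterStream) {
        while (parameterStream.hasNext()) {
            String[] row = (String[]) parameterStream.next();

            City byName = new City();
            byName.name = row[0];        // o.attr1 = $1

            City byCountry = new City();
            byCountry.country = row[1];  // o.attr2 = $2

            Collection partial = new LinkedHashSet();
            partial.addAll(cityDriver.getData(byName));
            partial.addAll(cityDriver.getData(byCountry));

            // partial results can be handed on as soon as this part
            // of the parameter stream has been consumed
            emit(partial);
        }
    }

    static void emit(Collection partial) {
        // pass partial results to the consumer incrementally
    }
}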

I am not sure about the API, because in our model we have a separate API
at the Query Processor level (which supports, in principle, all the OO
features) and a simplified version for the "drivers" to translate a subset
of the OO query into native queries. The drivers are also responsible for
preformatting results into some sort of OO model (we use separate
schemas/mappings for SQL and other sources describing how to map rows into
objects, etc.).
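
For the SQL case, the preformatting a driver does might be sketched like
this (a minimal example using plain JDBC and a Hashtable per object; the
real mapping is driven by a separate schema/mapping definition):

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.Hashtable;

class RowMapper {
    // map one SQL row to a Hashtable keyed by attribute name
    static Hashtable toObject(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Hashtable obj = new Hashtable();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            Object value = rs.getObject(i);
            if (value != null) {          // Hashtable rejects null values
                obj.put(meta.getColumnName(i), value);
            }
        }
        return obj;
    }
}
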
As for Ozone, maybe we should have something more like:


     -----                  -------
    | OQL |                | XPath |
     -----                  -------
       |                       |                "top-level model"
       V                       V
 -----------------       -----------------
| Query Processor |     | Other Processor |
 -----------------       -----------------
       |                 /
       |                /                        "uniform OO model"
  ----------------------
 | Uniform Query Kernel |
  ----------------------
    |           | \
    |           |   ---------                   "internal OO model"
    V           V             \
 ----------    ----------    ----------
| Driver a |  | Driver b |  | Driver c |
 ----------    ----------    ----------

 SQL store     Shapedata      XML data            "native" level
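
To make the "Uniform Query Kernel" box a bit more concrete, one possible
shape of the API it exposes to the processors (just a sketch, reusing the
Driver interface from above; the query representation is deliberately
left open):

import java.util.Collection;

// whatever language-neutral representation of a query the processors
// agree on (OQL and XPath front-ends would both produce this)
interface UniformQuery { }

interface UniformQueryKernel {
    // drivers register the collections (element types) they serve
    void registerDriver(Class elementType, Driver driver);

    // evaluate a query against the uniform OO model; the kernel splits
    // it into query-by-example calls on the registered drivers
    Collection evaluate(UniformQuery query);
}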


What do you think? Should the Ozone OO model be used as the "uniform OO
model", or should it be treated as just another particular "native" level,
with something else as the uniform model (another common API at the
processor level)? To me, a pure Java-based model should be used inside the
Query Kernel; however, I am not sure how to solve the problem of object
identity in such a model. Also, in our case we need to be able to make
joins across heterogeneous sources; I am not sure that this is generally
the case, and maybe if there is only a single driver at a time working
with the query kernel the problem is much simpler.

Suggestions?

(sorry for the length of my messages)

cheers
Mariusz