
RE: OOQuery Language



Hi Falko, Mariusz, others,

Thanks for sharing your thoughts on this.

I'm quite new to this area, so perhaps I'm not seeing the situation clearly.
However, as I understand the architecture Mariusz describes in his post,
the communication between the Query Processors and the Uniform Query Kernel
would itself take the form of a query. It would have to be expressed in a
language that can rigorously describe any query written in OQL, XPath, etc.,
i.e. a generic query algebra.

I'm unclear how a Java-based model would be used inside the Uniform Query
Kernel. It seems the Uniform Query Kernel would be based on a query algebra,
which would in turn be based on an object algebra able to model any data at
the "native" level, such as an SQL store, Shapedata or XML data. That way the
kernel could be a generic back-end for all front-ends, as Falko mentioned.

In this scenario the Query Processors could parse the user queries and map
them to the generic query algebra, rejecting any queries that can't be
mapped, and finally pass a query algebra expression to the Uniform Query
Kernel. The Uniform Query Kernel could validate the query, and perhaps
transform and simplify it, before sending data retrieval requests or
sub-queries to the native Drivers. The Drivers would then do the data
retrieval and pass the data back to the Uniform Query Kernel, which would
transform the data as needed for the result structure.
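As a rough illustration of the "transform and simplify" step, the sketch below assumes a toy predicate algebra (the names Pred, Eq, And, Or and Normalizer are hypothetical, not taken from any existing query algebra). It rewrites a where-clause into disjunctive normal form, so that each resulting conjunction is a plain set of attribute = value constraints that an equality-only driver could answer, and it drops conjunctions that demand two different values for the same attribute.

```java
import java.util.*;

/** Hypothetical toy predicate algebra for where-clauses. */
interface Pred {}

final class Eq implements Pred {
    final String attr;
    final Object value;
    Eq(String attr, Object value) { this.attr = attr; this.value = value; }
}

final class And implements Pred {
    final Pred left, right;
    And(Pred left, Pred right) { this.left = left; this.right = right; }
}

final class Or implements Pred {
    final Pred left, right;
    Or(Pred left, Pred right) { this.left = left; this.right = right; }
}

final class Normalizer {
    /**
     * Rewrites a predicate into disjunctive normal form: a list of conjunctions,
     * each represented as an attribute -> required value map.
     */
    static List<Map<String, Object>> toDnf(Pred p) {
        List<Map<String, Object>> out = new ArrayList<>();
        if (p instanceof Eq) {
            Eq e = (Eq) p;
            Map<String, Object> single = new HashMap<>();
            single.put(e.attr, e.value);
            out.add(single);
        } else if (p instanceof Or) {
            // An OR concatenates the conjunctions of both sides.
            Or o = (Or) p;
            out.addAll(toDnf(o.left));
            out.addAll(toDnf(o.right));
        } else if (p instanceof And) {
            // Distribute AND over OR: combine every left conjunction with every right one.
            And a = (And) p;
            List<Map<String, Object>> rights = toDnf(a.right);
            for (Map<String, Object> l : toDnf(a.left)) {
                for (Map<String, Object> r : rights) {
                    Map<String, Object> merged = merge(l, r);
                    if (merged != null) out.add(merged);
                }
            }
        } else {
            throw new IllegalArgumentException("unknown predicate: " + p);
        }
        return out;
    }

    /** Merges two conjunctions; returns null if they require different values for one attribute. */
    private static Map<String, Object> merge(Map<String, Object> l, Map<String, Object> r) {
        Map<String, Object> merged = new HashMap<>(l);
        for (Map.Entry<String, Object> e : r.entrySet()) {
            Object previous = merged.putIfAbsent(e.getKey(), e.getValue());
            if (previous != null && !previous.equals(e.getValue())) return null;
        }
        return merged;
    }
}
```

For example, type = 'road' and (id = 1 or id = 2) becomes two conjunctions, {type=road, id=1} and {type=road, id=2}, each of which could be sent to a driver as a separate request.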

Mariusz, is this close to what you had in mind for your architecture?

If this is a reasonable interpretation of the architecture, one way to assess
the scope of the effort would be to look at candidate query algebras and
object algebras. Are there suitable algebras available? If not, I suspect
creating them would be a considerable project in itself.

Assuming there is a suitable query algebra, perhaps the Query Kernel could
be implemented incrementally, in a way that meets the needs of everyone
involved.

cheers,

don

Mariusz said:
>
>Hi Falko, Don,  others ;o)
>
>[...]
> > Wouldn't it be easier to do all this in just one environment: XML? I mean,
> > why not transfer the data content into XML and use XPath (or another XML
> > query language when available) to define the content of the report?
>
>In fact this is exactly what we considered some time ago. However,
>you usually have OO data modelled as UML diagrams, and the transition
>between a pure OO (UML-ish) view and XPath is somewhat awkward. It is more
>natural (at the higher level) to represent data models in UML and ask
>queries in something like OQL, which can then, transparently to the user,
>be converted to XPath expressions or to a query in whatever other 'native'
>format the data is physically stored in. If your top-level model is
>expressed in XML Schema, then XPath is fine as a query language. However,
>to us it seems nicer to have a more abstract model of OO data, which
>internally can be stored in an OO database, an SQL-based database,
>Shapefiles, XML-based storage, etc. For this "more abstract" model we use
>UML, and the XMI representation (serialization) of the UML for
>computations/reasoning (a sort of data schema).
>
>[...]
> > > Although I can understand the desire for a more OO query, my brief survey
> > > of the genesis of OQL and other attempts at OO query languages convinced
> > > me that defining a robust new OO query language is not a trivial task.
> > > Although S.O.D.A. and other alternatives may look interesting, AFAIK they
> > > don't have a complete, rigorous definition to base an implementation upon.
> > > This is why I chose not to participate in the earlier request related to
> > > implementing S.O.D.A. OQL may not be elegant, truly OO, or computationally
> > > complete, but it's serviceable.
> > Well, SODA is a totally new approach and it is a work in progress. I agree.
> >
> > Anyway, for me the most important question is not what the best OO query
> > language is (if such a thing exists at all ;) but how to evolve ozone so
> > that it meets the users' needs. And it seems that people want descriptive
> > query languages today ;)
> >
> > All possible query language solutions are based on the same principles, so
> > I assume that a generic kernel can be the back-end for all front-ends. Don,
> > I understand that the development of such a kernel is out of your scope.
> > But what we need now is not ready-to-go software but a list of requirements
> > that such a kernel has to fulfill. Such a specification would be of real
> > benefit for ozone development and for your development, because it would
> > clearly separate ozone-specific code from independent code. If I got this
> > right, Mariusz has proposed such an architecture in his last mail. IMO his
> > "driver" API is exactly the API of "my" query-kernel. What do you think?
>
>
>Yep, I finally got it ;o)  I have been following the discussion for some
>time now without fully understanding it, but it seems we were talking about
>the same things using different language. I fully agree with Falko here,
>and indeed his "query-kernel" is exactly what I was referring to as the
>"modular interpreter" with the "multiple drivers" API.
>
>I can describe what we have so far. It is a really rudimentary
>model, and we have simplified a lot of things, because we do not
>currently need full OO power.
>In our system we decided to model data as typed objects stored in typed
>collections, and all collection managers have to register the type and
>location of their respective collections with the resource broker. At that
>stage we know which collections can be served by which collection manager
>objects. So if you call a collection manager a driver, which speaks a given
>API, and if you keep track (a registry) of all drivers, and if you allow a
>single driver to serve more than one collection, you have the
>picture.
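A minimal sketch of the registry described above, assuming collections are identified by name and that registering the same collection twice is an error (ResourceBroker, CollectionDriver and all method names are hypothetical): collection managers register each collection's element type and location, the query processor looks up the driver for a collection, and one driver may serve several collections.

```java
import java.util.*;

/** Hypothetical driver contract; only its identity matters for the registry. */
interface CollectionDriver {
    String name();
}

/** Sketch of a resource broker: collection name -> (element type, location, driver). */
final class ResourceBroker {
    static final class Entry {
        final String elementType;
        final String location;
        final CollectionDriver driver;

        Entry(String elementType, String location, CollectionDriver driver) {
            this.elementType = elementType;
            this.location = location;
            this.driver = driver;
        }
    }

    private final Map<String, Entry> byCollection = new HashMap<>();

    /** Called by a collection manager for every collection it serves. */
    void register(String collection, String elementType, String location, CollectionDriver driver) {
        Entry previous = byCollection.putIfAbsent(collection, new Entry(elementType, location, driver));
        if (previous != null) {
            throw new IllegalStateException("collection already registered: " + collection);
        }
    }

    /** Called by the query processor to route a (sub)query on one collection. */
    CollectionDriver driverFor(String collection) {
        Entry e = byCollection.get(collection);
        if (e == null) {
            throw new NoSuchElementException("no driver registered for " + collection);
        }
        return e.driver;
    }

    /** Element type of a registered collection, e.g. for type-checking a query. */
    String elementTypeOf(String collection) {
        Entry e = byCollection.get(collection);
        return e == null ? null : e.elementType;
    }

    /** All collections served by one driver (a single driver may serve several). */
    Set<String> collectionsOf(CollectionDriver driver) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Entry> e : byCollection.entrySet()) {
            if (e.getValue().driver == driver) result.add(e.getKey());
        }
        return result;
    }
}
```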
>
>      -----
>     | OQL |
>      -----
>        |                                         "top-level model"
>        V
>  -----------------       -----------------
>| Query Processor |<--->| Resource Broker |
>  -----------------       -----------------
>   |             | \
>   |             |   ---------                   "uniform OO model"
>   V             V             \
>  ----------    ----------    ----------
>| Driver a |  | Driver b |  | Driver c |
>  ----------    ----------    ----------
>
>  SQL store     Shapedata      XML data            "native" level
>
>
>We do not support method matching yet, and we do not support joins at the
>driver level yet. At the driver level we have just two methods:
>1. Collection getData(Object o);
>Returns a collection of objects such that, for every non-null field of
>the object o, the returned objects have the same value. This is the
>equivalent of
>select o
>   from o in Collection
>   where o.attr1 = value1 and o.attr2 = value2 and ...
>
>2. Object getUniqueData(Object o);
>The same as above, but forces a single result object.
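To make the two calls concrete, here is a small in-memory sketch. It assumes objects are represented as attribute-to-value maps (in line with the later remark about replacing reflection with Hashtables) rather than arbitrary Java objects, so InMemoryDriver and its Map-based signatures are hypothetical stand-ins for Collection getData(Object o) and Object getUniqueData(Object o). A null or absent template field is a wildcard, and a nested map in the template is matched recursively, which is one way to read the path-expression support described below.

```java
import java.util.*;

/**
 * Hypothetical in-memory driver. Objects are attribute -> value maps; a template
 * field that is null or absent places no constraint on the result.
 */
final class InMemoryDriver {
    private final List<Map<String, Object>> collection;

    InMemoryDriver(List<Map<String, Object>> collection) {
        this.collection = collection;
    }

    /** select o from o in Collection where o.attr1 = value1 and o.attr2 = value2 and ... */
    List<Map<String, Object>> getData(Map<String, Object> template) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> o : collection) {
            if (matches(o, template)) result.add(o);
        }
        return result;
    }

    /** Same as getData, but the template must select exactly one object. */
    Map<String, Object> getUniqueData(Map<String, Object> template) {
        List<Map<String, Object>> result = getData(template);
        if (result.size() != 1) {
            throw new IllegalStateException("expected exactly one object, got " + result.size());
        }
        return result.get(0);
    }

    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> o, Map<String, Object> template) {
        for (Map.Entry<String, Object> field : template.entrySet()) {
            Object wanted = field.getValue();
            if (wanted == null) continue;                 // null field: wildcard
            Object actual = o.get(field.getKey());
            if (wanted instanceof Map) {
                // Nested object (not a String/Number): match its fields recursively.
                if (!(actual instanceof Map)
                        || !matches((Map<String, Object>) actual, (Map<String, Object>) wanted)) {
                    return false;
                }
            } else if (!wanted.equals(actual)) {
                return false;
            }
        }
        return true;
    }
}
```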
>
>Thus, the "or" operator in the where clause is handled by the query
>processing module through multiple calls to the drivers, and joins are done
>by translating the results of one subquery into the where clause of another
>subquery. Path expressions are supported at the field level through nested
>objects inside attributes: if a field type is not String/Number, recursive
>matching is used. However, we do not support nested iterators (nested
>queries in general are split up by the query processor).
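Both query-processor techniques in this paragraph can be sketched on top of an equality-only driver. This is a minimal illustration, not the authors' implementation: Driver, QueryProcessor, or() and join() are hypothetical names, objects are again plain maps, and the join is a nested-loop join that issues one inner subquery per outer result.

```java
import java.util.*;

/** Hypothetical equality-only driver contract (query by example over maps). */
interface Driver {
    List<Map<String, Object>> getData(Map<String, Object> template);
}

final class QueryProcessor {
    /** where t1 or t2 or ...: one driver call per disjunct, merged without duplicates. */
    static List<Map<String, Object>> or(Driver driver, List<Map<String, Object>> disjuncts) {
        // Identity-based de-duplication: assumes the driver returns the same instance
        // for the same stored object, which a remote driver would not guarantee.
        Set<Map<String, Object>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> template : disjuncts) {
            for (Map<String, Object> o : driver.getData(template)) {
                if (seen.add(o)) result.add(o);
            }
        }
        return result;
    }

    /**
     * Nested-loop join on outer.outerAttr = inner.innerAttr: each outer result's value
     * is substituted into the where clause (template) of the inner subquery.
     */
    static List<List<Map<String, Object>>> join(Driver outer, Map<String, Object> outerWhere,
                                                String outerAttr, Driver inner, String innerAttr) {
        List<List<Map<String, Object>>> pairs = new ArrayList<>();
        for (Map<String, Object> o : outer.getData(outerWhere)) {
            Object key = o.get(outerAttr);
            if (key == null) continue;                     // a null key joins with nothing
            Map<String, Object> innerWhere = new HashMap<>();
            innerWhere.put(innerAttr, key);
            for (Map<String, Object> i : inner.getData(innerWhere)) {
                pairs.add(List.of(o, i));
            }
        }
        return pairs;
    }

    /** Tiny in-memory driver for illustration: a row matches if it contains every template entry. */
    static Driver inMemory(List<Map<String, Object>> rows) {
        return template -> {
            List<Map<String, Object>> matches = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                if (row.entrySet().containsAll(template.entrySet())) matches.add(row);
            }
            return matches;
        };
    }
}
```

The sketch makes the cost visible: a join issues one driver call per outer object, which fits the remark below that joins are not efficient.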
>
>On the top level we support CORBA-based object traversal (by value),
>and we are thinking of supporting XML and possibly Java Remote objects (or
>by value). On the back end, as I said, we have SQL-based sources and
>Arc/Info sources, and in fact with our current data and queries we mostly
>use the getUniqueData call. For the uniform OO model we used Java objects
>and reflection, but it was terribly slow, so we decided not to support
>methods (like 'where o.method() = value') and moved to Hashtables
>instead, which sped up most of the queries by a factor of 1000.
>Joins are not efficient, and more thinking is needed there. What we have
>in mind is actually moving a little away from OQL and its iterator
>model, to a streaming mechanism, something like:
>
>header (single discrete unit)
>select o where o.attr1 = $1 or o.attr2 = $2
>
>body (stream)
>value1_1 value2_1
>value1_2 value2_2
>  ...       ...
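One possible reading of this header/body format, sketched under assumptions: the header is kept as a list of disjuncts mapping attribute names to 1-based parameter positions, the driver is any function from an equality template to matching objects, and StreamingQuery is a hypothetical name. The parameterized query is fixed once, the body is a Java Stream of parameter tuples, and each tuple is bound and evaluated when it is consumed rather than after the whole body has been read.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

/**
 * Hypothetical sketch of a header/body query. The header is a parameterised
 * disjunction, e.g. "o.attr1 = $1 or o.attr2 = $2", stored as one
 * attribute -> parameter-position map per disjunct; the body is a stream of tuples.
 */
final class StreamingQuery {
    private final List<Map<String, Integer>> header;
    private final Function<Map<String, Object>, List<Map<String, Object>>> driver;

    StreamingQuery(List<Map<String, Integer>> header,
                   Function<Map<String, Object>, List<Map<String, Object>>> driver) {
        this.header = header;
        this.driver = driver;
    }

    /** Binds and evaluates the header for each body tuple as the tuple is consumed. */
    Stream<Map<String, Object>> run(Stream<Object[]> body) {
        return body.flatMap(tuple -> {
            // Objects already returned for this tuple (one object may satisfy several disjuncts).
            Set<Map<String, Object>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            List<Map<String, Object>> results = new ArrayList<>();
            for (Map<String, Integer> disjunct : header) {
                Map<String, Object> template = new HashMap<>();
                disjunct.forEach((attr, pos) -> template.put(attr, tuple[pos - 1])); // bind $pos
                for (Map<String, Object> o : driver.apply(template)) {
                    if (seen.add(o)) results.add(o);
                }
            }
            return results.stream();
        });
    }
}
```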
>
>
>That way we can do joins much more efficiently, and we could do partial
>evaluation as one of the sources progresses. There is a person working
>on an algebraic notation for it and some comprehension calculus; I am not
>that much into it ;o) but we will report on it as soon as it is
>clearer.
>
>I am not sure about the API, because in our model we have a separate API
>at the Query Processor level (which in principle supports all OO
>features) and a simplified version for the "drivers", which translate a
>subset of an OO query into native queries. The drivers are also
>responsible for preformatting results into some sort of OO model (we use
>separate schemas/mappings for SQL and other sources describing how to map
>rows into objects, etc.).
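A sketch of such a per-source mapping in both directions, assuming a flat table and simple column-to-attribute renaming (RowMapping, the "_type" tag and the generated SQL shape are all hypothetical): the driver turns an equality template over OO attributes into a parameterized SQL query, and turns each returned row into an object of the uniform model.

```java
import java.util.*;

/**
 * Hypothetical per-source mapping between a flat native table and objects of
 * the uniform model (attribute -> value maps tagged with a "_type" entry).
 */
final class RowMapping {
    private final String typeName;
    private final Map<String, String> columnToAttribute;
    private final Map<String, String> attributeToColumn = new HashMap<>();

    RowMapping(String typeName, Map<String, String> columnToAttribute) {
        this.typeName = typeName;
        this.columnToAttribute = columnToAttribute;
        columnToAttribute.forEach((column, attr) -> attributeToColumn.put(attr, column));
    }

    /**
     * Query direction: translates an equality template over attributes into a
     * parameterised SQL query; values go into params, never into the SQL text.
     * The table name is assumed to come from trusted mapping configuration.
     */
    String toSql(String table, Map<String, Object> template, List<Object> params) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        String separator = " WHERE ";
        // Sorted by attribute name so the generated SQL text is deterministic.
        for (Map.Entry<String, Object> e : new TreeMap<>(template).entrySet()) {
            if (e.getValue() == null) continue;           // wildcard field
            String column = attributeToColumn.get(e.getKey());
            if (column == null) {
                throw new IllegalArgumentException("attribute not mapped: " + e.getKey());
            }
            sql.append(separator).append(column).append(" = ?");
            params.add(e.getValue());
            separator = " AND ";
        }
        return sql.toString();
    }

    /** Result direction: turns one native row (column -> value) into a uniform-model object. */
    Map<String, Object> toObject(Map<String, Object> row) {
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("_type", typeName);
        columnToAttribute.forEach((column, attr) -> object.put(attr, row.get(column)));
        return object;
    }
}
```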
>As for Ozone, maybe we should have something more like:
>
>
>      -----                  -------
>     | OQL |                | XPath |
>      -----                  -------
>        |                       |                "top-level model"
>        V                       V
>  -----------------       -----------------
>| Query Processor |     | Other Processor |
>  -----------------       -----------------
>        |                 /
>        |                /                        "uniform OO model"
>   ----------------------
>  | Uniform Query Kernel |
>   ----------------------
>     |           | \
>     |           |   ---------                   "internal OO model"
>     V           V             \
>  ----------    ----------    ----------
>| Driver a |  | Driver b |  | Driver c |
>  ----------    ----------    ----------
>
>  SQL store     Shapedata      XML data            "native" level
>
>
>What do you think? Should the Ozone OO model be used as the "uniform OO
>model", or should it be treated only as one particular "native" level,
>with something else as the uniform model (another common API at the
>processor level)? To me, a pure Java-based model should be used inside the
>Query Kernel; however, I am not sure how to solve the problem of object
>identity in such a model. Also, in our case we need to be able to do
>joins across heterogeneous sources; I am not sure that is generally the
>case, and if there is only a single driver working with the query kernel
>at a time, the problem may be much simpler.
>
>Suggestions?
>
>(sorry for the length of my messages)
>
>cheers
>Mariusz
>
