
Re: Ozone+XML: what do users want?



For me the interest is primarily as a potentially large-scale 
persistent, updatable DOM.  Perhaps that is too vague, so I'll
explain a little.

My primary project just now is the computerization of a major
research project which is creating the first exhaustive 
dictionary of Sumerian (the world's first written language, as
it happens).  What is implied in this is more than a list of
entries, however.  I have to take account of the contexts in
which words occur (i.e., the texts, handled mostly in romanized
transliteration, but also with the option of jumping to photos
or hand-copies).  Because the texts come in, the organization of
the texts into corpora also comes in, hence catalogues of texts
and compositions enter the picture.  Then there is the history
of scholarship, both in the form of what we actively consult,
text editions, and the discussions of their content, i.e.,
lexicographical analysis.  So bibliography comes in, too, and in my
view, ultimately,
the works themselves that contain the discussions.  There are
other bits too, like the registries of personal names, field names,
temple names etc.  Also lists of signs and the values they can
have, which feed back into the semantics and other parts of the
dictionary.

So, the dictionary is a superdocument embracing an entire (if 
relatively small) knowledge-sphere.  My hope/expectation is that it 
will be feasible to implement this simply as a single document, 
using an editable PDOM, and be able to grab arbitrary pieces by 
idref (this word occurs in these 47 places [grab text citations]; the
citations are discussed in these places [grab bibliography]; the
word is written with these signs [grab signs]).
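
As a rough illustration of the idref lookups described above, here is a
minimal sketch in Python.  It is not Ozone or a persistent DOM: it holds
a tiny in-memory tree, and every element name, attribute name and id in
it is invented for the example.  It only shows the access pattern of
resolving an article's idrefs into citations, bibliography and signs.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature superdocument. Every element name, attribute
# name and id here is invented for illustration.
DOC = """\
<dictionary>
  <article id="w1" citations="c1 c2" signs="s1"/>
  <corpus>
    <line id="c1" bib="b1">placeholder text line one</line>
    <line id="c2" bib="b1 b2">placeholder text line two</line>
  </corpus>
  <bibliography>
    <work id="b1">placeholder edition</work>
    <work id="b2">placeholder discussion</work>
  </bibliography>
  <signlist>
    <sign id="s1">placeholder sign</sign>
  </signlist>
</dictionary>
"""

root = ET.fromstring(DOC)

# One pass over the tree builds an id -> element index, so each idref
# lookup afterwards is a dictionary access rather than a tree walk.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(element, attr):
    """Return the elements named by a space-separated idrefs attribute."""
    return [by_id[ref] for ref in element.get(attr, "").split()]


article = by_id["w1"]
citations = resolve(article, "citations")  # grab text citations
works = sorted(
    {w.get("id") for c in citations for w in resolve(c, "bib")}
)  # grab bibliography for those citations
signs = resolve(article, "signs")  # grab signs
```

A persistent DOM would presumably keep the equivalent of `by_id` in the
store itself, so that only the elements actually requested need to be
loaded into memory.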

My need, then, is for a DOM that does not require excessive memory
(let's say, no more than 128M) and which responds quickly to random
requests most (but not all) of which occur via idrefs.

I could cache a lot of this, but I expect to set it up so that the
dictionary articles can dynamically reload updated lists of citations,
both because our understanding, and therefore our reading, of texts
is constantly improving, and because there are always new texts being
discovered.  Thus, a dictionary article is really just a semantic
skeleton with usages described as patterns (XQL patterns one day, I
hope) that
can be run over the corpora when the article is referenced.  The 
articles will also contain a little predigested information such as
type-examples, which will be straightforward idrefs into the corpora.
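
The "skeleton plus patterns" idea can be sketched as follows, again in
Python and again with invented names throughout.  The small XPath
subset in the standard library stands in for XQL here, which is an
assumption made for the sake of a runnable example, not a claim about
how XQL patterns would look.  The point is only that the citation list
is recomputed each time the article is referenced, so a newly added
text shows up without touching the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: lines made of word elements tagged with a lemma.
corpus = ET.fromstring(
    "<corpus>"
    "<line id='c1'><w lemma='lemma-a'>form-1</w></line>"
    "<line id='c2'><w lemma='lemma-b'>form-2</w></line>"
    "</corpus>"
)

# The article stores a pattern rather than a fixed list of citations.
# The pattern syntax is ElementTree's limited XPath, used as a stand-in.
article = {"headword": "lemma-a", "usage_pattern": "w[@lemma='lemma-a']"}


def current_citations(article, corpus):
    """Run the article's pattern over every line at lookup time."""
    return [
        line.get("id")
        for line in corpus.iter("line")
        if line.find(article["usage_pattern"]) is not None
    ]


before = current_citations(article, corpus)

# A newly discovered text is added to the corpus...
new_line = ET.SubElement(corpus, "line", id="c3")
ET.SubElement(new_line, "w", lemma="lemma-a").text = "form-3"

# ...and the same article now reports it, with no edit to the article.
after = current_citations(article, corpus)
```

Scanning every line on each reference is the simplest thing that works
in a sketch; at real corpus sizes the pattern would need some kind of
index behind it.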

This doesn't really answer your question very specifically, but it's
as close as I can come at the moment because I'm still putting the
pieces together.  

Ease-of-implementation is more important than speed,
but the speed has to be `acceptable', i.e., non-glacial.  At present,
I would characterize the Monster DOM print function, which takes about
a minute on test.xml on my PII 350 [admittedly puny these days] as
unacceptable. 

Falko's comments about stuff happening inside the server a message or
two back are still ringing in my head; do we need an XSLT that runs
inside the server?  We pass scripts in and it spits the result out?
Or am I on the wrong planet, once again?

 Steve

Ann Tecklenburg wrote:
> 
> > ozone as cache backend for cocoon? We tried this already with
> > less success.
> >
> 
> Yes, I doubt that an efficient implementation of an XML
> Document+XPath engine can come from a loosely-coupled, external
> framework: I suspect that the XPath statement must be pre-processed
> and storage-coupled to its Document, similar to the way that a
> high-performance RDBMS "binds" SQL queries against the target tables.
> 
> (In explanation:  XPath statements are "selections" only, XSL is
> selection formatting.  XSLT=XSL+XPath.  If you are doing only XSLT
> for servlets, transactions are useless because the data stored in
> the server does not change.)
> 
> So, the really tough question is:  does efficiency matter anymore
> (esp. w.r.t. pure Java)?
> 
> Or will flexibility, ease of use, and time-to-market win out?  What
> are the advantages of being able to run-time select the XML Parser,
> XPath and XSL processor engines?  The disadvantages?
> 
> This is extremely important because it will be a world-class effort
> to do a highly optimized, tightly coupled Document+XPath+XSLT
> storage engine.
> 
> So we need to know:
> 
> 1)  How do you expect to use Ozone+XML?
> 
> 2)  What can you live without or get from another product?
> 
> 3)  What features do you *require* from Ozone+XML
> if you use the combination at all?
> 
> 4)  Will you primarily use Ozone+XML as an XML Document storage
> engine for servlets?
> 
> 5)  Do you expect to use the XML or servlet features of
> Ozone, at all?
> 
> Comments greatly appreciated.
> 
> Best Wishes,
> 
> Ann Tecklenburg

-- 
----------------------------------------------------------------------
Steve Tinney                                        Babylonian Section
                                 *   University of Pennsylvania Museum
stinney@sas.upenn.edu                          Phila, PA. 215-898-4047