RE: Handling large XML document



Now *I'm* confused. I was under the impression that Ozone was a persistent,
transactional, multiuser API and DOM around any arbitrary XML document or
collection of documents? Is that claim limited to non-huge XML
documents?

Not that I'm complaining if so, I certainly don't have the answers about how
to maintain performance in such a database...

Falko, you asked a previous poster why they would use an exported SQL-type
dataset for an XML document... I would suggest that many users new to Ozone
will be arriving in the following situation:

Their fully normalized relational databases have become very complicated
for updates/deletes, and the SQL queries are getting hideous. Once you go the
whole way with normalizing, using SQL becomes a real chore, and the DBMS
gets a real workout.

The idea of using XML and having more flexibility in elements, in effect
moving to a semi-structured database, seems to have much promise. We (I)
naively hope that if *properly structured* XML schemas are used for our XML
documents, we can get by with XPath queries where we used to use insane SQL
select statements. XQL and XUpdate will be very beneficial when XPath runs
into limitations (I don't personally know what those would be yet).

These datasets may be 'huge' more often than not, simply because we're
trading off normalized tables for verbose XML with a nice DOM API. At least
huge relative to the size that chokes current in-memory DOM parsers.
Consider the example of a contact database, with varying personal-data
elements. Such a document might eventually be 100K contacts. The second
'document' might be appointments for those contacts, and would become
similarly huge over time.
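
To make the contact example a bit more concrete, here is a minimal sketch of
the kind of query I have in mind. It is not Ozone code: it uses Python's
standard-library ElementTree, which supports only a limited subset of XPath,
and every element and attribute name (contacts, contact, phone, type, and so
on) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact document. Note that each contact carries a
# different set of personal-data elements, with no schema change needed.
CONTACTS_XML = """
<contacts>
  <contact id="1">
    <name>Ada Example</name>
    <email>ada@example.org</email>
    <phone type="mobile">555-0100</phone>
  </contact>
  <contact id="2">
    <name>Bob Sample</name>
    <phone type="home">555-0101</phone>
    <phone type="mobile">555-0102</phone>
    <birthday>1970-01-01</birthday>
  </contact>
  <contact id="3">
    <name>Cy Placeholder</name>
    <email>cy@example.org</email>
  </contact>
</contacts>
"""

root = ET.fromstring(CONTACTS_XML)


def contacts_with_email(root):
    # Path predicate [email] keeps only contacts that have an <email> child.
    return [c.findtext("name") for c in root.findall("contact[email]")]


def mobile_numbers(root):
    # Attribute predicate selects phones whose type attribute is "mobile".
    return [p.text for p in root.findall("contact/phone[@type='mobile']")]


print(contacts_with_email(root))
print(mobile_numbers(root))
```

In a normalized relational design, the optional and repeated fields would
typically live in separate tables and each of these lookups would need a
join; here each is a single path expression. The catch, of course, is that
this sketch parses the whole document into memory first.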

What I am hoping to find in Ozone is that it can eventually work on large
XML as a persistent, transactional, multiuser 'cursor' into an XML database
that stays on disk if it's too big to cache. Perhaps some strategy for
caching only branches of the DOM tree presently in use is possible.
Priorities would be speed for XPath queries, transactional atomicity for the
relatively infrequent inserts and updates, possibly with XUpdate. Indexes
might be appropriate to boost XPath query speed.
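
As a rough illustration of the "only keep the branch in use" idea, and not
of anything Ozone actually does, here is a sketch using Python's
standard-library ElementTree.iterparse. It streams through a document and
discards each finished subtree, so only about one record is held in memory
at a time. The function name count_matching and the contact/email element
names are my own hypothetical choices. It gives no persistence,
transactions, or indexes; it only shows that a scan need not build the whole
DOM.

```python
import io
import xml.etree.ElementTree as ET


def count_matching(source, tag, predicate):
    """Count <tag> elements satisfying predicate, streaming through source.

    Assumes <tag> elements are direct children of the root element.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            if predicate(elem):
                count += 1
            # Drop the completed subtree so memory use stays bounded.
            root.clear()
    return count


def has_email(contact):
    return contact.find("email") is not None


if __name__ == "__main__":
    # Build a small synthetic document; a real one would be a file on disk.
    records = "".join(
        "<contact><name>n%d</name>%s</contact>"
        % (i, "<email>e%d@example.org</email>" % i if i % 2 == 0 else "")
        for i in range(10)
    )
    doc = io.BytesIO(("<contacts>%s</contacts>" % records).encode("utf-8"))
    print(count_matching(doc, "contact", has_email))
```

Every query here is still a full scan, which is exactly why indexes would
matter for XPath speed on documents of this size.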

BTW, if I'm way off base here, know that I'm really interested in Ozone and
what it can do. My impertinent questions and opinions will be better
informed once I get bridged Ethernet working in VMware, so I can actually run
Ozone on a regular basis and explore what the heck it actually is.