
RE: Handling large XML document



On Fri, 01 Dec 2000, Jeff Kowalczyk wrote:
> Now *I'm* confused. I was under the impression that Ozone was a persistent,
> transactional, multiuser API and DOM around any arbitrary XML document or
> collection of documents? Is that qualified to only be non-huge XML
> documents?

No.

But ozone, like all other software, needs to be well configured and run on
suitable hardware to show good performance.

> 
> Not that I'm complaining if so, I certainly don't have the answers about how
> to maintain performance in such a database...
> 
> Falko, you asked a previous poster why they would use an exported SQL-type
> dataset for an XML document... I think I can suggest that for many users new
> to ozone, they'll be coming in with the following goal:
> 
> Their fully normalized relational databases have gotten very complicated for
> updates/deletes, and the SQL queries are getting hideous. Once you go the
> whole way with normalization, using SQL becomes a real chore, and the DBMS
> gets a real workout.
> 
> The idea of using XML and having more flexibility in elements, in effect
> moving to a semi-structured database, seems to have much promise. We (I)
> naively hope that if *properly structured* XML schemas are used for our XML
> documents, we can get by with XPath queries where we used to use insane SQL
> select statements. XQL and XUpdate will be very beneficial when XPath runs
> into limitations (which I don't personally know what they would be yet).
> 
> These datasets may be 'huge' more often than not, simply because we're
> trading off normalized tables for verbose XML with a nice DOM API. At least
> huge relative to the size that chokes current in-memory DOM parsers.
> Consider the example of a contact database, with varying personal-data
> elements. Such a document might eventually be 100K contacts. The second
> 'document' might be appointments for those contacts, and would become
> similarly huge over time.
> 
> What I am hoping to find in Ozone is that it can eventually work on large
> XML as a persistent, transactional, multiuser 'cursor' into an XML database
> that stays on disk if it's too big to cache. Perhaps some strategy for
> caching only the branches of the DOM tree presently in use is possible.
> Priorities would be speed for XPath queries and transactional atomicity for
> the relatively infrequent inserts and updates, possibly with XUpdate.
> Indexes might be appropriate to boost XPath query speed.

ozone is exactly that "persistent, transactional, multiuser 'cursor' into an
XML database...". But IMO XML is not a general-purpose way to store
everything. That is not the problem, though, and time will tell. Today the
problem is that, if you are trying to use XML as a replacement for an SQL
database, you need a set-oriented query language like SQL to query the XML
performantly. Such a query language simply does not exist.

XPath is fast as long as you are selecting "paths". When you start to really
search for something, the XPath engine (which works on a plain DOM) can only
fall back to a brute-force attack and iterate over the nodes, because there
is no indexing information in the DOM.

So XML may or may not be a good replacement for an RDBMS, but XPath (on a
plain DOM) is surely not a replacement for SQL.

Jeff, why don't you use ozone as an OODBMS? DOM is a very generic object
model. Building your own application-specific object model gives you type
safety and a real OO design, and it boosts performance.
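For contrast, a minimal sketch of such an application-specific class (the
class and field names are invented; in ozone the class would additionally
implement the ozone persistence interfaces, which are omitted here):

```java
// A typed domain object instead of a generic DOM subtree. The compiler
// checks every field access; with DOM, the same lookup would be an
// untyped, string-based navigation over child nodes.
public class Contact {
    private final String name;
    private final String city;

    public Contact(String name, String city) {
        this.name = name;
        this.city = city;
    }

    public String getName() { return name; }
    public String getCity() { return city; }

    public static void main(String[] args) {
        Contact c = new Contact("Ada", "Berlin");
        System.out.println(c.getName() + " lives in " + c.getCity());
    }
}
```

A typo like `getCty()` fails at compile time here, whereas a mistyped element
name in a DOM lookup or XPath string only fails (silently) at runtime.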

> 
> BTW, If I'm way off base here, know that I'm really interested in Ozone and
> what it can do. I'll be more informed with my impertinent questions and
> opinions when I get bridged Ethernet working in VMWare so I can actually run
> Ozone on a regular basis and explore what the heck it actually is.
:)


Falko
-- 
______________________________________________________________________
Falko Braeutigam                              mailto:falko@smb-tec.com
SMB GmbH                                        http://www.smb-tec.com