
RE: Handling large XML document



On Fri, 01 Dec 2000, Knapp, Robert (CAP, CMC) wrote:
> >----Original Message-----
> >From: Falko Braeutigam [mailto:falko@smb-tec.com]
> >Sent: Friday, December 01, 2000 8:59 AM
> >To: Knapp, Robert (CAP, CMC); 'ozone-users@ozone-db.org'
> >Subject: RE: Handling large XML document
> 
> 
> >On Fri, 01 Dec 2000, Knapp, Robert (CAP, CMC) wrote:
> >> Falko,
> >> 	I'm still _very_ new to ozone so I'm going to 
> >> ask a newbie question. Would it be possible for Adrian to split
> >> the document into 4.4 meg segments, sort of how infinite arrays
> >> or tags in a TIFF are done?
> >> 
> >> 	I'm asking because this has a direct effect on my
> >> evil plans for world domination err..... I mean my project,
> >> GLIMS (http://glims.sourceforge.net).  Scientific data (especially spectra
> >> or health-care
> >> case histories) can run into 100's of megs in some cases. And
> >> into Terabytes on long-term astronomy studies or litigation
> >> studies.  
> 
> >Wait! What are we talking about? 100's of megs in _one_ XML document?
> 
> Well, I'm still in the design phase so it is hard to answer this
> 100%.  The class of interest (GlimsItem) can contain 100's of
> megs or more.  Depending on the implementation, there
> could be references to other XML files for some of the properties.
> In that case the finally constructed object may contain 100's of megs
> of data in one object. [This obviously is a very extreme case,
> typical would be 1M or so.] Even in this case, we may be looking
> at instances where a single GlimsResult may be very large. 
> 
> Again, it's an implementation problem. The reason that I asked
> the question is that it will need to be something I take into
> account with my design.  
> 
> >> 
> >> 	Ozone is the reference database for GLIMS, 
> >Why then isn't it in the list of "ozone powered projects"??? ;)
> 
> Ummm, frankly I don't know. I wrote to you about it
> privately on Oct 30th, the subject was "ozone usage."  I wasn't
> aware that there was such a list.
> 
> >> it's set up
> >> so that other dB's can be plugged in with (relative) ease. [Write
> >> a driver] So I need to be able to either 1) have a workaround in place or
> >> 2) be able to tell users when ozone is not the best choice.
> 
> >It depends on the answers to the questions above. Or in general, what are
> >you actually going to store in ozone?
> 
> I think that the best answer to that is "Scarily large amounts of data."

Large amounts of data, of course. ;) The question is: are the objects as tiny
as XML nodes?

The 4.4MB XML sample mentioned today in another thread contains 270,000 nodes.
At the same density, 100MB of such XML data would contain about 6.1M nodes....
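
A quick sketch of that back-of-the-envelope estimate, in Python. It assumes the
node count scales linearly with document size, which is only a rough guess; the
function name estimate_nodes is made up for illustration and is not part of
ozone.

```python
# Rough estimate only: assumes node density is the same as in the
# 4.4 MB sample document, i.e. node count scales linearly with size.

SAMPLE_SIZE_MB = 4.4      # size of the sample document from the other thread
SAMPLE_NODES = 270_000    # node count reported for that sample


def estimate_nodes(size_mb: float) -> int:
    """Estimate the number of XML nodes in a document of size_mb megabytes."""
    nodes_per_mb = SAMPLE_NODES / SAMPLE_SIZE_MB
    return round(nodes_per_mb * size_mb)


if __name__ == "__main__":
    for size_mb in (1, 4.4, 100):
        print(f"{size_mb:>6} MB -> ~{estimate_nodes(size_mb):,} nodes")
```

For 100 MB this gives roughly 6.1 million nodes, each of which would become a
separate database object if nodes are stored one object per node.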

> Due to the nature of LIMS, it could be anything from the bytecode
> for a 15-minute mpeg4, to a TIFF, and so on and so forth.  While
> 55% or so of the potential users may never exceed a meg per GlimsItem,
> the remaining 45% or so may be storing large amounts of data per GlimsItem.

Ok, but a BLOB to store a 100MB mpeg produces only a few database objects
compared to XML.


Falko
-- 
______________________________________________________________________
Falko Braeutigam                              mailto:falko@smb-tec.com
SMB GmbH                                        http://www.smb-tec.com