
RE: Handling large XML document



Falko,
	I'm still _very_ new to ozone, so I'm going to ask a newbie question.
Would it be possible for Adrian to split the document into 4.4 meg
segments, sort of like how infinite arrays or tags in a TIFF are done?
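
To make the question concrete, here is a minimal sketch of that kind of splitting, written in Python rather than against ozone itself. Everything in it is an assumption for illustration: the function name split_children and the max_bytes limit are hypothetical, it treats the children of the root element as the units to distribute, and it says nothing about whether ozone actually stores such segments more efficiently.

```python
import xml.etree.ElementTree as ET


def split_children(xml_text, max_bytes):
    """Split the root's children into segment documents of roughly max_bytes each.

    Hypothetical helper: each segment repeats the root element (tag and
    attributes) and carries a run of consecutive children.
    """
    root = ET.fromstring(xml_text)
    segments = []
    current = ET.Element(root.tag, root.attrib)
    size = 0
    for child in root:
        child_size = len(ET.tostring(child))  # serialized size in bytes
        # Start a new segment when adding this child would exceed the limit.
        # A single oversized child still gets a segment of its own.
        if len(current) > 0 and size + child_size > max_bytes:
            segments.append(ET.tostring(current, encoding="unicode"))
            current = ET.Element(root.tag, root.attrib)
            size = 0
        current.append(child)
        size += child_size
    if len(current) > 0:
        segments.append(ET.tostring(current, encoding="unicode"))
    return segments
```

Note that this sketch parses the whole document before splitting it, so for documents in the hundreds of megs a streaming parser would be needed instead. Each segment could then be stored as its own document.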

	I'm asking because this has a direct effect on my evil plans for
world domination, err... I mean my project, GLIMS
(http://glims.sourceforge.net). Scientific data (especially spectra or
health-care case histories) can run into 100s of megs in some cases,
and into terabytes for long-term astronomy studies or litigation
studies.

	Ozone is the reference database for GLIMS, which is set up so that
other DBs can be plugged in with (relative) ease (write a driver). So I
need to either 1) have a workaround in place or 2) be able to tell users
when ozone is not the best choice.

Thanks,
RobK

-----Original Message-----
From: Falko Braeutigam [mailto:falko@smb-tec.com]
Sent: Friday, December 01, 2000 8:00 AM
To: Adrian
Cc: ozone-users@ozone-db.org
Subject: Re: Handling large XML document


Adrian,

I did some more testing but wasn't able to really improve the results. IBM
jdk1.3.2, which in some cases runs ozone somewhat faster than Sun jdk1.3,
was not able to handle the 4.4MB document. So here are my best results for
storing.

[2 * PII 350, 256MB, Sun jdk1.3]

ozone server params:
ozone -ddb -udaniela -DozoneDB.wizardStore.tableBufferSize=150
-DozoneDB.wizardStore.clusterSizeRatio=100 -Xmx128000000

client params:
ojvm Client store 4.4M.xml

results:
store=127s; commit=58s ==============>  185s  to store the 4.4MB XML document

xpath:
	query="/Results/Result[@id=0]/Row/Field[@name='QTY_INV'][self::*='207']/../Field[@name='VENDOR_NAME']"
                                         12s  (warm database)

	query="/Results/Result[@id=0]/Row[@num='0']/Field[@name='QTY_INV']"
                                          1.2s  (warm database)

On Fri, 01 Dec 2000, you wrote:
> Dear Falko,
> 
> Thank you very much.
> 
> Actually, this XML document is for testing the performance of Ozone in
> handling large XML documents.

Your document looks like an exported SQL database. Why are you trying to use
XML to handle such data? XML, and especially XPath, is not well suited to
handling such data.

> Later we will have some other large XML documents (the XPath query is
> not yet known) to process. Anyway, does the XPath query affect the
> required parameters of Ozone?

Yes. In the case of the second XPath query (see above), a real "path" is
selected out of the entire XML tree. This allows ozone to activate just the
clusters that lie along this path. If your application always uses such XPath
queries, it does not need to be able to hold the entire document in memory.
The first XPath query is something like a SQL selection (again, XPath is not
suited to such queries). This results in many cluster activation events, so
you need more memory to handle such queries efficiently.
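
The difference between the two query shapes can be seen on a toy document. The following is only a sketch in Python using the standard library's ElementTree, not ozone's own XPath engine. The element and attribute names are taken from the two queries above, while the field values and vendor names are made up for illustration. ElementTree supports only a subset of XPath and keeps the whole tree in memory, so the selection query is written as an explicit loop. The loop counts visited rows as a rough stand-in for the cluster activations described above.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the 4.4MB test document; values are invented.
DOC = """
<Results>
  <Result id="0">
    <Row num="0">
      <Field name="QTY_INV">207</Field>
      <Field name="VENDOR_NAME">Acme</Field>
    </Row>
    <Row num="1">
      <Field name="QTY_INV">15</Field>
      <Field name="VENDOR_NAME">Globex</Field>
    </Row>
    <Row num="2">
      <Field name="QTY_INV">207</Field>
      <Field name="VENDOR_NAME">Initech</Field>
    </Row>
  </Result>
</Results>
"""

root = ET.fromstring(DOC)  # root is the <Results> element

# Path-style query (like the second query): each step names one branch,
# so only the nodes along that branch are relevant.
path_hits = root.findall(
    "./Result[@id='0']/Row[@num='0']/Field[@name='QTY_INV']"
)

# Selection-style query (like the first query): the field value must be
# tested in every Row before the matching vendor names are known.
rows_visited = 0
vendors = []
for row in root.findall("./Result[@id='0']/Row"):
    rows_visited += 1
    qty = row.find("Field[@name='QTY_INV']")
    if qty is not None and qty.text == "207":
        vendors.append(row.find("Field[@name='VENDOR_NAME']").text)
```

The path-style query needs one Row. The selection-style query has to look at every Row, and the number of rows it visits grows with the size of the document.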


Falko
-- 
______________________________________________________________________
Falko Braeutigam                              mailto:falko@smb-tec.com
SMB GmbH                                        http://www.smb-tec.com