Re: XML and JNDI
Sorry, I was talking about OpenXQL, not Arkin's OpenXML :)
OpenXQL is an open-source XQL engine, whereas OpenXML is an XML parser.
The GMD-IPSI Persistent DOM (PDOM) implementation stores XML documents in
PDOM files, which are organized in pages, each containing 128 DOM nodes of
variable length.
http://xml.darmstadt.gmd.de/xql/xql-examples.html
"Using the persistent DOM (PDOM)
The PDOM class allows you to generate binary, indexed files containing a
persistent W3C-DOM. A PDOM file immediately offers all DOM operations without
the cost of parsing XML or building an in-memory DOM representation first.
Combined with servlets and XQL, PDOM files offer an efficient method to serve
XML fragments from large documents. A PDOM file may be created from any XML
file or programmatically using W3C-DOM methods.

When creating PDOM files from XML files, SAX events are used to communicate
with the XML parser. Using the event-based SAX API, there never has to be a
full representation of your XML file in main memory. Because of this, the size
of a PDOM file is limited only by disk space, not by main memory.

The de.gmd.ipsi.pdom.PDocument class implements org.w3c.dom.Document, so the
PDOM may be used anywhere a W3C-compliant DOM implementation is needed. As the
PDOM API supports all methods of the W3C-DOM, including updates and inserts,
programmatic creation and modification of PDOM files is possible.

Overview of the PDOM Features
Caching: A PDOM file is organized in pages, each containing 128 DOM nodes of
variable length. When a PDOM node is accessed by a W3C-DOM method, the
containing page is loaded into a main memory cache. Starting with a default
cache size of 100 pages (12,800 DOM nodes), the main memory cache can be
resized at any time. It will, however, never shrink below 20 pages (2,560 DOM
nodes). It is recommended to use the largest cache size your machine's main
memory can hold without swapping, as a larger cache improves overall PDOM
performance. The same cache is shared by all PDOM documents opened with the
same instance of the PDOM engine. The caching strategy used is "least recently
used" (LRU).

Defragmentation: When a node is programmatically inserted, updated or deleted
by W3C-DOM methods, the page containing the node is invalidated ("dirty
page"). If a dirty page is displaced from the cache, the modified page is
appended at the end of the PDOM file. So a PDOM file will grow during write
operations, as the file space occupied by invalidated pages is not removed or
reused automatically. Note, however, that merely reading or querying a PDOM
file will never change the file size.

The PDOM file can be defragmented at any time by removing unused pages. During
this operation a temporary file containing only valid pages is created, and
finally the fragmented PDOM file is replaced with the unfragmented copy. The
directory where the temporary file is created can be specified. The slack
ratio, that is, the wasted file space divided by the physical file size, can
be accessed by user applications; it is normalized to a double between 0.0
and 1.0. It is up to the user application to start a defragmentation, for
example when the slack ratio grows beyond a tolerable mark.

Full garbage collection: Defragmentation works on a per-page basis and does
not free space occupied by DOM nodes that have been deleted within pages. To
also free this space, a full garbage collection is required. To avoid dangling
object references, a garbage collection is only safe if the PDOM file is not
opened by another PDOM engine and no PDocument object is currently bound to
the PDOM file. This also includes any child nodes of PDocument that may still
be in main memory from previous operations. It is the duty of the user
application to enforce these conditions; otherwise the PDOM file may be
garbled. Full garbage collection includes defragmentation.

Commit points: At any time, a user application performing update, delete or
insert operations on a PDOM can decide to commit the current state of the
PDOM. In the commit operation, the main file index, normally maintained in
main memory, is written back to disk. If the user application crashes, e.g.
because of a "disk full" error, the PDOM will, when re-opened, be in the state
it was in at the last successful commit. Great care was taken to ensure file
consistency even after crashes. There is, however, a minimal chance of
corrupting a file if the user application dies during a commit operation.
Keep in mind that the PDOM does not try to be a fully fledged database.

Compression with gzip: Optionally, a PDOM file can be compressed on the fly
using the gzip algorithm. This results in smaller files, usually half the size
of an uncompressed PDOM file. The tradeoff is speed: a compressed PDOM file
usually increases the execution time for reading and writing pages by 20%.
Compression is a one-time decision taken when the PDOM file is created; a file
cannot be compressed later. All operations opening PDOM files automatically
recognize compression and handle it transparently. User applications never
have to care or know about compression when dealing with existing PDOM files.

Multithreaded access: The same PDOM file can be read by multiple threads in
parallel without problems. Update operations block read and write operations
for other threads. Given this, all atomic operations on a PDOM file are thread
safe. However, composed update operations (e.g. reading a node, modifying it
and writing it back to the PDOM) suffer from the well-known transaction
difficulties. To ensure atomicity of complex updates, the application has to
synchronize the critical block of code on the PDocument object."
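
The SAX-based import in the quoted text can be illustrated with the JDK's own SAX parser. This is a minimal sketch, not PDOM code: the hypothetical `NodeCounter` handler only counts elements and tracks nesting depth as events arrive, where a real importer would write each node out to a page. Nothing beyond the current event and a few counters is held in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: a PDOM-style importer would store each node on a
// disk page here instead of just counting it.
class NodeCounter extends DefaultHandler {
    int elements = 0;
    int depth = 0;
    int maxDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Streams the document through the handler; no DOM tree is ever built.
    static NodeCounter count(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NodeCounter handler = new NodeCounter();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }
}
```

For example, `NodeCounter.count("<a><b/><b><c/></b></a>")` sees four elements with a maximum depth of three.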
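
Because `PDocument` implements `org.w3c.dom.Document`, code written purely against the W3C interfaces should accept it unchanged. The sketch below assumes nothing about the PDOM's own constructors, so it demonstrates with the JDK's in-memory DOM; `DomHelpers` and `addItem` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomHelpers {
    // Uses only org.w3c.dom methods, so any compliant Document
    // implementation (in-memory or persistent) can be passed in.
    static Element addItem(Document doc, Element parent, String name, String text) {
        Element item = doc.createElement(name);
        item.appendChild(doc.createTextNode(text));
        parent.appendChild(item);
        return item;
    }

    // For the example only: an empty in-memory DOM from the JDK.
    static Document newInMemoryDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }
}
```

Handing `addItem` a `PDocument` instead of the JDK document should, going by the quoted documentation, create the element in the PDOM file rather than in memory.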
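
The caching scheme quoted above (128 nodes per page, LRU replacement, a resizable cache that never drops below 20 pages) can be modelled with a `LinkedHashMap` in access order. This is a minimal sketch under assumptions: node IDs are taken to be consecutive integers so the page number is `nodeId / 128`, pages are plain `Object[]` arrays, and the loader function stands in for a disk read. `PageCache` and its methods are hypothetical, not the PDOM API.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical model of the PDOM page cache, not the engine's real class.
class PageCache {
    static final int NODES_PER_PAGE = 128;
    static final int MIN_PAGES = 20;      // never below 2,560 nodes
    static final int DEFAULT_PAGES = 100; // 12,800 nodes

    private int maxPages;
    private int loads = 0;
    private final IntFunction<Object[]> loader; // stands in for a disk read

    // accessOrder = true: iteration runs from least to most recently used,
    // and removeEldestEntry evicts the LRU page once over capacity.
    private final LinkedHashMap<Integer, Object[]> pages =
        new LinkedHashMap<Integer, Object[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Object[]> eldest) {
                return size() > maxPages;
            }
        };

    PageCache(int requestedPages, IntFunction<Object[]> loader) {
        this.loader = loader;
        resize(requestedPages);
    }

    // Assumes node IDs are consecutive integers laid out 128 per page.
    static int pageOf(int nodeId) {
        return nodeId / NODES_PER_PAGE;
    }

    Object[] pageFor(int nodeId) {
        int page = pageOf(nodeId);
        Object[] p = pages.get(page); // a hit also marks the page most recently used
        if (p == null) {
            p = loader.apply(page);
            loads++;
            pages.put(page, p);
        }
        return p;
    }

    // Resizing is allowed at any time but is clamped to MIN_PAGES.
    void resize(int requestedPages) {
        maxPages = Math.max(MIN_PAGES, requestedPages);
        Iterator<Map.Entry<Integer, Object[]>> lru = pages.entrySet().iterator();
        while (pages.size() > maxPages && lru.hasNext()) {
            lru.next();
            lru.remove(); // drop least recently used pages first
        }
    }

    int maxPages() { return maxPages; }
    int cachedPages() { return pages.size(); }
    int loads() { return loads; }
}
```

A real engine would also have to write a dirty page back before letting it drop out of the cache; this model only covers reads.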
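
The file-growth and slack-ratio rules quoted above can be modelled by counting bytes. In this hypothetical `PageFileModel` no I/O happens: each appended page copy adds to the physical size, only the newest copy of each page counts as valid, and the 0.25 threshold in the test is an arbitrary example, since the documentation leaves the tolerable mark to the application. Space freed by deleting nodes inside a page is not modelled; per the quoted text that needs a full garbage collection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical byte-counting model of a PDOM file; no real I/O is done.
class PageFileModel {
    private final Map<Integer, Long> liveBytes = new HashMap<>(); // page -> size of its newest copy
    private long physicalBytes = 0; // everything appended since the last defragmentation

    // A dirty page displaced from the cache is appended at the end of the file.
    // Its previous copy stays behind as wasted space and is not reused.
    void appendPage(int page, long bytes) {
        liveBytes.put(page, bytes);
        physicalBytes += bytes;
    }

    long physicalBytes() { return physicalBytes; }

    long validBytes() {
        long sum = 0;
        for (long b : liveBytes.values()) sum += b;
        return sum;
    }

    // Wasted space divided by physical file size, normalized to 0.0..1.0.
    double slackRatio() {
        return physicalBytes == 0 ? 0.0 : (double) (physicalBytes - validBytes()) / physicalBytes;
    }

    // Defragmentation keeps only the valid pages, so the file shrinks to them.
    void defragment() { physicalBytes = validBytes(); }

    // The application decides when to defragment; the threshold is its choice.
    boolean defragmentIfNeeded(double tolerableSlack) {
        if (slackRatio() <= tolerableSlack) return false;
        defragment();
        return true;
    }
}
```

Reading pages never calls `appendPage`, which matches the note that merely querying a PDOM file leaves its size unchanged.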
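
The quoted text does not say how the PDOM writes its index during a commit. One common way to get the same all-or-nothing behavior is to write the index to a temporary file and then atomically rename it over the old one; the sketch below does that with `java.nio.file`. `IndexCommitter` and its "page offset" text format are hypothetical, and a production version would also force the temporary file to disk (for example with `FileChannel.force`) before the rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical commit helper; not the PDOM's actual commit mechanism.
class IndexCommitter {
    // Writes the in-memory index (page -> file offset) as "page offset" lines.
    static void commit(Map<Integer, Long> index, Path indexFile) throws IOException {
        List<String> lines = new TreeMap<>(index).entrySet().stream()
            .map(e -> e.getKey() + " " + e.getValue())
            .collect(Collectors.toList());
        Path tmp = indexFile.resolveSibling(indexFile.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        // Atomic rename: after a crash the old or the new index is visible,
        // never half of one. Throws AtomicMoveNotSupportedException if the
        // file system cannot do this.
        Files.move(tmp, indexFile, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // Re-opening reads back whatever index the last successful commit left.
    static Map<Integer, Long> load(Path indexFile) throws IOException {
        Map<Integer, Long> index = new TreeMap<>();
        for (String line : Files.readAllLines(indexFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ");
            index.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
        }
        return index;
    }
}
```

As in the quoted description, a crash in the middle of `commit` can still leave a stray `.tmp` file, but the committed index itself is replaced in one step.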
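
On-the-fly compression with transparent detection can be sketched with `java.util.zip`. The assumption here is that a reader recognizes a compressed file by the two-byte gzip magic number (0x1f 0x8b); the quoted documentation does not say how the PDOM actually detects compression. `MaybeCompressed` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper illustrating compress-at-creation, detect-on-open.
class MaybeCompressed {
    // Compression is decided once, when the file is created.
    static OutputStream create(OutputStream raw, boolean compress) throws IOException {
        return compress ? new GZIPOutputStream(raw) : raw;
    }

    // Readers need not know how the file was created: peek at the first two
    // bytes, rewind, and wrap in a GZIPInputStream only if they match.
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        boolean gzip = b0 == 0x1f && b1 == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```

Callers use `open` the same way for both kinds of file, which mirrors the claim that applications never have to know about compression when reading existing files.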
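
The advice to synchronize composite updates on the `PDocument` object looks like this in Java. The sketch uses the JDK's DOM and a hypothetical `counter` element with a `value` attribute. Because the JDK's DOM makes no thread-safety guarantees even for reads, every update here happens inside the lock, whereas the quoted text says the PDOM already serializes single operations itself.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AtomicDomUpdate {
    // Read a node, modify it, write it back: a composite update. Holding the
    // Document's monitor makes the three steps atomic with respect to other
    // threads that lock the same object.
    static void increment(Document doc, Element counter) {
        synchronized (doc) {
            int value = Integer.parseInt(counter.getAttribute("value"));
            counter.setAttribute("value", Integer.toString(value + 1));
        }
    }

    // Hypothetical test document: <counter value="0"/>
    static Document newCounterDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element counter = doc.createElement("counter");
        counter.setAttribute("value", "0");
        doc.appendChild(counter);
        return doc;
    }
}
```

Without the `synchronized` block, two threads could read the same value and one of the increments would be lost.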
Falko Braeutigam wrote:
> On Sun, 08 Aug 1999, Zvi Avraham wrote:
> > Falko,
> > look at OpenXML - Open Source XQL engine implementation:
> >
> > http://www.openxql.org
> >
> > Can it be incorporated into Ozone OODBMS ?
> > Of course we need first to implement Persistent DOM ...
> Unfortunately I'm very busy with other things at present but yes, we are going
> to use OpenXML to produce the persistent DOM. Currently I'm trying to figure
> out ways to avoid the 'node explosion'. So far I see 2 ways: First, cut the
> DOM tree at a certain depth and re-parse parts when retrieving. Second, do not
> store all nodes as persistent objects but make clusters instead. Any
> other/better ideas?
>
> Actually, I have not yet started the DOM implementation. I want to finish the
> single-VM version of ozone first.
>
> Falko
> --
> ______________________________________________________________________
> Falko Braeutigam mailto:falko@softwarebuero.de
> softwarebuero m&b (SMB) http://www.softwarebuero.de
--
-------------------------------------------------
Zvi Avraham, Senior Software Engineer
NetManage Inc., Visual Connectivity Division
http://www.netmanage.com/products/visual_conn.asp