
Re: XML and JNDI



Sorry, I was talking about OpenXQL, not Arkin's OpenXML :)
OpenXQL is an open-source XQL engine, whereas OpenXML is an XML parser.

The GMD-IPSI Persistent DOM (PDOM) implementation stores XML documents in
PDOM files, which are organized in pages, each containing 128 DOM nodes of
variable length.

http://xml.darmstadt.gmd.de/xql/xql-examples.html

"Using the persistent DOM (PDOM)

  The PDOM class allows you to generate binary, indexed files containing a
  persistent W3C-DOM. A PDOM file immediately offers all DOM operations
  without the cost of parsing XML or building an in-memory DOM
  representation first. Combined with servlets and XQL, PDOM files offer an
  efficient method to serve XML fragments from large documents. A PDOM file
  may be created from any XML file or programmatically using W3C-DOM
  methods.

  When creating PDOM files from XML files, SAX events are used to
  communicate with the XML parser. Using the event-based SAX API, there
  never has to be a full representation of your XML file in main memory.
  Because of this, the size of a PDOM file is limited only by disk space,
  not by main memory.
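
To illustrate the point about SAX (this is my own sketch, not part of the
GMD text, and it does not use the PDOM API at all): the handler below only
counts elements as the events arrive, so the parser never has to hold the
whole document in memory. The class name SaxNodeCounter and its methods are
made up for the example; only the standard JAXP/SAX calls are real.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements from SAX events only.
class SaxNodeCounter extends DefaultHandler {
    private int elements = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // One callback per element start tag; nothing is retained,
        // so memory use does not grow with document size.
        elements++;
    }

    static int countElements(String xml) {
        try {
            SaxNodeCounter handler = new SaxNodeCounter();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>"));
    }
}
```

A converter that writes nodes to disk pages as they arrive would follow the
same event-driven shape, which is presumably what the PDOM builder does.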

  The de.gmd.ipsi.pdom.PDocument class implements org.w3c.dom.Document, so
  the PDOM may be used anywhere a W3C-compliant DOM implementation is
  needed. As the PDOM API supports all methods of the W3C-DOM, including
  updates and inserts, programmatic creation and modification of PDOM files
  is possible.

  Overview of the PDOM Features

  Caching: A PDOM file is organized in pages, each containing 128 DOM nodes
  of variable length. When a PDOM node is accessed by a W3C-DOM method, the
  containing page is loaded into a main memory cache. Starting with a
  default cache size of 100 pages (12,800 DOM nodes), the main memory cache
  can be resized at any time. It will, however, never shrink below 20 pages
  (2,560 DOM nodes). It is recommended to use the largest cache size your
  machine's main memory can hold without swapping, as a larger cache
  improves overall PDOM performance. The same cache is shared by all PDOM
  documents opened with the same instance of the PDOM engine. The caching
  strategy used is "least recently used" (LRU).
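
A rough sketch of how such a page cache behaves (again my own illustration,
not GMD code): LinkedHashMap in access order gives LRU eviction almost for
free. The class LruPageCache and all of its members are hypothetical; only
the numbers (128 nodes per page, a 20 page minimum) come from the text
above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache keyed by page number; V stands for a loaded page.
class LruPageCache<V> extends LinkedHashMap<Integer, V> {
    static final int NODES_PER_PAGE = 128; // from the PDOM description
    static final int MIN_PAGES = 20;       // cache never shrinks below this
    private int capacity;

    LruPageCache(int pages) {
        super(16, 0.75f, true); // true = iterate in access (LRU) order
        this.capacity = Math.max(pages, MIN_PAGES);
    }

    // Resize at any time; requests below the minimum are clamped.
    void resize(int pages) {
        capacity = Math.max(pages, MIN_PAGES);
        Iterator<Integer> it = keySet().iterator();
        while (size() > capacity && it.hasNext()) {
            it.next();
            it.remove(); // drop least recently used pages first
        }
    }

    int capacity() { return capacity; }

    int nodeCapacity() { return capacity * NODES_PER_PAGE; }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
        // Called after each put: evict the LRU page once we are over capacity.
        return size() > capacity;
    }
}
```

With the default of 100 pages, nodeCapacity() gives 100 * 128 = 12,800
nodes, and the 20 page floor gives 2,560, matching the figures quoted.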

  Defragmentation: When a node is programmatically inserted, updated or
  deleted by W3C-DOM methods, the page containing the node is invalidated
  ("dirty page"). If a dirty page is displaced from the cache, the modified
  page is appended at the end of the PDOM file. So a PDOM file will grow
  during write operations, as the file space occupied by invalidated pages
  will not be removed or reused automatically. Note, however, that just
  reading or querying a PDOM file will never change the file size.

  The PDOM file can be defragmented at any time by removing unused pages.
  During this operation a temporary file containing only valid pages is
  created, and finally the fragmented PDOM file is replaced with the
  unfragmented copy. It is possible to define the directory where the
  temporary file is created. The slack ratio, that is, the wasted file
  space divided by the physical file size, can be accessed by user
  applications. The number is normalized to a double between 0.0 and 1.0.
  It is up to the user application to start a defragmentation, typically
  when the slack ratio grows beyond a tolerable mark.
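
The decision the application has to make could look roughly like this. It
is a sketch only: SlackRatio, slackRatio() and shouldDefragment() are names
I invented, and I do not know what the real PDOM accessor is called. The
threshold is whatever "tolerable mark" the application picks.

```java
// Hypothetical helper: decides when a defragmentation run is worthwhile.
class SlackRatio {
    // Wasted file space divided by physical file size, in the range 0.0 to 1.0.
    static double slackRatio(long wastedBytes, long fileBytes) {
        if (fileBytes <= 0) {
            return 0.0; // empty file: nothing to reclaim
        }
        return (double) wastedBytes / fileBytes;
    }

    // True once the ratio exceeds the application-chosen threshold.
    static boolean shouldDefragment(long wastedBytes, long fileBytes,
                                    double threshold) {
        return slackRatio(wastedBytes, fileBytes) > threshold;
    }

    public static void main(String[] args) {
        // 250 of 1000 bytes wasted, 30% tolerated: no defragmentation yet.
        System.out.println(shouldDefragment(250, 1000, 0.3));
        // 400 of 1000 bytes wasted: time to defragment.
        System.out.println(shouldDefragment(400, 1000, 0.3));
    }
}
```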

  Full garbage collection: Defragmentation works on a per-page basis and
  does not free space occupied by DOM nodes that have been deleted within
  pages. To also free this space, a full garbage collection is required. To
  avoid dangling object references, a garbage collection is only safe if
  the PDOM file is not opened by another PDOM engine and no PDocument
  object is currently bound to the PDOM file. This also includes any child
  nodes of PDocument which may still be in main memory, left over from
  previous operations. It is the duty of the user application to enforce
  these conditions; otherwise you are in danger of garbling the PDOM file.
  Full garbage collection includes defragmentation.

  Commit points: At any time a user application doing update, delete or
  insert operations on a PDOM can decide to commit the current status quo
  of the PDOM. In the commit operation the main file index, normally
  maintained in main memory, is written back to disk. If the user
  application crashes, e.g. because of a "disk full" error, the PDOM will,
  when re-opened, be in the state it was in immediately before the last
  successful commit operation. Great care was taken to ensure file
  consistency even after crashes. There is, however, a minimal chance of
  corrupting a file if the user application dies during a commit operation.
  Keep in mind that the PDOM does not try to be a fully fledged database.

  Compression with gzip: Optionally, a PDOM file can be compressed on the
  fly using the gzip algorithm. This will result in smaller files, usually
  half the size of an uncompressed PDOM file. The tradeoff here is speed:
  a compressed PDOM file usually increases the execution time for reading
  and writing pages by 20%. Compression is a one-time decision taken at
  creation time of the PDOM file. A file cannot be compressed later. All
  operations opening PDOM files will automatically recognize compression
  and handle this fact transparently. User applications never have to care
  or know about compression when dealing with existing PDOM files.
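
I don't know how PDOM itself detects compressed files, but the general idea
of "recognize and handle transparently" can be sketched with java.util.zip
by peeking at the two gzip magic bytes (0x1f 0x8b). TransparentGzip and its
methods are hypothetical names for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: callers read data without knowing if it is gzipped.
class TransparentGzip {
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);                 // remember position before peeking
        int b1 = in.read();
        int b2 = in.read();
        in.reset();                 // rewind so no bytes are lost
        // Assumption: plain data never starts with the gzip magic bytes.
        // A real file format would more likely carry an explicit header flag.
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readText(byte[] stored) {
        try (InputStream in = open(new ByteArrayInputStream(stored))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

readText() returns the same string whether it is handed the output of
gzip("...") or the plain UTF-8 bytes, which is the behaviour the text
describes for existing PDOM files.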

  Multithreaded access: The same PDOM file can be read by multiple threads
  in parallel without problem. Update operations block read and write
  operations for other threads. Given this, all atomic operations on a PDOM
  file are thread safe. However, composed update operations (e.g. reading a
  node, modifying it and writing it back to the PDOM) suffer from the
  well-known transaction difficulties. To ensure atomicity of complex
  updates, the application has to synchronize the critical block of code
  on the PDocument object."
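
For the last point, a minimal sketch of what "synchronize the critical
block" means in practice. I use the JDK's in-memory DOM as a stand-in here,
on the assumption that a PDocument would be used the same way since it
implements org.w3c.dom.Document. CompoundUpdate and its methods are made-up
names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: a read-modify-write made atomic by locking the document.
class CompoundUpdate {
    static void increment(Document doc) {
        synchronized (doc) { // the whole read-modify-write is one critical block
            Element root = doc.getDocumentElement();
            int n = Integer.parseInt(root.getAttribute("count"));
            root.setAttribute("count", Integer.toString(n + 1));
        }
    }

    // Runs several threads that each increment the counter perThread times.
    static int run(int threads, int perThread) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("counter");
            root.setAttribute("count", "0");
            doc.appendChild(root);

            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(() -> {
                    for (int j = 0; j < perThread; j++) {
                        increment(doc);
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join(); // wait for all updates to finish
            }
            return Integer.parseInt(root.getAttribute("count"));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 250)); // no lost updates: prints 1000
    }
}
```

Without the synchronized block two threads could read the same value and
one increment would be lost, even if each single DOM call is thread safe.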

Falko Braeutigam wrote:

> On Sun, 08 Aug 1999, Zvi Avraham wrote:
> > Falko,
> > look at OpenXML - Open Source XQL engine implementation:
> >
> > http://www.openxql.org
> >
> > Can it be incorporated into Ozone OODBMS ?
> > Of course we need first to implement Persistent DOM ...
> Unfortunately I'm very busy with other things at present but yes, we are going
> to use OpenXML to produce the persistent DOM. Currently I'm trying to figure
> out ways to avoid the 'node explosion'. So far I see 2 ways: First, cut the
> DOM tree at a certain depth and re-parse parts when retrieving. Second, do not
> store all nodes as persistent objects but make clusters instead. Any
> other/better ideas?
>
> Actually, I have not yet started the DOM implementation. I want to finish the
> single-VM version of ozone first.
>
> Falko
> --
> ______________________________________________________________________
> Falko Braeutigam                       mailto:falko@softwarebuero.de
> softwarebuero m&b (SMB)                  http://www.softwarebuero.de

--
-------------------------------------------------
Zvi Avraham, Senior Software Engineer
NetManage Inc., Visual Connectivity Division
http://www.netmanage.com/products/visual_conn.asp
