Re: XML Repository v. 2



Marcel Ruff wrote:
> 
> Why do we need the DTD, why not just ignore it?
> XPath works fine without DTD.

For one thing, XPath's id() function will not work without
the DTD.  For another, accessing the DTD at store time 
ensures that the database is structurally valid.  Perhaps 
at some point in the future incremental updates to the database will 
be validated against the relevant DTD to maintain its integrity.
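
A minimal sketch of both points, assuming the standard JAXP parser and
XPath engine bundled with the JDK (not any particular repository code).
The class name, the method names and the sample list/item DTD are all
made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdAtStoreTime {
    // Hypothetical sample DTD: it declares item/@id as type ID.
    static final String DTD =
        "<!DOCTYPE list [\n"
      + "  <!ELEMENT list (item*)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "  <!ATTLIST item id ID #REQUIRED>\n"
      + "]>\n";
    static final String BODY = "<list><item id=\"a1\">first</item></list>";

    /** Parse xml; when validate is true, any validity error aborts the parse. */
    static Document parse(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // treat validity errors as failures
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Text of the element selected by XPath id(), or null if nothing matches. */
    static String textOfId(String xml, String id) {
        try {
            Document doc = parse(xml, false);
            Node hit = (Node) XPathFactory.newInstance().newXPath()
                .evaluate("id('" + id + "')", doc, XPathConstants.NODE);
            return hit == null ? null : hit.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Store-time check: does the document validate against its own DTD? */
    static boolean isValid(String xml) {
        try {
            parse(xml, true);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("with DTD:    " + textOfId(DTD + BODY, "a1"));
        System.out.println("without DTD: " + textOfId(BODY, "a1"));
        System.out.println("valid:       " + isValid(DTD + BODY));
    }
}
```

Without the ATTLIST declaration the parser has no way of knowing that
"id" is an ID-typed attribute, so id() has nothing to look up.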

When XML-Schema is finished, it will also be feasible to do
content constraint validation, so the matter is worth dealing with.

In general, building an XML system which ignores DTDs,
while obviously possible, feels all wrong.

> In the SUN DOM API there is the method
> 
>    com.sun.xml.tree.XmlDocument.changeNodeOwner(node)
> 
> which allows merging multiple root nodes (documents) into one
> big tree (you could also copy the nodes recursively).
> XPath runs fine over this newly created tree.
> 
> But this may be a memory problem, to have such a huge
> tree in memory.

I think so; disk yes, memory, well, who really knows?  Perhaps
those with million-document systems will have terabytes of RAM?

If the issue is locating a document by an XPath, it should be 
acceptable to iterate over the documents in the repository.  If
the issue is resolving IDREFs or keys in other documents, it
should be acceptable to rely on XSL's document() function.  If you 
don't treat the documents individually in these cases, you force all 
IDs to be system-wide unique instead of document-wide unique; this
would likely be a problem.
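
Iterating over the repository could look roughly like the sketch below.
It assumes an in-memory map standing in for the real store; the class
RepositoryScan, its store() and locate() methods and the sample
documents are hypothetical.  Only the JAXP calls are real API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RepositoryScan {
    // Hypothetical stand-in for the repository: document name -> parsed tree.
    static final Map<String, Document> REPOSITORY = new LinkedHashMap<>();

    /** Parse one document and keep it as its own tree under the given name. */
    static void store(String name, String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            REPOSITORY.put(name, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot store " + name, e);
        }
    }

    /** Names of all documents in which the XPath selects at least one node. */
    static List<String> locate(String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Document> entry : REPOSITORY.entrySet()) {
            try {
                // Each document is its own evaluation context, so ids
                // only need to be unique within a single document.
                NodeList nodes = (NodeList) xpath.evaluate(
                    expr, entry.getValue(), XPathConstants.NODESET);
                if (nodes.getLength() > 0) {
                    hits.add(entry.getKey());
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("bad XPath: " + expr, e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        store("a.xml", "<list><item id='x1'>alpha</item></list>");
        store("b.xml", "<list><item id='x1'>beta</item><item id='x2'>gamma</item></list>");
        System.out.println(locate("//item[@id='x1']"));
        System.out.println(locate("//item[@id='x2']"));
    }
}
```

Both sample documents use the id value x1 without conflict, which is
exactly what a single merged tree would not allow.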

It is another issue again, though, to create a superdocument of 
lots of fragments any part of which might be referenced by
XPaths which assume the superdocument root as their operating 
basis.  I think a reasonable solution for that is to compile all the 
components of the superdocument into a single tree in the persistent
DOM.
For each top-level node in each entity added to the PDOM, keep track of
the mapping of node->entity.  Then, to save an arbitrary node in the PDOM
to its external document, simply traverse back up the parents until you
find the nearest one which is owned by an entity.  While writing, split
the output every time a child node is encountered which is owned by an
entity.
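
The bookkeeping described above might be sketched as follows, using an
ordinary in-memory W3C DOM in place of a real persistent DOM.  The class
EntitySplit, its methods and the book/chapter sample are hypothetical.
Writing an entity reference (&name;) at each split point is my own
assumption about how the pieces would be stitched back together, and the
serializer handles only elements and unescaped text.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class EntitySplit {
    // node -> entity mapping, kept only for the top-level node of each entity.
    static final Map<Node, String> OWNER = new IdentityHashMap<>();

    /** Walk back up the parents to the nearest node owned by an entity. */
    static String owningEntity(Node node) {
        for (Node cur = node; cur != null; cur = cur.getParentNode()) {
            String entity = OWNER.get(cur);
            if (entity != null) {
                return entity;
            }
        }
        return null; // node is not inside any registered entity
    }

    /** Serialize top and every entity below it: entity name -> its own text. */
    static Map<String, String> writeAll(Node top) {
        Map<String, String> out = new LinkedHashMap<>();
        writeEntity(top, out);
        return out;
    }

    private static void writeEntity(Node top, Map<String, String> out) {
        StringBuilder sb = new StringBuilder();
        serialize(top, sb, out, true);
        out.put(OWNER.get(top), sb.toString());
    }

    private static void serialize(Node n, StringBuilder sb,
                                  Map<String, String> out, boolean isTop) {
        if (!isTop && OWNER.containsKey(n)) {
            // Split point: this child belongs to another entity.
            sb.append('&').append(OWNER.get(n)).append(';');
            writeEntity(n, out);
            return;
        }
        if (n.getNodeType() == Node.TEXT_NODE) {
            sb.append(n.getNodeValue()); // simplification: no escaping
            return;
        }
        if (n.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        sb.append('<').append(n.getNodeName()).append('>');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, sb, out, false);
        }
        sb.append("</").append(n.getNodeName()).append('>');
    }

    /** Build a tiny superdocument: entity "book" containing entity "ch1". */
    static Element demoTree() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            Element title = doc.createElement("title");
            Element chapter = doc.createElement("chapter");
            Element para = doc.createElement("para");
            title.appendChild(doc.createTextNode("T"));
            para.appendChild(doc.createTextNode("Hello"));
            chapter.appendChild(para);
            book.appendChild(title);
            book.appendChild(chapter);
            doc.appendChild(book);
            OWNER.put(book, "book");
            OWNER.put(chapter, "ch1");
            return book;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element book = demoTree();
        Node para = book.getLastChild().getFirstChild();
        System.out.println(owningEntity(para));
        System.out.println(writeAll(book));
    }
}
```

The identity map keeps the lookup independent of node content, so two
structurally identical fragments from different entities stay distinct.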

 Steve

-- 
----------------------------------------------------------------------
Steve Tinney                                        Babylonian Section
                                 *   University of Pennsylvania Museum
stinney@sas.upenn.edu                          Phila, PA. 215-898-4047