
Re: XML Repository



On Wed, 08 Dec 1999, Zvi Avraham wrote:
> Falko Braeutigam wrote:
> 
> > On Wed, 08 Dec 1999, Zvi Avraham wrote:
> > > Falko Braeutigam wrote:
> > >
> > > [snip]
> > >
> > > > Here is the current Java interface pasted right out of Lars' editor ;)
> > > >
> > > > public interface Repository extends OzoneRemote {
> > > >     public void init (String parserName)  throws Exception;
> > > >     public Document getRootDocument ();
> > > >     public void setRootDocument (byte[] data) throws Exception;
> > > >     public Document storeDocument (byte[] data, String docName, int access) throws Exception;
> > > >     public Document storeDocument (Document memDoc, String docName, int access) throws Exception;
> > > >     public void linkDocument (Element from, Document to);
> > > >     public Document unlinkDocument (Element from);
> > > >     public NodeList xpathQuery (Document pDom);
> 
> what does this mean? query for what?
The idea was to use the same XQuery object for several subsequent queries to
allow in-XPath optimization. Does this make sense? We are not sure about that.

> 
> > > >     public NodeList xpathQuery (String qString, Document pDom);
> > > >     public void addDocument (byte[] data, Element where, int access);
> > > >     public void deleteDocument (Element where);
> > > >     public String getParserName ();
> > > >     public void setParserName (String parserName) throws Exception;
> > > >     }
> 
> oops, what about renameDocument? is it possible?
Yes, if we decide to allow documents to be named then of course we need the
renameDocument() method. Thanks for this point, Zvi ;) But right now I'm not
sure about document names. We can access documents using XPath queries. Are
document names needed then?

> also, I still don't see how I build nested documents, something like directories in a filesystem?
You can build nested documents via XLinks. In my example there is the top
document and maybe many nested documents that are linked from the ISO and the
DIN node.

To build such a repository you would write something like:

Document root = repository.storeDocument (doc, name, access);
(where doc is the DOM of the XML that represents the TOP->ISO/DIN tree)
Document nested = repository.storeDocument (nestedDoc, nestedName, access);
repository.linkDocument (DIN_node, nested);

Maybe it's a good idea to add a method addDocument() that stores and links a
nested document.
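
A minimal sketch of what such a combined method could look like. This is not
the ozone code: SketchRepository is a hypothetical in-memory stand-in, the
addDocument() signature taking a DOM plus a name is my assumption, and
linkedDocument() and newDocument() exist only to make the sketch usable.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical in-memory stand-in for the repository; not the ozone implementation.
class SketchRepository {
    // Linked documents, keyed by the identity of the linking element.
    private final Map<Element, Document> links = new IdentityHashMap<>();

    // Stand-in for storeDocument(): a real repository would persist the DOM here.
    Document storeDocument(Document memDoc, String docName, int access) {
        return memDoc;
    }

    void linkDocument(Element from, Document to) {
        links.put(from, to);
    }

    Document unlinkDocument(Element from) {
        return links.remove(from);
    }

    // Assumed convenience method: store a nested document and link it in one call.
    Document addDocument(Document memDoc, String docName, Element where, int access) {
        Document stored = storeDocument(memDoc, docName, access);
        linkDocument(where, stored);
        return stored;
    }

    // Helper for the sketch: look up the document linked from an element, or null.
    Document linkedDocument(Element from) {
        return links.get(from);
    }

    // Helper for the sketch: build an empty DOM with a single root element.
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this, the two calls storeDocument() and linkDocument() from the example
above collapse into a single addDocument (nestedDoc, nestedName, DIN_node, access).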

> 
> > >
> > > I think that the repository must be a Virtual Filesystem, with hierarchical directories,
> > > where each file can be either an XML Document or a BLOB (we have support for both in Ozone).
> >
> > Hm.. I don't know. The current design is based on the idea to use XML query
> > langs (XPath for now) to search content in the database. Consider the
> 
> > following. There is one "access path" document in the database that describes
> > the structure of the database. Something like:
> >
> > Top --- ISO
> >      |
> >       - DIN
> >
> > ISO and DIN are special XLink containers (provided by the ozone Repository
> > module) that contain links to many other documents that in turn are "real" XML
> > documents.
> 
> in eXcelon, by default the XQL query also includes the path to the document,
> so if, for example, you are looking for all authors in all documents in the /TOP/DIN directory
> 
> /TOP/DIN//author
> 
> or all authors in entire XML Store:
> 
> //author
Correct, this would not be possible. You would have to split the query into
"//" and "/author" and run the code below.
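
To make the two-step lookup concrete, here is a self-contained sketch using the
standard javax.xml.xpath API instead of the repository interface. The names
TwoStepQuery, parse(), query() and find() are made up for this example, and so
are the plain href attribute and the Map that stands in for the document store.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the split query; not part of the ozone Repository API.
class TwoStepQuery {
    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an XPath expression against a document and return the node set.
    static NodeList query(String expr, Document context) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 1: linkQuery selects link elements in the root document.
    // Step 2: contentQuery runs on every document those links point to.
    // Returns the text content of all hits, in link order.
    static List<String> find(Document root, Map<String, Document> store,
                             String linkQuery, String contentQuery) {
        List<String> result = new ArrayList<>();
        NodeList links = query(linkQuery, root);
        for (int i = 0; i < links.getLength(); i++) {
            // Assumption: linkQuery selects elements carrying an href attribute.
            String href = ((Element) links.item(i)).getAttribute("href");
            Document linked = store.get(href);
            if (linked == null) {
                continue; // dangling link, nothing to search
            }
            NodeList hits = query(contentQuery, linked);
            for (int j = 0; j < hits.getLength(); j++) {
                result.add(hits.item(j).getTextContent());
            }
        }
        return result;
    }
}
```

For example, find (root, store, "/Top/DIN/link", "//author") would collect the
authors of all documents linked below DIN, and "/Top//link" as the first query
would cover the whole store.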

We could add a special "link" character to the XPath that can be detected by a
preprocessor. The preprocessor could then split up the query string and perform
the code that is needed to follow the link and it would be possible to write
the following XPath query: "//:author". Where ":" is the special link
character. But then it is not pure XPath. Another idea is an ozone specific
XPath implementation but this seems to be too much for now.
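
A rough sketch of such a preprocessor, only to show the splitting step. The
class name and the splitting rule are assumptions: here ":" counts as a link
character only when it directly follows "/", so namespace prefixes (ns:author)
and axes (child::author) are left alone. String literals inside the query are
not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preprocessor sketch; the class name and splitting rule are assumptions.
class LinkQueryPreprocessor {
    static final char LINK = ':';

    // Splits the query at every ':' that directly follows '/'.
    // Each returned part would be run as plain XPath; between two parts
    // the caller would follow the XLinks found by the previous part.
    static List<String> split(String query) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < query.length(); i++) {
            if (query.charAt(i) == LINK && query.charAt(i - 1) == '/') {
                parts.add(query.substring(start, i)); // keep the trailing '/'
                start = i + 1;                        // drop the link character
            }
        }
        parts.add(query.substring(start));
        return parts;
    }
}
```

A query without the link character comes back as a single part, so pure XPath
queries would pass through the preprocessor unchanged.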

> 
> of course you can still explicitly specify the document to which the query must be applied.
> 
> > To search all ISO's and DIN's we could write:
> >
> > NodeList nl = repository.xpathQuery ("/Top//", repository.getRootDocument());
> > for (int i=0; i<nl.getLength(); i++) {
> >         Document linkedDoc = new XLink (nl.item(i)).href();
> >         NodeList nl2 = repository.xpathQuery (...);
> >         for (int j=0; j<nl2.getLength(); j++) {
> >                 doSomething (<nodes of nl2>);
> >         }
> > }
> >
> > We need only one paradigm to search any data in the repository. The first XPath
> > query can be seen as filesystem access. This solution is much more flexible
> > and XML'ish than a new virtual filesystem API, I think. Also, it should be no
> > problem to integrate BLOBs into this architecture. All we need is a special
> > XML-BLOB node much like the XLink node. Then you can store a mix of XML documents,
> > BLOBs and any other data you want to have in your repository. And all this
> > accessible via uniform XML query technologies.
> >
> > What do you think?
> 
> I think that your solution is more elegant than Virtual Filesystem.
> I just doubt how it will scale. eXcelon, for example, has problems with large documents,
> so the Virtual Filesystem solution helps them keep documents relatively small: many small PDOM documents
> instead of one big PDOM, as in the Ozone XML Repository proposal.
In fact, the ozone repository consists of many small documents that are linked
using XLink. (see XPath problem)

> Falko, how do you plan to solve these scalability issues?
All I can do is to give the ozone core a good architecture. The last
limitation of the current ozone is the number of objects that are created
within one transaction. It seems to be no problem to solve this. On the other
hand we still have a BIG disk overhead. I'm working on this as described in my
status report. Once these things are done ozone should have virtually no
limitations. That is, no 16M limit for XML documents like ObjectStore ;) But
whether the performance will be suitable for real-world applications is a
completely different story! Actually, I don't know.

About scaling: I have not tested it yet. I do not have access to a multi-processor
system and the distribution features are not yet there. But the ozone core is
thread-based and I try very hard to make synchronization as smart as possible
so I hope it will scale well on multi-processors.


Falko
-- 
______________________________________________________________________
Falko Braeutigam                         mailto:falko@softwarebuero.de
softwarebuero m&b (SMB)                    http://www.softwarebuero.de