
Re: New User Queries



On Sun, 26 Dec 1999, William Uther wrote:
> Hi,
> 
>   Ozone looks like a nice system.  I have a project which I think could do
> with a nice ODBMS back end.  I'm looking at both Ozone and storedObjects
> <http://www.jdbms.org/>, and I'm not yet sure which is more appropriate.  I
> have never used an ODBMS before and I only really started looking into this
> today, so please forgive the newbie questions.
> 
>   If possible, using a generic interface to the ODBMS would be nice.  I
> downloaded the ODMG 3.0 spec.  It looks nice and reasonable, but there don't
> seem to be any open source implementations of it :).
I have to get some ODMG stuff working by the end of January.

>  I looked at the ozone
> ODMG stuff in org.ozoneDB.odmg and it seems to use the JDK 1.2 collections.
> Did you know that there is a JDK1.2 compatible collections library that
> works with JDK1.1 at http://java.sun.com/beans/infobus/ (down the bottom of
> that page)?  You should be able to make it work with just a few switched
> import statements if you want to stay with java 1.1.  It is what the java
> 1.1 version of the ODMG spec uses.
Thanks for the hint. But it seems that we will not use Java collections to
implement the ODMG collections. Yes, that is the fastest way to get something
working, but this solution would not scale the way one expects from a
database. Let me explain: Java collections are made to run in memory and fit
into the Java heap. Nothing wrong with that, because the number of Java objects
is in general limited by the available heap anyway. An OODBMS breaks this
limit: the number of objects is no longer limited by the size of the Java heap.
Therefore the capacity of a database collection should not be limited by the
Java heap either. In other words: IMHO it's not a good idea to use "simple"
in-memory collections as the access path to database objects.

> 
>   A few questions:
> 
>   - How much space overhead does Ozone have for an object on the disk?
That depends highly (!) on the application domain. In general, with the current
implementation the relative overhead is bigger for smaller objects. I am
addressing this problem with the new memory management architecture.

> 
>   - I read (in the 'Ozone Whitepaper' thread) that ozone keeps a
> hash of all objects in the database.  This would seem to be very large,
> larger than I'd like to have in RAM.  Does ozone keep all objects in a
> Hashtable, or only objects in the cache (ObjectSpace?).
All objects are in one hashtable. But this uses the DxDiskHashtable from the
DxLib module. As the name implies, parts of the table can be written out to
disk. This will not change with the new architecture I'm currently working on.
I'm trying to make the memory complexity of ozone "as independent as possible"
of the number of touched objects. In other words, it should be possible to
handle 1M objects on a 16MB server, regardless of the speed of course ;)
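
To illustrate the idea of a hash table whose parts can live on disk, here is a
minimal sketch in plain Java. It is not the DxDiskHashtable implementation; the
class SpillingTable, its bucket files and its least-recently-used eviction are
all assumptions made for this example. Only a fixed number of buckets stay in
memory, and the rest are serialized to files.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not DxDiskHashtable: a table whose buckets can live on disk. */
final class SpillingTable {
    private final Path dir;
    private final int bucketCount;
    private final int maxResident;
    // Access-ordered, so iteration starts at the least recently used bucket.
    private final LinkedHashMap<Integer, HashMap<String, String>> resident =
            new LinkedHashMap<>(16, 0.75f, true);

    SpillingTable(Path dir, int bucketCount, int maxResident) {
        if (bucketCount < 1 || maxResident < 1) throw new IllegalArgumentException();
        this.dir = dir;
        this.bucketCount = bucketCount;
        this.maxResident = maxResident;
    }

    void put(String key, String value) { bucketFor(key).put(key, value); }

    String get(String key) { return bucketFor(key).get(key); }

    int residentBuckets() { return resident.size(); }

    // Returns the bucket for a key, loading it from its file if it is not in memory.
    private HashMap<String, String> bucketFor(String key) {
        int index = Math.floorMod(key.hashCode(), bucketCount);
        HashMap<String, String> bucket = resident.get(index);
        if (bucket == null) {
            bucket = load(index);
            resident.put(index, bucket);
            evictOldest();
        }
        return bucket;
    }

    // Writes least recently used buckets to their files until the limit holds.
    private void evictOldest() {
        Iterator<Map.Entry<Integer, HashMap<String, String>>> it = resident.entrySet().iterator();
        while (resident.size() > maxResident) {
            Map.Entry<Integer, HashMap<String, String>> oldest = it.next();
            store(oldest.getKey(), oldest.getValue());
            it.remove();
        }
    }

    private Path fileFor(int index) { return dir.resolve("bucket-" + index + ".ser"); }

    @SuppressWarnings("unchecked")
    private HashMap<String, String> load(int index) {
        Path file = fileFor(index);
        if (!Files.exists(file)) return new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot load bucket " + index, e);
        }
    }

    private void store(int index, HashMap<String, String> bucket) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(fileFor(index)))) {
            out.writeObject(bucket);
        } catch (IOException e) {
            throw new IllegalStateException("cannot store bucket " + index, e);
        }
    }

    /** Fills a table in a temp directory and returns how many values read back correctly. */
    static int demo(int entries) {
        try {
            SpillingTable table = new SpillingTable(Files.createTempDirectory("spill"), 16, 2);
            for (int i = 0; i < entries; i++) table.put("key" + i, "value" + i);
            int correct = 0;
            for (int i = 0; i < entries; i++) {
                if (("value" + i).equals(table.get("key" + i))) correct++;
            }
            return table.residentBuckets() <= 2 ? correct : -1;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout the memory needed depends on the number of resident buckets
and not on the total number of entries, at the price of file I/O on a miss.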

> 
>   - What is actually persistent?  e.g. in GarageImpl.java there is a
> reference to a DxMap.  I assume that DxMap and everything in it is
> persistent.  A comment in the tutorial says you could replace this with a
> java.util.Hashtable and it would still work.
yes.

>   Put another way, the 'objects' you place in the DB define the interface
> to the data in the DB, however more is stored than just the primitive types
> in that class.  The DB 'objects' are root objects for a graph of persistent
> data.
exactly.

>  (I'll use DB 'objects' to describe these instances of the special
> modified classes (e.g. Car), as opposed to normal java objects.)
The classes themselves are not modified. You use Java interfaces to describe
the interface of the database.

> 
>   - If that is right, I assume that normal garbage collection applies to
> the normal java objects hanging off a DB 'object'?
Yes, the GC is used by ozone to "passivate" objects that are parts of database
objects. However, this is transparent to the programmer.

> 
>   - Is any object referenced by multiple DB 'objects' stored separately,
> possibly multiple times?
Yes, a database object lives in its own independent universe of non-database
objects. Database objects cannot share normal Java references! If a database
object receives an object (its reference) as an argument of a method call, then
it can only use its "value" because this argument object will sooner or later be
copied by the serialization.
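
The copy-by-value effect can be shown with plain Java serialization, without
any ozone classes. This is only a sketch; the helper copyBySerialization is a
name made up for this example. It round-trips an object through serialization,
roughly as it would happen when an argument is stored with a database object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

final class SerializationCopyDemo {
    /** Round-trips an object through Java serialization, yielding an independent copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyBySerialization(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> parts = new ArrayList<>();
        parts.add("wheel");
        ArrayList<String> stored = copyBySerialization(parts);
        parts.add("engine");                     // later change made by the caller
        System.out.println(stored);              // the copy does not see that change
        System.out.println(stored != parts);     // two distinct objects
    }
}
```

After the round trip the two lists are separate objects, so a change to one is
never visible through the other.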

> 
>   - You can also reference another DB 'object' in a DB 'object'.  In this
> case, the other DB 'object' is not garbage collected if you lose all
> references because it is its own GC root.  right?
correct!

> 
>   - In ozone, is each call to a DB 'object' method a single transaction?
> (and that method must be marked as 'updating' if it causes any of the graph
> hanging off the DB 'object' to change?)
Each "external" call (not each in-server call) is a transaction, yes.

> 
>   - If one DB 'object' calls a method on another DB 'object' what happens
> about locking, etc.  Does this mean that Ozone has 'nested transactions'
> like the storedObjects ODBMS?
As I said above: not every in-server call is a transaction of its own. No,
ozone does not support nested transactions.

If a read-only method is called, then a read lock is acquired. If the method
is marked "update", then a write lock is acquired as well. ozone uses
pessimistic locking.
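
The lock choice described above can be sketched with a dynamic proxy in plain
Java. None of this is ozone code: the @Update annotation, the Vehicle
interface and the LockingProxy class are hypothetical names for this example,
and a ReentrantReadWriteLock stands in for whatever the server really uses.
Since that lock cannot upgrade a read lock, the sketch takes only the write
lock for update methods. The lock is taken before the call runs, which is the
pessimistic part.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical marker for methods that change state; not an ozone API. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Update {}

/** The interface is all a caller sees of the object. */
interface Vehicle {
    String name();
    @Update void setName(String name);
}

final class VehicleImpl implements Vehicle {
    private String name;
    VehicleImpl(String name) { this.name = name; }
    public String name() { return name; }
    public void setName(String name) { this.name = name; }
}

final class LockingProxy {
    /** Wraps target so that each call first takes a read or a write lock. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, List<String> log) {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        InvocationHandler handler = (proxy, method, args) -> {
            boolean update = method.isAnnotationPresent(Update.class);
            Lock lock = update ? rw.writeLock() : rw.readLock();
            lock.lock();                       // acquired before the method body runs
            try {
                log.add((update ? "write " : "read ") + method.getName());
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();            // rethrow what the real method threw
            } finally {
                lock.unlock();
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

A caller would use it like `Vehicle v = LockingProxy.wrap(new VehicleImpl("a"),
Vehicle.class, log);` and then only ever talk to the interface.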

> 
>   - In ozone, all the non DB 'objects' hanging off a DB 'object' are loaded
> into memory when that DB 'object' is loaded.  correct?
yes.

> 
>   - Are the storage requirements very different for DB objects vs. other
> objects hanging off a DB object?
ozone uses Java serialization to store database and "other" objects. But each
database object needs extra information to be stored (name, transaction state,
shadow, owner...). The current implementation needs about 200 bytes per
database object.

> 
>   Now let me briefly describe the project I have in mind.  I want to make
> the following algorithm persistent:
> 
>   - begin write transaction
>     - user supplies a large chunk of data (50k)
>     - data is broken into many small chunks (2 bytes, + a couple of
> references)
>     - small chunks are entered in a hash-table as they are generated.
> Small chunks from previous transactions are also in the hash-table.  (Being
> able to find small chunks fast is integral to the write op.)
>     - small chunks and hash table are persistent.
>   - end write transaction
> 
>   - begin read transaction
>     - user supplies a reference which tells the algorithm which small
> chunks to grab and reconstitute into a larger chunk of data.
>     - large chunk is returned to user
>   - end read transaction
> 
>   Because of the transaction structure it would seem that I'd have one DB
> 'object' to implement the algorithm (which means that all the work
> happens inside the server as mentioned in the "Great!" thread, right?).
yes.

>   If I just left things like that, then technically it should all work.
> I'd have a database with only one 'object', but lots of data hanging off
> it.  I imagine the data stored getting rather large - about 5 000 000 of
> the small chunks hanging off my single object.
Then the small chunks have to be database objects. There is no way around that,
if you want to use ozone as it is. Ok, your small chunk objects are very small
but we ran into the same problems with our persistent DOM implementation. And I
have to solve this with the new memory management architecture. So go on, make
them database objects!... I will try to solve the memory overhead problems :)
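
For a rough feeling of the numbers, the figures from this thread (5 000 000
chunks, about 200 bytes of extra information per database object, 2 bytes of
data per chunk) can be multiplied out. This is back-of-the-envelope arithmetic
only; the class name is made up, and the "couple of references" per chunk are
not counted.

```java
final class OverheadEstimate {
    /** Total bytes for a number of objects at a fixed number of bytes each. */
    static long totalBytes(long objects, long bytesPerObject) {
        return Math.multiplyExact(objects, bytesPerObject);
    }

    public static void main(String[] args) {
        long chunks = 5_000_000L;                    // figure from the question
        long overhead = totalBytes(chunks, 200L);    // about 200 bytes each, per the reply
        long payload = totalBytes(chunks, 2L);       // 2 data bytes per chunk, references ignored
        // prints "1000000000 bytes overhead, 10000000 bytes payload"
        System.out.println(overhead + " bytes overhead, " + payload + " bytes payload");
    }
}
```

So with the current implementation the per-object information would come to
roughly 1 GB against about 10 MB of chunk data.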

> 
>   I don't think that arrangement would fly, if for no other reason than
> that the hash-table is going to be larger than I want to leave sitting in
> RAM.  So, can I use the DB instead of the Hash-table?  This would mean
> making each small chunk its own DB 'object' and doing my own memory
> management.  It also means I cannot use arbitrary java objects as the key
> for the hash-table.  I would have to generate unique strings instead of
> using the objects I was using as keys before.  Is this why ODMG has a
> hash-table in their DB interface?  Does ozone have any equivalent (i.e. a
> hash-table that can key on an object (DB or not) and not have to be all in
> memory at once)?
> 
>   How is the mapping from DB 'object' names to DB 'objects' handled?  I
> assume it uses something like the org.ozoneDB.DxLib.DxDiskHashMap?  (If not
> there'd be no advantage of DB 'objects' and names over simply having a
> Hash-table in a single object, correct?)  What happens when two objects in
> the DB have the same name?
Names are unique. If you try to use a name that is already in use, then you get
an exception.
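
The behaviour can be pictured with a small stand-in in plain Java. This is not
the ozone API: NameRegistry, register and lookup are hypothetical names, and
the exception type is an assumption, since the actual exception class is not
named here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a name-to-object mapping with unique names. */
final class NameRegistry {
    private final Map<String, Object> byName = new HashMap<>();

    /** Binds name to object, or throws if the name is already in use. */
    void register(String name, Object object) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(object, "object");
        // putIfAbsent leaves an existing binding untouched and returns it.
        if (byName.putIfAbsent(name, object) != null) {
            throw new IllegalStateException("name already in use: " + name);
        }
    }

    /** Returns the object bound to name, or null if there is none. */
    Object lookup(String name) {
        return byName.get(name);
    }
}
```

Registering "garage" twice throws on the second call, and the first binding
stays in place.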

> 
>   I think that's all my questions for the moment.  Thanks for listening,
> just writing this out has been helpful.

Hope this helps a bit. Don't hesitate to come back with more questions.


Falko
-- 
______________________________________________________________________
Falko Braeutigam                         mailto:falko@softwarebuero.de
softwarebuero m&b (SMB)                    http://www.softwarebuero.de