Database Backends

Christopher Browne Sat, 1 Jan 2000 17:48:59 -0800
On 01 Jan 2000 14:17:18 CST, the world broke into rejoicing as
Rob Browning <[EMAIL PROTECTED]>  said:
> [EMAIL PROTECTED] writes:
> 
> > An SQL backend is needed for business users but is scary to think about
> > the management headaches for a novice/computer newbie.  For the home
> > user, alternate possibilities are:
> > -- using dbm
> > -- using dBase/foxbase style file formats. 
> 
> Does anyone know what the current state of the regular dbms is?  I
> haven't worked with them too much, but it looks like they implement a
> standard flat hash table of key value pairs where you can have
> whatever you want for the key and whatever you want for the values.  I
> presume that we'd be talking about building our data store on top of a
> set of these.

I suggest considering B-trees instead. 

- Hashing provides O(1) access *when you know the precise key,* which is a
  good thing.  This may still make it the way to go in storing
  transactions; if each transaction has a fixed key, you search by key,
  and get fast access.

- BUT.  Hashing is *HORRIBLE* if you have to do any guessing; any
  operation other than [look for key X] winds up behaving with O(n).

The major alternative is B-Trees. 

a) Consider the dBase model.
It involves having data in fixed length records, sequentially arranged
in a file.  That's the .dbf file.

To access it efficiently, there are .ndx files that contain B-Trees
(of some flavor; I'm willing to not bother trying to distinguish between
B-trees, B+-trees, and B*-trees if you're not worried about it...).

So, if looking for a value, you search the B-tree.  You hit a leaf, and
that leaf contains a pointer indicating the position of the record in
the .dbf file.

Net result: a dBase database system consists of a directory with several
.dbf files containing data, and some larger number of .ndx files providing
access paths from keys to data.

b) C-ISAM.

This is an X/Open standard, developed by Informix.  It uses a similar
model to dBase, where you have data files and .idx files, where the .idx
files are indices.

There are several implementations available that run on Linux, including
Informix C-ISAM, something called D-ISAM, and probably some others.  None
that are freely available, regrettably.

The entertaining bit is that Informix SE, the version of Informix that was
first available on Linux, happens to implement an SQL engine on top of
C-ISAM.  A similar parallel is that MySQL was originally an ISAM database
system, and the creators "made it SQL" by building an SQL interpretation
layer on top.

I'll barely *anything* about the "newer generation" RDBMSes; they wind
up implementing what amounts to their own filesystem for data storage,
and this is every bit as complex as something like ext2.  I don't think
we want to go there...

> If we're going in this direction, it would be nice to know what their
> performance characteristics are like so that we'd know how expensive
> some of the things we'd need to to would be.

I suspect that the ideal choice of database is probably Sleepycat DB.
a) If you're using GLIBC 2.1, that's what GLIBC uses for hash tables.
b) It's free for use with free software.
c) It's readily available for many platforms.
d) It supports both hash tables *and* B-trees, which means that we
   can get both O(1) behaviour in those cases where we can satisfy the
   "needs" of hashing, and O(log N) behaviour for more approximate
   searching.
e) It's got some of the "good stuff" that we'd likely like, such as
   locking with various levels of granularity, nested transactions,
   two phase commit, logging, rollback...

It *apparently* can support having multiple tables in one file, which
I'll have to look into more closely.  (It used to be like a Perl/awk
association list, where you might be able to put multiple things in
one table by having distinct ranges of keys; I was recently told that
they do better than that now...)

This should be reasonably lightweight.

The thing that makes SQL a bit of a "horror" that we wouldn't want to
push into the GnuCash client is that the use of SQL introduces an extra
interpretation layer, and thus a somewhat amorphous layer that might
add substantial time to queries.  Throw on with that the fact that
*most* SQL implementations involve significant amounts of system
management, and it makes it pretty prohibitive.

If there were a "libre" SQL database that was targetted at "embedded"
applications, that could be worth looking at, despite that "amorphous
interpretation layer."  There isn't one at this point.  Not MySQL; not
MSQL; definitely not PostgreSQL.  Supposing Inprise "open sources"
Interbase, *that* would be a system worth looking at as a DBMS designed
for embedding.

Until then, something like Sleepycat DB seems by far most suitable.
--
This computer thanks you for your attention.
G O O D B Y E
[EMAIL PROTECTED] - <http://www.ntlug.org/~cbbrowne/nonrdbms.html>

--
Gnucash Developer's List 
To unsubscribe send empty email to: [EMAIL PROTECTED]
Database Backends

Reply via email to