On 2011.03.23 at 19:33, Dan Stromberg wrote:
On Wed, Mar 23, 2011 at 7:37 AM, Laszlo Nagy <gand...@shopzeus.com> wrote:
I was also thinking about storing data in a gdbm database. One
file for each month storing at most 100 log messages for every key
value. Then one file for each day in the current month, storing
one message for each key value. Incremental backup would be easy,
and reading back old messages would be fast enough (just need to
do a few hash lookups). However, implementing a high availability
service around this is not that easy.
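A minimal sketch of the layout described above, not a tested design. The file name patterns and the helper names (month_db_name, day_db_name, append_capped) are hypothetical, and keeping the most recent messages when the cap is reached is an assumption; the message above only says "at most 100". Each value is stored as a JSON list so that one key can hold several messages.

```python
import datetime
import json

MONTH_LIMIT = 100  # "at most 100 log messages for every key value"


def month_db_name(day):
    """Hypothetical file name of the database for the month containing `day`."""
    return day.strftime("log-%Y-%m.db")


def day_db_name(day):
    """Hypothetical file name of the database for a single day."""
    return day.strftime("log-%Y-%m-%d.db")


def append_capped(db, key, message, limit):
    """Append `message` under `key`, keeping only the newest `limit` entries.

    `db` is any mapping-like object: a dict, or an opened dbm database.
    """
    raw = db[key] if key in db else "[]"
    messages = json.loads(raw)      # accepts both str and bytes
    messages.append(message)
    db[key] = json.dumps(messages[-limit:])
```

With gdbm available, `db` could be the object returned by `dbm.gnu.open(month_db_name(day), "c")`; a plain dict works the same way for experiments.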
I think a slight variation of this sounds like a good bet for you.
But when you "open" a database, create a temporary copy, and when you
close the database, rename it back to its original name. Then your
backups should be able to easily get a self-consistent (if not up to
the millisecond) snapshot.
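The copy-then-rename idea could look roughly like this. It is only a sketch: the name working_copy and the ".work" suffix are made up for illustration, and it assumes the database is a single file, with the copy living on the same filesystem as the original so that the rename is atomic on POSIX.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def working_copy(path):
    """Yield the path of a temporary copy; rename it over `path` on success."""
    tmp_path = path + ".work"          # hypothetical suffix
    if os.path.exists(path):
        shutil.copy2(path, tmp_path)   # all writes go to the copy
    try:
        yield tmp_path
    except BaseException:
        # Something went wrong: drop the copy, the original is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    else:
        # "Close": a single rename publishes the new version.
        os.replace(tmp_path, path)
```

While the block is running, a backup that reads `path` sees the last published version, never a half-written one.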
My idea was to open all databases in read-only mode, except the one for
the last day. So it will be possible to archive these files (except the
last day). No need to make copies. I have also developed an algorithm
that merges the database from the "previous day" with the database of
the "month of the previous day". This happens when a day switch occurs.
The algorithm detects this, and it can merge the database and at the
same time, log messages can be added. The service is only suspended two
times a day, when the fragmented and defragmented databases are switched.
But that is only a single "file rename" operation and it takes less than
0.1 seconds to do. So the algorithm is ready. I can implement it, but it
is not easy to do. I could spend many days on it and then it may turn out
that it is not as efficient as I thought.
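For illustration only, here is the data-merging step in isolation. This is not the algorithm described above: it does not handle messages arriving during the merge, and it leaves out the rename that switches the files. The function name, the JSON-list value format and the 100-message cap per key are assumptions.

```python
import json


def merge_day_into_month(day_db, month_db, limit=100):
    """Fold every key of `day_db` into `month_db`, newest messages last.

    Both arguments are mapping-like objects (dicts or opened dbm
    databases) whose values are JSON-encoded lists of messages.
    """
    for key in day_db.keys():
        day_messages = json.loads(day_db[key])
        if key in month_db:
            month_messages = json.loads(month_db[key])
        else:
            month_messages = []
        merged = month_messages + day_messages
        month_db[key] = json.dumps(merged[-limit:])   # enforce the cap
```

Writing the result into a fresh file and renaming it over the old month file afterwards would keep the suspension down to the single rename mentioned above.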
I cannot believe that others didn't run into the same problem. This is
why I posted to the list. I don't want to reinvent the wheel if I don't
need to.
Or did you have some other problem in mind for the gdbm version?
Nope.
BTW, avoid huge directories of course, especially if you don't have
hashed or btree directories. One way is to come up with a longish
hash key (SHA?), and use a trie-like structure in the filesystem, with
Fibonacci-length chunks of the hash key becoming directories and
subdirectories.
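One possible reading of that suggestion, as a sketch: hash the key, then cut the hex digest into pieces of length 1, 2, 3, ... and use them as nested directory names. The choice of SHA-1, the starting lengths, the default of three levels and the function names are all assumptions made for the example.

```python
import hashlib
import os


def fibonacci_chunks(text, levels):
    """Split `text` into `levels` chunks of length 1, 2, 3, 5, ... plus the rest."""
    a, b = 1, 2
    pos = 0
    chunks = []
    for _ in range(levels):
        chunks.append(text[pos:pos + a])
        pos += a
        a, b = b, a + b
    return chunks, text[pos:]


def hashed_path(root, key, levels=3):
    """Map `key` to a nested path under `root` derived from its SHA-1 digest."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()   # 40 hex characters
    dirs, leaf = fibonacci_chunks(digest, levels)
    return os.path.join(root, *dirs, leaf)
```

With three levels of hex characters, a directory holds at most 16, 256 and 4096 subdirectories at the respective depths; each extra level widens quickly, so `levels` is worth keeping small.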
Hmm that's a good idea. Thanks!
L