On May 6, 10:21 pm, John Nagle wrote:
> On 5/4/2012 12:14 AM, Steve Howell wrote:
>
> > On May 3, 11:59 pm, Paul Rubin wrote:
> >> Steve Howell writes:
> >>> compressor = zlib.compressobj()
> >>> s = compressor.compress("foobar")
> >>> s += compressor.flush(zlib.Z_SYNC_FLUSH)
>
>
John Nagle writes:
>That's awful. There's no point in compressing six characters
> with zlib. Zlib has a minimum overhead of 11 bytes. You just
> made the data bigger.
This hack is about avoiding the initialization overhead--do you really
get 11 bytes after every SYNC_FLUSH? I do remember
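Whatever the exact figure for a complete zlib stream, the number that matters for this hack is the marginal cost per record once the compressor has been primed, and that is easy to measure directly. A small sketch, with PREFIX standing in for whatever shared text primes the compressor (measure against real records rather than "foobar"):

import zlib

PREFIX = b"representative text shared by most of the records ..."   # placeholder

# Prime one compressor with the shared text, as in the quoted snippet.
base = zlib.compressobj()
primed_output = base.compress(PREFIX) + base.flush(zlib.Z_SYNC_FLUSH)

def incremental_size(data):
    # Bytes emitted for `data` by a clone of the primed compressor.
    c = base.copy()
    return len(c.compress(data) + c.flush(zlib.Z_SYNC_FLUSH))

sample = b"foobar"
print(len(zlib.compress(sample)))   # standalone stream: header + deflate data + checksum
print(incremental_size(sample))     # increment only: deflate data + sync-flush marker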
On 5/4/2012 12:14 AM, Steve Howell wrote:
> On May 3, 11:59 pm, Paul Rubin wrote:
>> Steve Howell writes:
>>> compressor = zlib.compressobj()
>>> s = compressor.compress("foobar")
>>> s += compressor.flush(zlib.Z_SYNC_FLUSH)
>>> s_start = s
>>> compressor2 = compressor.copy()
That's awful. There's no point in compressing six characters with zlib.
On Friday, 4 May 2012 16:27:54 UTC+1, Steve Howell wrote:
> On May 3, 6:10 pm, Miki Tebeka wrote:
> > > I'm looking for a fairly lightweight key/value store that works for
> > > this type of problem:
> >
> > I'd start with a benchmark and try some of the things that are already in
> > the standard library:
On 5/4/2012 12:49 PM Tim Chase said...
On 05/04/12 14:14, Emile van Sebille wrote:
On 5/4/2012 10:46 AM Tim Chase said...
I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
wasn't finding the file as it was created in two parts with .dat and
.dir extension.
Hrm...must be a Win32 vs Linux thing.
On 05/04/12 14:14, Emile van Sebille wrote:
> On 5/4/2012 10:46 AM Tim Chase said...
>
> I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
> wasn't finding the file as it was created in two parts with .dat and
> .dir extension.
Hrm...must be a Win32 vs Linux thing.
> Also, setting key failed as update returns None.
On 5/4/2012 10:46 AM Tim Chase said...
I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
wasn't finding the file as it was created in two parts with .dat and
.dir extension.
Also, setting key failed as update returns None.
The changes I needed to make are marked below.
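For the .dat/.dir snag: that is what anydbm falling back to dumbdbm looks like, since dumbdbm keeps its data in several files rather than one. A sketch of a size check that copes with either layout; 'kv_test' stands in for whatever name was passed to anydbm.open(), and the suffix list is a guess at the common backends:

import os

def db_size(basename):
    # Total on-disk size of the database, whichever dbm backend created it.
    # dumbdbm splits it into basename.dat/.dir/.bak, ndbm may use .db or
    # .dir/.pag, while gdbm and bsddb typically use the bare name.
    total = 0
    for suffix in ('', '.db', '.dat', '.dir', '.bak', '.pag'):
        path = basename + suffix
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total

print(db_size('kv_test'))   # 'kv_test' is whatever name was passed to anydbm.open()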
On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
>
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
>
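One way to answer that without guessing is to let anydbm choose and then ask which backend it actually picked, since that choice is what decides read performance. A sketch using the Python 2 module names from the docs quoted above (on Python 3 the equivalents are dbm.open() and dbm.whichdb()); 'kv_test' is a placeholder:

import anydbm
import whichdb

db = anydbm.open('kv_test', 'c')      # 'c' = create the file if needed
db['some/path/key'] = 'value bytes'
db.close()

# Report which variant anydbm actually fell back to on this machine
# ('dbhash', 'gdbm', 'dbm' or 'dumbdbm').
print(whichdb.whichdb('kv_test'))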
On 05/04/12 10:27, Steve Howell wrote:
> On May 3, 6:10 pm, Miki Tebeka wrote:
>>> I'm looking for a fairly lightweight key/value store that works for
>>> this type of problem:
>>
>> I'd start with a benchmark and try some of the things that are already in
>> the standard library:
>> - bsddb
>> -
On May 3, 6:10 pm, Miki Tebeka wrote:
> > I'm looking for a fairly lightweight key/value store that works for
> > this type of problem:
>
> I'd start with a benchmark and try some of the things that are already in the
> standard library:
> - bsddb
> - sqlite3 (table of key, value, index key)
> - shelve (though I doubt this one)
Steve Howell writes:
>> You should be able to just get the incremental bit.
> This is fixed now.
Nice.
> If it's in the header, wouldn't it be part of the output that comes
> before Z_SYNC_FLUSH?
Hmm, maybe you are right. My version was several years ago and I don't
remember it well, but I hal
On May 4, 1:01 am, Paul Rubin wrote:
> Steve Howell writes:
> > Makes sense. I believe I got that part correct:
>
> > https://github.com/showell/KeyValue/blob/master/salted_compressor.py
>
> The API looks nice, but your compress method makes no sense. Why do you
> include s.prefix in s and then strip it off?
Steve Howell writes:
> Makes sense. I believe I got that part correct:
>
> https://github.com/showell/KeyValue/blob/master/salted_compressor.py
The API looks nice, but your compress method makes no sense. Why do you
include s.prefix in s and then strip it off? Why do you save the prefix
and
On May 3, 11:59 pm, Paul Rubin wrote:
> Steve Howell writes:
> > compressor = zlib.compressobj()
> > s = compressor.compress("foobar")
> > s += compressor.flush(zlib.Z_SYNC_FLUSH)
>
> > s_start = s
> > compressor2 = compressor.copy()
>
> I think you also want to make a decompressor here, and initialize it
> with s and then clone it
On Thu, May 3, 2012 at 11:03 PM, Paul Rubin wrote:
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Py
Steve Howell writes:
> compressor = zlib.compressobj()
> s = compressor.compress("foobar")
> s += compressor.flush(zlib.Z_SYNC_FLUSH)
>
> s_start = s
> compressor2 = compressor.copy()
I think you also want to make a decompressor here, and initialize it
with s and then clone it
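A sketch of how I read that suggestion: prime one compressor and one decompressor with the same shared prefix, then clone both per record so that only the incremental bytes are ever stored. PREFIX is a placeholder, and bytes literals are used so the same code runs on Python 2.6+ or 3:

import zlib

PREFIX = b"common text that most of the records share ..."   # placeholder

# Prime a compressor once and keep the bytes it has already emitted.
c_base = zlib.compressobj()
primed = c_base.compress(PREFIX) + c_base.flush(zlib.Z_SYNC_FLUSH)

# Prime a matching decompressor by replaying that output, then clone it
# for every record instead of re-feeding the prefix each time.
d_base = zlib.decompressobj()
d_base.decompress(primed)            # yields PREFIX; the result is discarded

def pack(record):
    c = c_base.copy()                # clone the primed compressor state
    return c.compress(record) + c.flush(zlib.Z_SYNC_FLUSH)

def unpack(blob):
    d = d_base.copy()                # clone the primed decompressor state
    return d.decompress(blob)

blob = pack(b"foobar")               # only the incremental bytes are stored
assert unpack(blob) == b"foobar"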
On May 3, 11:03 pm, Paul Rubin wrote:
> Steve Howell writes:
> > Sounds like a useful technique. The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ascii, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encoding
Steve Howell writes:
> Sounds like a useful technique. The text snippets that I'm
> compressing are indeed mostly English words, and 7-bit ascii, so it
> would be practical to use a compression library that just uses the
> same good-enough encodings every time, so that you don't have to write
> t
On May 3, 9:38 pm, Paul Rubin wrote:
> Steve Howell writes:
> > My test was to write roughly 4GB of data, with 2 million keys of 2k
> > bytes each.
>
> If the records are something like english text, you can compress
> them with zlib and get some compression gain by pre-initializing
> a zlib dictionary from a fixed english corpus, then cloning it
Steve Howell writes:
> My test was to write roughly 4GB of data, with 2 million keys of 2k
> bytes each.
If the records are something like english text, you can compress
them with zlib and get some compression gain by pre-initializing
a zlib dictionary from a fixed english corpus, then cloning it
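As an aside, later Pythons expose this idea directly: from 3.3 on, zlib.compressobj() and zlib.decompressobj() take a zdict argument, so a fixed English corpus can be supplied as a preset dictionary instead of being replayed through a primed object. A sketch with a stand-in dictionary:

import zlib

# Stand-in for a dictionary built from a representative English corpus;
# zlib looks for matches in it and favours material near the end, so the
# most common substrings should go last.
ZDICT = (b"the of and to in is that it for was on are as with "
         b"common boilerplate that shows up in most of the records")

def pack(record):
    c = zlib.compressobj(zdict=ZDICT)
    return c.compress(record) + c.flush()

def unpack(blob):
    d = zlib.decompressobj(zdict=ZDICT)
    return d.decompress(blob) + d.flush()

blob = pack(b"most of the records look a lot like the boilerplate")
assert unpack(blob) == b"most of the records look a lot like the boilerplate"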
On May 3, 1:42 am, Steve Howell wrote:
> On May 2, 11:48 pm, Paul Rubin wrote:
>
> > Paul Rubin writes:
> > >looking at the spec more closely, there are 256 hash tables.. ...
>
> > You know, there is a much simpler way to do this, if you can afford to
> > use a few hundred MB of memory and you don't mind some load time when
> > the program first starts.
> I'm looking for a fairly lightweight key/value store that works for
> this type of problem:
I'd start with a benchmark and try some of the things that are already in the
standard library:
- bsddb
- sqlite3 (table of key, value, index key)
- shelve (though I doubt this one)
You might find that f
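For the sqlite3 option, the benchmark harness can be as small as the sketch below; the file and table names are arbitrary, and making key the primary key provides the index on key:

import sqlite3

conn = sqlite3.connect('kv.sqlite3')           # arbitrary file name
conn.execute("""
    CREATE TABLE IF NOT EXISTS kv (
        key   TEXT PRIMARY KEY,                -- the index on key
        value BLOB
    )
""")

def put_many(pairs):
    with conn:                                 # one transaction per batch of writes
        conn.executemany(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", pairs)

def get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put_many(("dir%d/file%d" % (i % 30, i), b"file contents ...") for i in range(1000))
print(get("dir12/file42"))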
On 5/3/2012 10:42, Steve Howell wrote:
> On May 2, 11:48 pm, Paul Rubin wrote:
> > Paul Rubin writes:
> > >looking at the spec more closely, there are 256 hash tables.. ...
> > You know, there is a much simpler way to do this, if you can afford to
> > use a few hundred MB of memory and you don't mind some load time when
> > the program first starts.
On May 2, 11:48 pm, Paul Rubin wrote:
> Paul Rubin writes:
> >looking at the spec more closely, there are 256 hash tables.. ...
>
> You know, there is a much simpler way to do this, if you can afford to
> use a few hundred MB of memory and you don't mind some load time when
> the program first starts.
Paul Rubin writes:
>looking at the spec more closely, there are 256 hash tables.. ...
You know, there is a much simpler way to do this, if you can afford to
use a few hundred MB of memory and you don't mind some load time when
the program first starts. Just dump all the data sequentially into a
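The suggestion is cut off here, but the shape of it seems to be: write all the records once, sequentially, and keep only an in-memory map from key to file offset, which is where the few hundred MB and the start-up load time go. A sketch with a made-up record layout (length-prefixed key and value):

import struct

def build(path, items):
    # Write each record as <key len, value len, key, value> and remember
    # where it starts; for a few million keys the dict fits in RAM.
    offsets = {}
    with open(path, 'wb') as f:
        for key, value in items:
            offsets[key] = f.tell()
            f.write(struct.pack('<II', len(key), len(value)))
            f.write(key)
            f.write(value)
    return offsets

def fetch(f, offsets, key):
    # One seek per lookup, using the in-memory offset table.
    pos = offsets.get(key)
    if pos is None:
        return None
    f.seek(pos)
    klen, vlen = struct.unpack('<II', f.read(8))
    f.seek(klen, 1)                  # skip the stored copy of the key
    return f.read(vlen)

offsets = build('data.bin', [(b'dir1/file1', b'contents of file1 ...')])
with open('data.bin', 'rb') as f:
    print(fetch(f, offsets, b'dir1/file1'))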
Steve Howell writes:
> Doesn't cdb do at least one disk seek as well? In the diagram on this
> page, it seems you would need to do a seek based on the value of the
> initial pointer (from the 256 possible values):
Yes, of course it has to seek if there is too much data to fit in
memory. All I'm
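On the seek question: a cdb lookup reads the 2048-byte pointer table at the front of the file (cheap to cache), then one or more slots in one of the 256 hash tables, then the record itself. A read-only sketch of the published cdb format, assuming Python 3 and bytes keys:

import struct

def cdb_hash(key):
    # D. J. Bernstein's hash: h = ((h << 5) + h) ^ c, starting from 5381.
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xffffffff
    return h

def cdb_get(f, key):
    # Look up `key` (bytes) in an open cdb file object; None if absent.
    h = cdb_hash(key)
    f.seek((h & 0xff) * 8)                        # seek #1: the 256-entry pointer table
    pos, slots = struct.unpack('<LL', f.read(8))
    if slots == 0:
        return None
    start = (h >> 8) % slots
    for i in range(slots):                        # linear probe, wrapping around
        f.seek(pos + ((start + i) % slots) * 8)   # seek #2: a hash-table slot
        slot_hash, rec_pos = struct.unpack('<LL', f.read(8))
        if rec_pos == 0:
            return None                           # empty slot: key is not present
        if slot_hash == h:
            f.seek(rec_pos)                       # seek #3: the record itself
            klen, dlen = struct.unpack('<LL', f.read(8))
            if f.read(klen) == key:
                return f.read(dlen)
    return None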
On May 2, 8:29 pm, Paul Rubin wrote:
> Steve Howell writes:
> > Thanks. That's definitely in the spirit of what I'm looking for,
> > although the non-64 bit version is obviously geared toward a slightly
> > smaller data set. My reading of cdb is that it has essentially 64k
> > hash buckets, so for 3 million keys, you're still scanning through
On May 2, 2012, at 10:14 PM, Steve Howell wrote:
> This is slightly off topic, but I'm hoping folks can point me in the
> right direction.
>
> I'm looking for a fairly lightweight key/value store that works for
> this type of problem:
>
> ideally plays nice with the Python ecosystem
> the data set is static, and written infrequently enough that I
Steve Howell writes:
> Thanks. That's definitely in the spirit of what I'm looking for,
> although the non-64 bit version is obviously geared toward a slightly
> smaller data set. My reading of cdb is that it has essentially 64k
> hash buckets, so for 3 million keys, you're still scanning through
On May 2, 7:46 pm, Paul Rubin wrote:
> Steve Howell writes:
> > keys are file paths
> > directories are 2 levels deep (30 dirs w/100k files each)
> > values are file contents
> > The current solution isn't horrible,
>
> Yes it is ;-)
> > As I mention up top, I'm mostly hoping folks can poin
On 05/02/12 21:14, Steve Howell wrote:
> I'm looking for a fairly lightweight key/value store that works for
> this type of problem:
>
> ideally plays nice with the Python ecosystem
> the data set is static, and written infrequently enough that I
> definitely want *read* performance to trump a
On 5/2/2012 10:14 PM, Steve Howell wrote:
> This is slightly off topic, but I'm hoping folks can point me in the
> right direction.
> I'm looking for a fairly lightweight key/value store that works for
> this type of problem:
> ideally plays nice with the Python ecosystem
> the data set is static, and written infrequently enough that I
Steve Howell writes:
> keys are file paths
> directories are 2 levels deep (30 dirs w/100k files each)
> values are file contents
> The current solution isn't horrible,
Yes it is ;-)
> As I mention up top, I'm mostly hoping folks can point me toward
> sources they trust, whether it be ot
This is slightly off topic, but I'm hoping folks can point me in the
right direction.
I'm looking for a fairly lightweight key/value store that works for
this type of problem:
ideally plays nice with the Python ecosystem
the data set is static, and written infrequently enough that I
definitely want *read* performance to trump a