Re: [HACKERS] Compression and on-disk sorting

Jim C. Nasby Fri, 19 May 2006 11:40:30 -0700

On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> On Thu, May 18, 2006 at 10:02:44PM -0500, Jim C. Nasby wrote:
> > http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> > I've run into a slight problem in that even at a compression level of
> > -3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort
> > test with scale=150 that was producing a 2G on-disk sort is now
> > producing a 80M sort, which obviously fits in memory. And cuts sort
> > times by more than half.
> 
> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> unbeleiveable. What's in the table? It would seem to imply that our
> tuple format is far more compressable than we expected.


It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
If the tape routines were actually storing visibility information, I'd
expect that to be pretty compressible in this case since all the tuples
were presumably created in a single transaction by pgbench.

If needs be, I could try the patch against http://stats.distributed.net,
assuming that it would apply to REL_8_1.

> Do you have any stats on CPU usage? Memory usage?

I've only been taking a look at vmstat from time-to-time, and I have yet
to see the machine get CPU-bound. Haven't really paid much attention to
memory. Is there anything in partucular you're looking for? I can log
vmstat for the next set of runs (with a scaling factor of 10000). I plan
on doing those runs tonight...
-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [HACKERS] Compression and on-disk sorting

Reply via email to