On 8/21/06, Richard Elling - PAE <[EMAIL PROTECTED]> wrote:

I haven't done measurements of this in years, but...
I'll wager that compression is memory bound, not CPU bound, for today's
servers.  A system with low latency and high bandwidth memory will perform
well (UltraSPARC-T1).  Threading may not help much on systems with a single
memory interface, but should help some on systems with multiple memory
interfaces (UltraSPARC-*, Opteron, Athlon FX, etc.)
  -- richard

Here's a rather simple test using CSQamp.pkg from the cooltools download site.  There's nothing magical about this file - it just happens to be a largish file that I had on hand.

$ time gzip -c < CSQamp.pkg > /dev/null

V40z:

real    0m15.339s
user    0m14.534s
sys     0m0.485s

V240:

real    0m35.825s
user    0m35.335s
sys     0m0.284s

T2000:

time gzip -c < CSQamp.pkg > /dev/null

real    1m33.669s
user    1m32.768s
sys     0m0.881s


If I do 8 gzips in parallel:

V40z:

time ~/scripts/pgzip

real    0m32.632s
user    1m53.382s
sys     0m1.653s

V240:

time ~/scripts/pgzip

real    2m24.704s
user    4m42.430s
sys     0m2.305s

T2000:

time ~/scripts/pgzip

real    1m40.165s
user    13m10.475s
sys     0m6.578s

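(~/scripts/pgzip itself isn't shown above.  A minimal Bourne-shell sketch of what it presumably does, assuming it simply launches 8 copies of the same gzip command and waits for them all, would look like this; the loop count and file name are my assumptions:)

#!/bin/sh
# Hypothetical reconstruction of ~/scripts/pgzip: run N gzip processes
# against the same file in parallel and wait for all of them to finish.
N=${1:-8}
i=0
while [ $i -lt $N ]; do
        gzip -c < CSQamp.pkg > /dev/null &
        i=`expr $i + 1`
done
wait
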

In each of the tests, the file was in /tmp.  As expected, the V40z running 8 gzip processes (on 4 cores) took twice as long as it did running 1 (on 1 core).  The V240 (8 processes on 2 CPUs) took 4 times as long as the single run, and the T2000 (8 processes on 8 cores) ran them in just about the same amount of time as it ran 1.

For giggles, I ran 32 processes on the T2000 and came up with 5m4.585s (real), 158m33.380s (user), and 42.484s (sys).  In other words, the T2000 running 32 gzip processes had an elapsed time 3 times greater than with 8 processes.  Even though the elapsed time only jumped by 3x, the system time jumped by nearly 7x.

Here's a summary:

Server  gzips   Seconds KB/sec (aggregate)
V40z    8       32.632   49,445
T2000   32      304.585  21,189
T2000   8       100.165  16,108
V40z    1       15.339   13,149
V240    8       144.704  11,150
V240    1       35.825   5,630
T2000   1       99.669   2,024

Clearly, multiple gzip processes compressing in parallel give better aggregate throughput than a single one.  How that breaks down between memory and CPU speed, I am not sure.  However, I can't help but think that if my file server is compressing every data block it writes, it would be able to write more data, and I would come out ahead, if it used a thread (or more) per core.

I am a firm believer that the next generation of compression commands and libraries needs to use parallel algorithms.  The simplest way to do this would be to divide the data into chunks and farm each chunk out to worker threads.  This will likely come at some cost in compression efficiency, but in the initial tests I have done the difference in size is very small relative to the speedup achieved.  Those initial tests were with a chunk of C code and zlib.
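As a rough, coarse-grained illustration of the chunking idea using stock tools (not the zlib test mentioned above; the file name and 32 MB chunk size are arbitrary, and it assumes a split(1) that accepts the -b nm syntax):

#!/bin/sh
# Split the input into chunks, compress each chunk in parallel as a separate
# gzip member, then concatenate the members.  Concatenated gzip members form
# a valid stream, so the result still decompresses with a plain "gzip -dc".
FILE=CSQamp.pkg
split -b 32m "$FILE" chunk.
for c in chunk.*; do
        gzip "$c" &     # one worker per chunk; gzip renames $c to $c.gz
done
wait
cat chunk.*.gz > "$FILE.gz"
rm chunk.*.gz

A real library implementation would do the same thing with threads and in-memory buffers instead of temporary files, and could tune the chunk size to trade compression ratio against parallelism.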

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
