On Fri, 2009-09-04 at 13:41 -0700, Richard Elling wrote:
> On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:
> 
> > We have groups generating terabytes a day of image data  from lab  
> > instruments and saving them to an X4500.
> 
> Wouldn't it be easier to compress at the application, or between the
> application and the archiving file system?

Preamble:  I am actively doing research into image set compression,
specifically jpeg2000, so this is my point of reference.


I think it would be easier to compress at the application level. I would
suggest getting the image from the source, compressing it with lossless
jpeg2000, and saving the result to a ZFS file system that has
compression turned off.

JPEG2000 uses arithmetic encoding to do the final compression step.
Arithmetic encoding achieves a higher compression ratio (in general)
than gzip-9, lzjb or others.  There is an open source implementation of
jpeg2000 called jasper[1].  Jasper is a reference implementation for
jpeg2000, meaning that other jpeg2000 programs are expected to check
their output against that of jasper (kinda).

Saving the jpeg2000 image to an uncompressed ZFS file system will be the
fastest thing.  Since jpeg2000 is already compressed, trying to compress
it again will not yield any storage space reduction; in general a second
compression pass can even _increase_ the size of the data.  Good
compression algorithms produce output that looks like random data, which
leaves nothing for a second compressor to exploit, so running on a
compressed pool would just burn CPU time and hurt performance.

[1] http://www.ece.uvic.ca/~mdadams/jasper
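
The point about recompressing already-compressed data can be checked
with a small Python sketch.  It is only an illustration under
assumptions: zlib at level 9 stands in for gzip-9 (both use DEFLATE),
seeded pseudo-random bytes stand in for a jpeg2000 file, and all the
variable names are made up for this example.

```python
import random
import zlib

SIZE = 256 * 1024  # 256 KiB per test buffer

# Assumption: a smooth gradient stands in for raw, redundant image data.
smooth = bytes((i // 512) % 256 for i in range(SIZE))

# Assumption: seeded pseudo-random bytes stand in for an already
# compressed file, since well-compressed data looks close to random.
rng = random.Random(2009)
already_compressed = bytes(rng.getrandbits(8) for _ in range(SIZE))

# zlib level 9 is used here as a stand-in for gzip-9.
packed_smooth = zlib.compress(smooth, 9)
packed_random = zlib.compress(already_compressed, 9)

print("redundant input:  %d -> %d bytes" % (len(smooth), len(packed_smooth)))
print("random-like input: %d -> %d bytes"
      % (len(already_compressed), len(packed_random)))
```

The redundant buffer shrinks a lot, while the random-like buffer comes
out slightly larger than it went in, because the compressor can only
add its own framing overhead.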

On a side note, if you want to know how arithmetic encoding works,
Wikipedia[2] has a real nice explanation.  Suffice it to say, in theory
(without considering implementation details) arithmetic encoding can
encode _any_ data in about data_entropy*num_of_symbols +
data_symbol_table bits, where data_entropy is in bits per symbol.  In
practice this isn't quite reached, because implementations have to work
with finite-precision arithmetic, plus some other issues.

[2] http://en.wikipedia.org/wiki/Arithmetic_coding
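
To make the data_entropy*num_of_symbols figure concrete, here is a
minimal Python sketch.  It assumes a static model built from the
message's own symbol counts, uses exact fractions to sidestep the
precision issue, and does not count the symbol table.  The function
names and the sample message are hypothetical; this is not how jasper
or any real codec implements it.

```python
from collections import Counter
from fractions import Fraction
from math import ceil, log2


def empirical_entropy(data):
    """Shannon entropy of the symbol frequencies, in bits per symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * log2(c / n) for c in counts.values())


def arithmetic_interval(data):
    """Narrow [0, 1) once per symbol; return (low, width) of the result."""
    counts = Counter(data)
    n = len(data)
    # Cumulative probability at which each symbol's sub-interval starts.
    starts, acc = {}, Fraction(0)
    for sym in sorted(counts):
        starts[sym] = acc
        acc += Fraction(counts[sym], n)
    low, width = Fraction(0), Fraction(1)
    for sym in data:
        low += width * starts[sym]          # move to the symbol's slice
        width *= Fraction(counts[sym], n)   # shrink by its probability
    return low, width


message = b"abracadabra"
ideal_bits = empirical_entropy(message) * len(message)
low, width = arithmetic_interval(message)
# Enough bits to name a binary fraction lying wholly inside the interval.
code_bits = ceil(-log2(width)) + 1

print("ideal: %.2f bits, codeword: %d bits, raw: %d bits"
      % (ideal_bits, code_bits, 8 * len(message)))
```

For this 11-byte message the ideal is about 22.4 bits and the codeword
needs 24 bits, against 88 bits raw.  The final interval width is the
product of the symbol probabilities, which is why its log lands on the
entropy figure.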

-- 
Louis-Frédéric Feuillette <jeb...@gmail.com>

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
