Hi,

No news. I received some very good suggestions, but unfortunately I
didn't get as much discussion as I had hoped it would. I'm sending the
project proposal again. I think that there are a lot of interesting
things to research and develop regarding the subject and I hope this
time we discuss a bit more about it. I would like to point out the
Adam Leventhal's suggestion of an adaptive compression scheme: I think
it would be a challenging and interesting direction to take. Besides,
there are some new results about BWT that I'm sure would be of
interest in this context.


Kind Regards,

Domingos.

Follows the text of my original proposal:

-----------------------------------------------------------------------------------------------------------------

Bellow, follows a proposal for a new opensolaris project. Of course,
this is open to change since I just wrote down some ideas I had months
ago, while researching the topic as a graduate student in Computer
Science, and since I'm not an opensolaris/ZFS expert at all. I would
really appreciate any suggestion or comments.

PROJECT PROPOSAL: ZFS Compression Algorithms.

The main purpose of this project is the development of new
compression schemes for the ZFS file system. We plan to start with
the development of a fast implementation of a Burrows Wheeler
Transform based algorithm (BWT). BWT is an outstanding tool
and the currently known lossless compression algorithms
based on it outperform the compression ratio of algorithms derived from the well
known Ziv-Lempel algorithm, while being a little more time and space
expensive. Therefore, there is space for improvement: recent results
show that the running time and space needs of such algorithms can be
significantly reduced and the same results suggests that BWT is
likely to become the new standard in compression
algorithms[1]. Suffixes Sorting (i.e. the problem of sorting suffixes of a
given string) is the main bottleneck of BWT and really significant
progress has been made in this area since the first algorithms of
Manbers and Myers[2] and Larsson and Sadakane[3], notably the new
linear time algorithms of Karkkainen and Sanders[4]; Kim, Sim and
Park[5] and Ko e aluru[6] and also the promising O(nlogn) algorithm of
Karkkainen and Burkhardt[7].

As a conjecture, we believe that some intrinsic properties of ZFS and
file systems in general (e.g. sparseness and data entropy in blocks)
could be exploited in order to produce brand new and really efficient
compression algorithms, as well as the adaptation of existing ones to
the task. The study might be extended to the analysis of data in
specific applications (e.g. web servers, mail servers and others) in
order to develop compression schemes for specific environments and/or
modify the existing Ziv-Lempel based scheme to deal better with such
environments.

[1] "The Burrows-Wheeler Transform: Theory and Practice". Manzini,
Giovanni. Proc. 24th Int. Symposium on Mathematical Foundations of
Computer Science

[2] "Suffix Arrays: A New Method for
On-Line String Searches". Manber, Udi and Myers, Eugene W.. SIAM
Journal on Computing, Vol. 22 Issue 5. 1990

[3] "Faster suffix sorting". Larsson, N Jasper and Sadakane,
Kunihiko. TECHREPORT, Department of Computer Science, Lund University,
1999

[4] "Simple Linear Work Suffix Array Construction". Karkkainen, Juha
and Sanders,Peter. Proc. 13th International Conference on Automata,
Languages and Programming, 2003

[5]"Linear-time construction of suffix arrays" D.K. Kim, J.S. Sim,
H. Park, K. Park, CPM, LNCS, Vol. 2676, 2003

[6]"Space ecient linear time construction of sux arrays",P. Ko and
S. Aluru, CPM 2003.

[7]"Fast Lightweight Suffix Array Construction and
Checking". Burkhardt, Stefan and Kärkkäinen, Juha. 14th Annual
Symposium, CPM 2003,


Domingos Soares Neto
University of Sao Paulo
Institute of Mathematics and Statistics

and

IBM Software Group.

__________________________________________________________________________


On 10/7/07, roland <[EMAIL PROTECTED]> wrote:
> any news on additional compression-schemes for zfs ?
>
> this is interesting research-topic, imho :)
>
> so, some more real-world tests with zfs-fuse + lzo patch :
>
> -LZO------------------------------------------------
> zfs set compression=lzo mypool
>
> time cp /vmware/vserver1/vserver1.vmdk /mypool
>
> real    7m8.540s
> user    0m0.708s
> sys     0m24.839s
>
> zfs get compressratio mypool
> NAME    PROPERTY       VALUE   SOURCE
> mypool  compressratio  1.74x   -
>
> 1.7G    vserver1.vmdk  compressed
> 3.0G    vserver1.vmdk  uncompressed
>
> -LZJB------------------------------------------------
> zfs set compression=lzjb mypool
>
> time cp /vmware/vserver1/vserver1.vmdk /mypool
>
> real    7m16.392s
> user    0m0.709s
> sys     0m25.107s
>
> zfs get compressratio mypool
> NAME    PROPERTY       VALUE   SOURCE
> mypool  compressratio  1.47x   -
>
> 2.0G    vserver1.vmdk compressed
> 3.0G    vserver1.vmdk uncompressed
>
> -GZIP------------------------------------------------
> zfs set compression=gzip mypool
>
> time cp /vmware/vserver1/vserver1.vmdk /mypool/
>
> real    12m54.183s
> user    0m0.653s
> sys     0m24.933s
>
> zfs get compressratio
> NAME    PROPERTY       VALUE   SOURCE
> mypool  compressratio  2.02x   -
>
> 1.5G    vserver1.vmdk    compressed
> 3.0G    vserver1.vmdk    uncompressed
>
>
> btw - lzo-patch for zfs-fuse (does apply to latest zfs-fuse sources) is at 
> http://groups.google.com/group/zfs-fuse/attach/a489f630aa4aa189/zfs-lzo.diff.bz2?part=4
>
>
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to