Bellow, follows a proposal for a new opensolaris project. Of course,
this is open to change since I just wrote down some ideas I had months
ago, while researching the topic as a graduate student in Computer
Science, and since I'm not an opensolaris/ZFS expert at all. I would
really appreciate any suggestion or comments.

PROJECT PROPOSAL: ZFS Compression Algorithms.

The main purpose of this project is the development of new
compression schemes for the ZFS file system. We plan to start with
the development of a fast implementation of a Burrows Wheeler
Transform based algorithm (BWT). BWT is an outstanding tool
and the currently known lossless compression algorithms
based on it outperform the compression ratio of algorithms derived from the well
known Ziv-Lempel algorithm, while being a little more time and space
expensive. Therefore, there is space for improvement: recent results
show that the running time and space needs of such algorithms can be
significantly reduced and the same results suggests that BWT is
likely to become the new standard in compression
algorithms[1]. Suffixes Sorting (i.e. the problem of sorting suffixes of a
given string) is the main bottleneck of BWT and really significant
progress has been made in this area since the first algorithms of
Manbers and Myers[2] and Larsson and Sadakane[3], notably the new
linear time algorithms of Karkkainen and Sanders[4]; Kim, Sim and
Park[5] and Ko e aluru[6] and also the promising O(nlogn) algorithm of
Karkkainen and Burkhardt[7].

As a conjecture, we believe that some intrinsic properties of ZFS and
file systems in general (e.g. sparseness and data entropy in blocks)
could be exploited in order to produce brand new and really efficient
compression algorithms, as well as the adaptation of existing ones to
the task. The study might be extended to the analysis of data in
specific applications (e.g. web servers, mail servers and others) in
order to develop compression schemes for specific environments and/or
modify the existing Ziv-Lempel based scheme to deal better with such
environments.

[1] "The Burrows-Wheeler Transform: Theory and Practice". Manzini,
Giovanni. Proc. 24th Int. Symposium on Mathematical Foundations of
Computer Science

[2] "Suffix Arrays: A New Method for
On-Line String Searches". Manber, Udi and Myers, Eugene W..  SIAM
Journal on Computing, Vol. 22 Issue 5. 1990

[3] "Faster suffix sorting". Larsson, N Jasper and Sadakane,
Kunihiko. TECHREPORT, Department of Computer Science, Lund University,
1999

[4] "Simple Linear Work Suffix Array Construction". Karkkainen, Juha
and Sanders,Peter. Proc. 13th International Conference on Automata,
Languages and Programming, 2003

[5]"Linear-time construction of suffix arrays" D.K. Kim, J.S. Sim,
H. Park, K. Park, CPM, LNCS, Vol. 2676, 2003

[6]"Space ecient linear time construction of sux arrays",P. Ko and
S. Aluru, CPM 2003.

[7]"Fast Lightweight Suffix Array Construction and
Checking". Burkhardt, Stefan and Kärkkäinen, Juha. 14th Annual
Symposium, CPM 2003,


Domingos Soares Neto
University of Sao Paulo
Institute of Mathematics and Statistics
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to