Bellow, follows a proposal for a new opensolaris project. Of course, this is open to change since I just wrote down some ideas I had months ago, while researching the topic as a graduate student in Computer Science, and since I'm not an opensolaris/ZFS expert at all. I would really appreciate any suggestion or comments.
PROJECT PROPOSAL: ZFS Compression Algorithms. The main purpose of this project is the development of new compression schemes for the ZFS file system. We plan to start with the development of a fast implementation of a Burrows Wheeler Transform based algorithm (BWT). BWT is an outstanding tool and the currently known lossless compression algorithms based on it outperform the compression ratio of algorithms derived from the well known Ziv-Lempel algorithm, while being a little more time and space expensive. Therefore, there is space for improvement: recent results show that the running time and space needs of such algorithms can be significantly reduced and the same results suggests that BWT is likely to become the new standard in compression algorithms[1]. Suffixes Sorting (i.e. the problem of sorting suffixes of a given string) is the main bottleneck of BWT and really significant progress has been made in this area since the first algorithms of Manbers and Myers[2] and Larsson and Sadakane[3], notably the new linear time algorithms of Karkkainen and Sanders[4]; Kim, Sim and Park[5] and Ko e aluru[6] and also the promising O(nlogn) algorithm of Karkkainen and Burkhardt[7]. As a conjecture, we believe that some intrinsic properties of ZFS and file systems in general (e.g. sparseness and data entropy in blocks) could be exploited in order to produce brand new and really efficient compression algorithms, as well as the adaptation of existing ones to the task. The study might be extended to the analysis of data in specific applications (e.g. web servers, mail servers and others) in order to develop compression schemes for specific environments and/or modify the existing Ziv-Lempel based scheme to deal better with such environments. [1] "The Burrows-Wheeler Transform: Theory and Practice". Manzini, Giovanni. Proc. 24th Int. Symposium on Mathematical Foundations of Computer Science [2] "Suffix Arrays: A New Method for On-Line String Searches". Manber, Udi and Myers, Eugene W.. SIAM Journal on Computing, Vol. 22 Issue 5. 1990 [3] "Faster suffix sorting". Larsson, N Jasper and Sadakane, Kunihiko. TECHREPORT, Department of Computer Science, Lund University, 1999 [4] "Simple Linear Work Suffix Array Construction". Karkkainen, Juha and Sanders,Peter. Proc. 13th International Conference on Automata, Languages and Programming, 2003 [5]"Linear-time construction of suffix arrays" D.K. Kim, J.S. Sim, H. Park, K. Park, CPM, LNCS, Vol. 2676, 2003 [6]"Space ecient linear time construction of sux arrays",P. Ko and S. Aluru, CPM 2003. [7]"Fast Lightweight Suffix Array Construction and Checking". Burkhardt, Stefan and Kärkkäinen, Juha. 14th Annual Symposium, CPM 2003, Domingos Soares Neto University of Sao Paulo Institute of Mathematics and Statistics _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss