Hi,

> It might be interesting to focus on compression algorithms which are
> optimized for particular workloads and data types, an Oracle database for
> example.

Yes, I agree. That is what I meant when I said "The study might be
extended to the analysis of data in specific applications (e.g. web
servers, mail servers and others) in order to develop compression
schemes for specific environments...". However, I was not considering
it as a major task, but a minor one. How important such a feature
would be to opensolaris?


> It might be worthwhile to have some sort of adaptive compression whereby
> ZFS could choose a compression algorithm based on its detection of the
> type of data being stored.


  That's definitely a great idea. I'm just afraid that would be a bit
hard to identify the data type of a given block or set of blocks in
order to adapt the compression algorithm to it. At the file level it
would be pretty easy in most cases, but at the block level we don't
have a clue about what kind of data are inside the block. The
identification process would depend on some statistical properties of
the data and I don't know how hard it would be to scan the blocks and
process them on a reasonable amount of time, and the whole thing must
be done before the compression really starts.
 Therefore, prior to the development of an adaptative scheme, it's
very important to know if such a thing would be worthwhile. In other
words, we need to know the relative variation of the block's
compression ratio while compressed by different algorithms: is it true
that block B1 can be significantly better compressed by algorithm X
than by algorithm Y, while the inverse happens with block B2? It makes
sense, but I wouldn't say it's true before a deeper analysis and some
experiments.
 Summarizing, I think the idea of an adaptative scheme very nice and
interesting. :-)

Domingos.

--
Domingos Soares Neto
University of Sao Paulo
Institute of Mathematics and Statistics








> Adam
>
> On Thu, Jul 05, 2007 at 08:29:38PM -0300, Domingos Soares wrote:
> > Bellow, follows a proposal for a new opensolaris project. Of course,
> > this is open to change since I just wrote down some ideas I had months
> > ago, while researching the topic as a graduate student in Computer
> > Science, and since I'm not an opensolaris/ZFS expert at all. I would
> > really appreciate any suggestion or comments.
> >
> > PROJECT PROPOSAL: ZFS Compression Algorithms.
> >
> > The main purpose of this project is the development of new
> > compression schemes for the ZFS file system. We plan to start with
> > the development of a fast implementation of a Burrows Wheeler
> > Transform based algorithm (BWT). BWT is an outstanding tool
> > and the currently known lossless compression algorithms
> > based on it outperform the compression ratio of algorithms derived from the 
> > well
> > known Ziv-Lempel algorithm, while being a little more time and space
> > expensive. Therefore, there is space for improvement: recent results
> > show that the running time and space needs of such algorithms can be
> > significantly reduced and the same results suggests that BWT is
> > likely to become the new standard in compression
> > algorithms[1]. Suffixes Sorting (i.e. the problem of sorting suffixes of a
> > given string) is the main bottleneck of BWT and really significant
> > progress has been made in this area since the first algorithms of
> > Manbers and Myers[2] and Larsson and Sadakane[3], notably the new
> > linear time algorithms of Karkkainen and Sanders[4]; Kim, Sim and
> > Park[5] and Ko e aluru[6] and also the promising O(nlogn) algorithm of
> > Karkkainen and Burkhardt[7].
> >
> > As a conjecture, we believe that some intrinsic properties of ZFS and
> > file systems in general (e.g. sparseness and data entropy in blocks)
> > could be exploited in order to produce brand new and really efficient
> > compression algorithms, as well as the adaptation of existing ones to
> > the task. The study might be extended to the analysis of data in
> > specific applications (e.g. web servers, mail servers and others) in
> > order to develop compression schemes for specific environments and/or
> > modify the existing Ziv-Lempel based scheme to deal better with such
> > environments.
> >
> > [1] "The Burrows-Wheeler Transform: Theory and Practice". Manzini,
> > Giovanni. Proc. 24th Int. Symposium on Mathematical Foundations of
> > Computer Science
> >
> > [2] "Suffix Arrays: A New Method for
> > On-Line String Searches". Manber, Udi and Myers, Eugene W..  SIAM
> > Journal on Computing, Vol. 22 Issue 5. 1990
> >
> > [3] "Faster suffix sorting". Larsson, N Jasper and Sadakane,
> > Kunihiko. TECHREPORT, Department of Computer Science, Lund University,
> > 1999
> >
> > [4] "Simple Linear Work Suffix Array Construction". Karkkainen, Juha
> > and Sanders,Peter. Proc. 13th International Conference on Automata,
> > Languages and Programming, 2003
> >
> > [5]"Linear-time construction of suffix arrays" D.K. Kim, J.S. Sim,
> > H. Park, K. Park, CPM, LNCS, Vol. 2676, 2003
> >
> > [6]"Space ecient linear time construction of sux arrays",P. Ko and
> > S. Aluru, CPM 2003.
> >
> > [7]"Fast Lightweight Suffix Array Construction and
> > Checking". Burkhardt, Stefan and Kärkkäinen, Juha. 14th Annual
> > Symposium, CPM 2003,
> >
> >
> > Domingos Soares Neto
> > University of Sao Paulo
> > Institute of Mathematics and Statistics
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> --
> Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to