On Mon, Jul 09, 2007 at 05:27:44PM -0500, Haudy Kazemi wrote:
> Wouldn't ZFS's being an integrated filesystem make it easier for it to
> identify the file types vs. a standard block device with a filesystem
> overlaid upon it?

How?  The API to ZFS that most everything uses is the POSIX API.
There's no MIME type argument to creat(2).

> I read in another post that with compression enabled, ZFS attempts to
> compress the data and stores it compressed if it compresses enough.
> As far as identifying the file type/data type how about:
> 1.) ZFS block compression system reads the ZFS file table to identify
> which blocks are the beginning of files (or for new writes, the block
> compression system is notified that file.ext is being written on
> block #### (e.g. block 9,000,201).

You might as well cache this in the dnode to keep the number of reads
needed to re-discover this info to a minimum.

> 2.) ZFS block compression system reads block ####, identifies the
> file type, probably based on the file header, and applies the most
> appropriate compression format, or if none found, the default.

Like an in-kernel file(1).  That could work, but it seems like a lot
of work; see the first sketch below.

You might as well try all the available compression algorithms, pick
the best one the first time any of them works for a given file block
being written, and record the choice in the file's dnode so that
subsequent writes use only that algorithm.  (Of course, the
compression algorithm selection process could be more dynamic than
that.)
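Something like this toy userland sketch of the magic-number lookup;
the table, identify(), and the "skip already-compressed data" hint are
all made up for illustration, not anything ZFS actually does:

    /*
     * Toy "in-kernel file(1)": peek at the first bytes of a block and
     * guess what's in it.  Hypothetical sketch, not a ZFS interface.
     */
    #include <stdio.h>
    #include <string.h>

    struct magic {
        const char          *name;
        const unsigned char  bytes[8];
        size_t               len;
        int                  already_compressed;
    };

    static const struct magic magics[] = {
        { "gzip", { 0x1f, 0x8b },          2, 1 },
        { "png",  { 0x89, 'P', 'N', 'G' }, 4, 1 },
        { "elf",  { 0x7f, 'E', 'L', 'F' }, 4, 0 },
    };

    static const struct magic *
    identify(const unsigned char *blk, size_t len)
    {
        for (size_t i = 0; i < sizeof (magics) / sizeof (magics[0]); i++)
            if (len >= magics[i].len &&
                memcmp(blk, magics[i].bytes, magics[i].len) == 0)
                return (&magics[i]);
        return (NULL);
    }

    int
    main(void)
    {
        /* First few bytes of a gzip stream. */
        const unsigned char blk[] = { 0x1f, 0x8b, 0x08, 0x00 };
        const struct magic *m = identify(blk, sizeof (blk));

        if (m != NULL)
            printf("%s%s\n", m->name,
                m->already_compressed ? " (skip compression)" : "");
        else
            printf("unknown (use default algorithm)\n");
        return (0);
    }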
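And a toy sketch of the try-everything-once selection; rle_compress()
and copy_compress() are stand-ins for real algorithms, and the final
printf is where the real thing would record the winner in the dnode:

    /*
     * Try every compressor on the first block written, keep the
     * smallest result, remember the winner.  Hypothetical sketch.
     */
    #include <stdio.h>
    #include <string.h>

    #define BLOCKSZ 128

    /* A toy run-length encoder standing in for a real algorithm. */
    static size_t
    rle_compress(const unsigned char *src, size_t len, unsigned char *dst)
    {
        size_t out = 0;
        for (size_t i = 0; i < len; ) {
            size_t run = 1;
            while (i + run < len && src[i + run] == src[i] && run < 255)
                run++;
            dst[out++] = (unsigned char)run;
            dst[out++] = src[i];
            i += run;
        }
        return (out);
    }

    /* "No compression": just copy the block. */
    static size_t
    copy_compress(const unsigned char *src, size_t len, unsigned char *dst)
    {
        memcpy(dst, src, len);
        return (len);
    }

    typedef size_t (*compress_fn)(const unsigned char *, size_t,
        unsigned char *);

    static const struct { const char *name; compress_fn fn; } compressors[] = {
        { "rle",  rle_compress  },
        { "copy", copy_compress },
    };

    int
    main(void)
    {
        unsigned char block[BLOCKSZ], out[2 * BLOCKSZ];
        memset(block, 'A', sizeof (block));  /* highly compressible */

        /* Try them all on the first block; keep the smallest. */
        size_t best = (size_t)-1;
        const char *winner = "none";
        for (size_t i = 0;
            i < sizeof (compressors) / sizeof (compressors[0]); i++) {
            size_t sz = compressors[i].fn(block, sizeof (block), out);
            if (sz < best) {
                best = sz;
                winner = compressors[i].name;
            }
        }

        /* The real thing would record this in the file's dnode. */
        printf("winner: %s (%zu -> %zu bytes)\n", winner,
            sizeof (block), best);
        return (0);
    }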
> An approach for maximal compression:
> The algorithm selection could be:
> 1.) attempt to compress using BWT; store compressed if BWT works
> better than no compression
> 2.) when CPU is otherwise idle, use 10% of spare CPU cycles to "walk
> the disk", trying to recompress each block with each of the various
> supported compression algorithms, ultimately storing that block in
> the most space-efficient compression format.

Sure, that's a lazy (in the sense of doing the work in the background
when there are spare cycles) approach to compression *and* compression
algorithm selection.

> This technique would result in a file system that tends to compact
> its data ever more tightly as the data sits in it.  It could be
> compared to 'settling' flakes in a cereal box...the contents may have
> had a lot of 'air space' before shipment, but are now 'compressed'.
> The recompression step might even be part of a periodic disk
> scrubbing step meant to check and recheck previously written data to
> make sure the sector it is sitting on isn't going bad.

Re-compressing (re-encrypting, etc.) as part of a "filesystem scrub"
(as opposed to a volume scrub) is one of the reasons for wanting a
"filesystem scrub" feature.

Nico
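P.S.  A very loose sketch of the 10%-of-idle-cycles walker:
recompress_block() and the block count are placeholders, and a real
implementation would live in the kernel rather than sleeping in a
userland loop; the throttle just sleeps nine times as long as it
works:

    /*
     * Walk "the disk" in the background, recompressing one block at a
     * time while keeping CPU use near a fixed budget.  Hypothetical
     * sketch only.
     */
    #include <stdio.h>
    #include <time.h>

    #define CPU_BUDGET 0.10  /* use ~10% of one CPU */

    /*
     * Stand-in for "recompress block n with every algorithm and keep
     * the smallest result"; here it just burns a little time.
     */
    static void
    recompress_block(unsigned long n)
    {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 1000000; i++)
            x += i ^ n;
    }

    int
    main(void)
    {
        unsigned long block_count = 100;  /* pretend pool size */

        for (unsigned long n = 0; n < block_count; n++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            recompress_block(n);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            /* Sleep (1 - budget)/budget times as long as we worked. */
            double worked = (t1.tv_sec - t0.tv_sec) +
                (t1.tv_nsec - t0.tv_nsec) / 1e9;
            double idle = worked * (1.0 - CPU_BUDGET) / CPU_BUDGET;
            struct timespec zzz = {
                .tv_sec  = (time_t)idle,
                .tv_nsec = (long)((idle - (time_t)idle) * 1e9),
            };
            nanosleep(&zzz, NULL);
        }
        printf("walked %lu blocks\n", block_count);
        return (0);
    }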