Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

Nico Williams Thu, 29 Dec 2011 09:42:13 -0800

On Thu, Dec 29, 2011 at 9:53 AM, Brad Diggs <brad.di...@oracle.com> wrote:
> Jim,
>
> You are spot on.  I was hoping that the writes would be close enough to
> identical that there would be a high ratio of duplicate data, since I use
> the same record size, page size, compression algorithm, etc.  However,
> that was not the case.  The main thing that I wanted to prove, though, was
> that if the data was the same, the L1 ARC only caches the data that was
> actually written to storage.  That is a really cool thing!  I am sure
> there will be future study on this topic as it applies to other scenarios.
>
> With regard to directory engineering investing any energy into optimizing
> ODSEE DS to more effectively leverage this caching potential, that won't
> happen.  OUD far outperforms ODSEE.  That said, OUD may get some focus in
> this area.  However, time will tell on that one.


Databases are not as likely to benefit from dedup as virtual machines;
indeed, DBs are likely not to benefit at all from dedup.  The VM use
case benefits from dedup for the obvious reason that many VMs will
have the same exact software installed most of the time, using the
same filesystems, and the same patch/update installation order, so if
you keep data out of their root filesystems then you can expect
enormous deduplicatiousness.  But databases, not so much.  The unit of
deduplicable data in a VM use case is the guest's preferred block
size, while in a DB the unit of deduplicable data might be a
variable-sized table row, or even smaller: a single row/column value
-- and you have no way to ensure alignment of individual deduplicable
units nor ordering of sets of deduplicable units into larger ones.
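
To make the alignment point concrete, here is a minimal sketch in Python.
It is a toy model, not ZFS code: the 8-byte block size, the helper names
and the sample data are all made up for illustration.  It hashes
fixed-size blocks the way a block-level dedup scheme would and compares
two identical "images" against the same bytes shifted by one byte.

```python
import hashlib

BLOCK = 8  # toy block size (assumption); real record sizes are far larger


def block_hashes(data: bytes, block_size: int = BLOCK) -> list:
    """Hash each fixed-size block, as a block-level dedup scheme would."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def dedup_ratio(images, block_size: int = BLOCK) -> float:
    """Blocks referenced divided by unique blocks stored."""
    total = 0
    unique = set()
    for img in images:
        hashes = block_hashes(img, block_size)
        total += len(hashes)
        unique.update(hashes)
    return total / len(unique)


base = bytes(range(64))      # 8 distinct, block-aligned blocks
shifted = b"\xff" + base     # same payload, misaligned by one byte

vm_ratio = dedup_ratio([base, base])     # aligned copies: 2.0
db_ratio = dedup_ratio([base, shifted])  # misaligned copy: 1.0

print(vm_ratio, db_ratio)
```

The two aligned copies share every block, while the one-byte shift leaves
no block in common even though 64 of the 65 bytes are the same.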

When it comes to databases your best bets will be: a) database-level
compression or dedup features (e.g., Oracle's column-level compression
feature) or b) ZFS compression.
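
As a rough illustration of why compression can still find redundancy that
block-level dedup misses, here is a small sketch using Python's zlib
(DEFLATE) as a stand-in compressor.  The row layout and values are
invented for the example, and real results depend entirely on the actual
data and the algorithm chosen.

```python
import zlib

# Hypothetical variable-length "rows": repeated column values, but the
# rows do not line up on any fixed block boundary.
rows = [
    f"{i},active,us-west,{'x' * (i % 7)}\n".encode()
    for i in range(1000)
]
raw = b"".join(rows)

packed = zlib.compress(raw, 6)
ratio = len(raw) / len(packed)

print(f"raw={len(raw)} packed={len(packed)} ratio={ratio:.1f}")
```

The compressor matches repeated byte strings wherever they occur, so it
does not care that the repeated values are unaligned.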

(Dedup makes VM management easier, because the alternative is to patch
one master guest VM [per-guest type] then re-clone and re-configure
all instances of that guest type, in the process possibly losing any
customizations in those guests.  But even before dedup, the ability to
snapshot and clone datasets was an impressive dedup-like tool for the
VM use case, just not as convenient as dedup.)

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
