Re: [zfs-discuss] ZFS fragmentation with MySQL databases

Luke Lonergan Sat, 22 Nov 2008 21:23:39 -0800

> Actually, it does seem to work quite
> well when you use a read optimized
> SSD for the L2ARC.  In that case,
> "random" read workloads have very
> fast access, once the cache is warm.


One would expect so, yes.  But the usefulness of this is limited to the cases 
where the entire working set will fit into an SSD cache.

In other words, for random access across a working set larger (by, say, X%) than 
the SSD-backed L2ARC, the cache is useless.  This should asymptotically 
approach truth as X grows, and experience shows that X=200% is where it's about 
99% true.
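
As a rough sketch of the reasoning, assume reads are uniformly random over the
working set, so the steady-state hit ratio is simply cache size divided by
working set size. The uniform model and both latency figures (0.2 ms for the
SSD, 8 ms for disk) are illustrative assumptions, not measurements. The model
only shows how quickly misses come to dominate mean latency as X grows; it does
not by itself reproduce the 99% figure, which comes from experience.

```python
def hit_ratio(cache_size, working_set):
    """Steady-state hit ratio, assuming uniformly random reads (idealized)."""
    if working_set <= 0:
        raise ValueError("working_set must be positive")
    return min(1.0, cache_size / working_set)


def mean_read_latency_ms(x_percent, ssd_ms=0.2, disk_ms=8.0):
    """Mean read latency when the working set is x_percent larger than the cache.

    ssd_ms and disk_ms are hypothetical per-read latencies.
    """
    ratio = hit_ratio(1.0, 1.0 + x_percent / 100.0)
    return ratio * ssd_ms + (1.0 - ratio) * disk_ms


if __name__ == "__main__":
    for x in (0, 50, 100, 200, 400):
        h = hit_ratio(1.0, 1.0 + x / 100.0)
        print(f"X={x:>3}%  hit ratio={h:.2f}  "
              f"mean latency={mean_read_latency_ms(x):.2f} ms")
```

Real workloads are rarely uniform; skewed access keeps the hit ratio higher,
while cache churn from a working set that does not fit can push it lower.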

As time passes and SSDs get larger while many OLTP random workloads remain 
somewhat constrained in size, this becomes less important.

Modern DB workloads are becoming hybridized, though.  A 'mixed workload' 
scenario is now common, where there is a mix of updated working sets and 
indexed access alongside heavy analytical workloads of the 'update rarely 
if ever' kind.

- Luke

----- Original Message -----
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: Luke Lonergan
Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>; zfs-discuss@opensolaris.org 
<zfs-discuss@opensolaris.org>
Sent: Sat Nov 22 20:28:54 2008
Subject: Re: [zfs-discuss] ZFS fragmentation with MySQL databases

Luke Lonergan wrote:
> ZFS works marvelously well for data warehouse and analytic DBs.  For lots of 
> small updates scattered across the breadth of the persistent working set, 
> it's not going to work well IMO.
>

Actually, it does seem to work quite well when you use a read optimized
SSD for the L2ARC.  In that case, "random" read workloads have very
fast access, once the cache is warm.
 -- richard

> Note that we're using ZFS to host databases as large as 10,000 TB - that's 
> 10PB (!!).  Solaris 10 U5 on X4540.  That said - it's on 96 servers running 
> Greenplum DB.
>
> With SSD, the randomness won't matter much I expect, though the filesystem 
> won't be helping by virtue of this fragmentation effect of COW.
>
> - Luke
>
> ----- Original Message -----
> From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> To: zfs-discuss@opensolaris.org <zfs-discuss@opensolaris.org>
> Sent: Sat Nov 22 16:43:53 2008
> Subject: Re: [zfs-discuss] ZFS fragmentation with MySQL databases
>
> Kees Nuyt wrote:
>
>> My explanation would be: Whenever a block within a file
>> changes, zfs has to write it at another location ("copy on
>> write"), so the previous version isn't immediately lost.
>>
>> Zfs will try to keep the new version of the block close to
>> the original one, but after several changes on the same
>> database page, things get pretty messed up and logical
>> sequential I/O becomes pretty much physically random indeed.
>>
>> The original blocks will eventually be added to the freelist
>> and reused, so proximity can be restored, but it will never
>> be 100% sequential again.
>> The effect is larger when many snapshots are kept, because
>> older block versions are not freed, or when the same block
>> is changed very often and freelist updating has to be
>> postponed.
>>
>> That is the trade-off between "always consistent" and
>> "fast".
>>
>>
> Well, does that mean ZFS is not best suited for database engines as
> underlying filesystem?  With databases it will always be fragmented,
> hence slow performance?
>
> Because this way it would be best to use it for large file servers whose
> contents don't usually change frequently.
>
> Thanks,
> Tamer
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
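The copy-on-write effect Kees Nuyt describes in the quoted text above can be
illustrated with a toy model. This is a deliberately simplified sketch, not
ZFS's actual allocator: every rewritten block goes to the next unused physical
address and freed blocks are never reused, which is closest to the
"many snapshots kept" case. The function names are hypothetical.

```python
import random


def contiguity(layout):
    """Fraction of logically adjacent blocks that are also physically adjacent."""
    pairs = len(layout) - 1
    adjacent = sum(1 for i in range(pairs) if layout[i + 1] == layout[i] + 1)
    return adjacent / pairs


def simulate_cow(num_blocks, num_updates, seed=0):
    """Rewrite random logical blocks, placing each new copy at the next free address."""
    rng = random.Random(seed)
    layout = list(range(num_blocks))  # logical block i starts at physical address i
    next_free = num_blocks            # toy allocator: always append, never reuse
    for _ in range(num_updates):
        block = rng.randrange(num_blocks)
        layout[block] = next_free     # copy on write: the old copy stays in place
        next_free += 1
    return layout


if __name__ == "__main__":
    for updates in (0, 100, 1000, 5000):
        layout = simulate_cow(1000, updates)
        print(f"{updates:>5} updates: contiguity = {contiguity(layout):.2f}")
```

In this model a sequential scan of the file degrades toward physically random
I/O as scattered updates accumulate, which is the behavior the thread is about.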

