Luke Lonergan wrote:
> ZFS works marvelously well for data warehouse and analytic DBs.  For lots of 
> small updates scattered across the breadth of the persistent working set, 
> it's not going to work well IMO.
>   

Actually, it does seem to work quite well when you use a read-optimized
SSD for the L2ARC.  In that case, "random" read workloads get very
fast access once the cache is warm.
 -- richard
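
To illustrate the "once the cache is warm" point, here is a toy model in
Python.  It is only a sketch: it uses a plain LRU cache with uniformly
random reads, which is not how the L2ARC actually decides what to keep,
and every name and size in it (warmup_hit_rates, working_set,
cache_blocks, and so on) is made up for the example.

```python
import random
from collections import OrderedDict


def warmup_hit_rates(working_set=10_000, cache_blocks=8_000,
                     reads_per_round=20_000, rounds=5, seed=0):
    """Return the cache hit rate for each round of uniformly random reads.

    Toy model only: a simple LRU cache in front of a "slow" disk.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # block id -> None, ordered by recency of use
    rates = []
    for _ in range(rounds):
        hits = 0
        for _ in range(reads_per_round):
            block = rng.randrange(working_set)
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = None            # miss: read from disk, then cache
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used
        rates.append(hits / reads_per_round)
    return rates


if __name__ == "__main__":
    for i, rate in enumerate(warmup_hit_rates(), start=1):
        print(f"round {i}: hit rate {rate:.2f}")
```

In this model the first round starts with an empty cache, so its hit rate
is lower than in later rounds.  With uniform random reads the steady-state
hit rate is roughly cache_blocks / working_set, so how much the cache
helps depends on how much of the working set fits on the SSD.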

> Note that we're using ZFS to host databases as large as 10,000 TB - that's 
> 10PB (!!).  Solaris 10 U5 on X4540.  That said - it's on 96 servers running 
> Greenplum DB.
>
> With SSDs, I expect the randomness won't matter much, though the filesystem 
> won't be helping, because of the fragmentation effect of COW.
>
> - Luke
>
> ----- Original Message -----
> From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> To: zfs-discuss@opensolaris.org <zfs-discuss@opensolaris.org>
> Sent: Sat Nov 22 16:43:53 2008
> Subject: Re: [zfs-discuss] ZFS fragmentation with MySQL databases
>
> Kees Nuyt wrote:
>   
>> My explanation would be: Whenever a block within a file
>> changes, ZFS has to write it at another location ("copy on
>> write"), so the previous version isn't immediately lost.
>>
>> ZFS will try to keep the new version of the block close to
>> the original one, but after several changes on the same
>> database page, things get pretty messed up and logically
>> sequential I/O becomes pretty much physically random.
>>
>> The original blocks will eventually be added to the freelist
>> and reused, so proximity can be restored, but it will never
>> be 100% sequential again.
>> The effect is larger when many snapshots are kept, because
>> older block versions are not freed, or when the same block
>> is changed very often and freelist updating has to be
>> postponed.
>>
>> That is the trade-off between "always consistent" and
>> "fast".
>>
>>     
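
The effect described above can be sketched with a small simulation.  This
is a toy model, not the real ZFS allocator: it assumes a file laid out
contiguously, a "nearest free block" placement rule taken from the
description above, and no snapshots (old blocks are freed immediately).
The names simulate_cow and sequential_fraction and all sizes are
hypothetical.

```python
import random


def sequential_fraction(layout):
    """Fraction of logically adjacent blocks that are also physically adjacent."""
    adjacent = sum(1 for a, b in zip(layout, layout[1:]) if b == a + 1)
    return adjacent / (len(layout) - 1)


def simulate_cow(file_blocks=1_000, disk_blocks=1_500, updates=5_000, seed=0):
    """Apply random copy-on-write updates; return (before, after) fractions."""
    rng = random.Random(seed)
    layout = list(range(file_blocks))            # logical index -> physical address
    free = set(range(file_blocks, disk_blocks))  # unused physical addresses
    before = sequential_fraction(layout)
    for _ in range(updates):
        logical = rng.randrange(file_blocks)     # block being rewritten
        old = layout[logical]
        # Copy on write: the new version goes to a different physical block.
        # Assumed placement rule: the free block nearest the old location.
        new = min(free, key=lambda addr: abs(addr - old))
        free.remove(new)
        layout[logical] = new
        free.add(old)                            # no snapshot holds the old version
    return before, sequential_fraction(layout)


if __name__ == "__main__":
    before, after = simulate_cow()
    print(f"sequential fraction before: {before:.2f}, after: {after:.2f}")
```

The file starts fully sequential, and the fraction of neighbouring blocks
that are still physically adjacent drops as updates accumulate.  How far
it drops depends entirely on the assumed placement rule and the amount of
free space, so treat the numbers as illustrative only.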
> Well, does that mean ZFS is not well suited as the underlying filesystem
> for database engines?  With databases it will always be fragmented, hence
> slow performance?
>
> If so, it would be best used for large file servers whose data doesn't
> usually change frequently.
>
> Thanks,
> Tamer
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   
