Hello eric,

Tuesday, February 20, 2007, 5:55:47 PM, you wrote:

ek> On Feb 15, 2007, at 6:08 AM, Robert Milkowski wrote:

>> Hello eric,
>>
>> Wednesday, February 14, 2007, 5:04:01 PM, you wrote:
>>
>> ek> I'm wondering if we can just lower the amount of space we're
>> ek> trying to alloc as the pool becomes more fragmented - we'll lose
>> ek> a little I/O performance, but it should limit this bug.
>>
>> Do you think that zfs send|recv for such file systems could help for a
>> while? (would it defragment data?)
>>
>>

ek> If you were able to send over your complete pool, destroy the  
ek> existing one and re-create a new one using recv, then that should  
ek> help with fragmentation.  That said, that's a very poor man's  
ek> defragger.  The defragmentation should happen automatically or at  
ek> least while the pool is online.

I was rather thinking about sending all the file systems to another
server/pool (I'm in the middle of that process now), then deleting the
source file systems and sending the file systems back. Of course
destroying the pool is no problem, but I wonder why you think it's
needed - wouldn't just deleting the file systems be enough?
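
Something like this is what I mean, roughly (just a sketch - host, pool
and file system names below are placeholders):

    # snapshot the file system and send it to the other server's pool
    zfs snapshot tank/fs1@migrate
    zfs send tank/fs1@migrate | ssh otherhost zfs recv backup/fs1

    # once verified on the other side, free the space locally
    zfs destroy -r tank/fs1

    # later, send it back the same way (snapshot on the other host first)
    ssh otherhost zfs snapshot backup/fs1@return
    ssh otherhost zfs send backup/fs1@return | zfs recv tank/fs1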

btw: I've already migrated three file systems that way to an x4500 and
     so far they are working great - no CPU usage, much less read I/O
     (both in number of I/Os and in volume), and everything is committed
     exactly every 5s. So I guess there's a high degree of probability
     it will stay the same once I migrate them back to the cluster.


ek> In the absence of a built-in defragger and without a fix for 6495013,
ek> i think the best thing you could do is either add more storage or  
ek> remove some data (such as removing some old snapshots or move some  
ek> unneeded storage to another system/backup).  Not sure if either of  
ek> those are applicable to you.

Some time ago removing snapshots helped. Then we stopped creating them,
and now there's nothing really left to remove, so I'm doing the above.
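
For reference, the kind of cleanup that used to help looked roughly
like this (dataset names are placeholders):

    # list snapshots and how much space each one holds
    zfs list -t snapshot -o name,used
    # destroy the ones we no longer need
    zfs destroy tank/fs1@2006-12-01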

btw: 6495013 - what is it exactly? (bugs.opensolaris.org doesn't show
it, and neither does sunsolve.)

ek> Should note, ZFS isn't the only COW filesystem that hits this  
ek> problem.  In WAFL's case, i believe they hit this same problem around
ek> 90% capacity but are very nice about detecting it and reporting back  
ek> to the admin.

Yeah, but with ZFS I hit a really big problem while the pool still has
more than 50% free space (which is almost 700GB of free space).
And it's not only that writes are slow; it's also that during the
problem one whole core is eaten up at 100% all the time. Which raises
the suspicion that perhaps something here could run in parallel and
speed things up a little bit on a T2000 or v440. A crude workaround,
but still - most of the other cores just do nothing while one core is
100% in sys doing all those avl_walks (a long journey).
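
For the record, this is roughly how I'm seeing those stacks - a generic
DTrace profiling one-liner, nothing ZFS-specific:

    # sample on-CPU kernel stacks for ~10s and print the hottest ones
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-10s { exit(0); }'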


-- 
Best regards,
 Robert                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
