Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
On 12-07-11 13:40, Jim Klimov wrote:
> Even if I batch background RMs so that a hundred processes hang
> and then all complete at once in a minute or two.

Hmmm. I only run one rm process at a time. Do you think running more
processes at the same time would be faster?

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-14 11:54, Frank Van Damme wrote:

On 12-07-11 13:40, Jim Klimov wrote:

Even if I batch background RMs so that a hundred processes hang
and then all complete at once in a minute or two.

Hmmm. I only run one rm process at a time. Do you think running more
processes at the same time would be faster?

Yes, quite often it seems so.
Whenever my slow "dcpool" decides to accept a write,
it processes a hundred pending deletions instead of one ;)

Even so, it took quite a few pool or iSCSI hangs and then
reboots of both server and client, and about a week overall,
to remove a 50 GB directory with 400k small files from a deduped
pool served over iSCSI from a volume in a physical pool.

It just completed last night ;)



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
On 14-07-11 12:28, Jim Klimov wrote:
>>
> Yes, quite often it seems so.
> Whenever my slow "dcpool" decides to accept a write,
> it processes a hundred pending deletions instead of one ;)
> 
> Even so, it took quite a few pool or iSCSI hangs and then
> reboots of both server and client, and about a week overall,
> to remove a 50 GB directory with 400k small files from a deduped
> pool served over iSCSI from a volume in a physical pool.
> 
> It just completed last night ;)

It seems counter-intuitive - you'd think concurrent disk access would
only make things slower - but it turns out to be true. I'm deleting a
dozen times faster than before. How completely ridiculous.

Thank you :-)

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-14 15:48, Frank Van Damme wrote:
It seems counter-intuitive - you'd think concurrent disk access would
only make things slower - but it turns out to be true. I'm deleting a
dozen times faster than before. How completely ridiculous. Thank you :-)


Well, look at it this way: it is not only about individual disk accesses.
Unlike other filesystems, ZFS does not modify a directory entry in place;
with copy-on-write (COW) it rewrites a whole tree of block pointers, and
any new writes go into currently free (unreferenced) disk blocks anyway.

So by hoarding up writes you have a chance to reduce the mechanical
IOPS required for your task. Until you run out of RAM ;)

Just in case it helps: to quickly fire up removals of the specific
directory after yet another reboot of the box, and not overwhelm it
with hundreds of thousands of queued "rm" processes either, I made
this script as /bin/RM:

===
#!/bin/sh
# Fire off background "rm"s in batches of about 100, then wait until
# the number of running "rm" processes drops below 50 before continuing.

SLEEP=10                      # polling interval in seconds
[ x"$1" != x ] && SLEEP=$1    # optional override via the first argument

A=0                           # rm's started in the current batch
# To rm small files only: find ... -size -10
find /export/OLD/PATH/TO/REMOVE -type f | while read LINE; do
  du -hs "$LINE"
  rm -f "$LINE" &
  A=$(($A+1))
  [ "$A" -ge 100 ] && ( date; while [ `ps -ef | grep -wc rm` -gt 50 ]; do
     echo "Sleep $SLEEP..."; ps -ef | grep -wc rm; sleep $SLEEP; ps -ef | grep -wc rm
  done
  date ) && A="`ps -ef | grep -wc rm`"
done; date
===

Essentially, after firing up 100 "rm" attempts it waits for the "rm"
process count to drop below 50, then goes on. Sizing may vary
between systems, the phase of the moon and the computer's attitude.
Sometimes I had 700 processes stacked up that were processed quickly;
sometimes it hung at 50...
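
A usage note, inferred from the script above rather than stated in the
original post: the optional first argument overrides the default
10-second polling interval, for example:

===
# Poll the rm process count every 30 seconds instead of every 10
/bin/RM 30
===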

HTH,
//Jim



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Daniel Carosone
um, this is what xargs -P is for ...

--
Dan.



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> 
> I understand the argument, DDT must be stored in the primary storage
> pool so you can increase the size of the storage pool without running
> out of space to hold the DDT...  But it's a fatal design flaw as long as
> you care about performance...  If you don't care about performance, you
> might as well use the netapp and do offline dedup.  The point of online
> dedup is to gain performance.  So in ZFS you have to care about the
> performance.
> 
> There are only two possible ways to fix the problem.
> Either ...
> The DDT must be changed so it can be stored entirely in a designated
> sequential area of disk, and maintained entirely in RAM, so all DDT
> reads/writes can be infrequent and serial in nature...  This would solve
> the case of async writes and large sync writes, but would still perform
> poorly for small sync writes.  And it would be memory intensive.  But it
> should perform very nicely given those limitations.  ;-)
> Or ...
> The DDT stays as it is now, highly scattered small blocks, and there
> needs to be an option to store it entirely on low-latency devices such
> as dedicated SSDs.  Eliminate the need for the DDT to reside on the slow
> primary storage pool disks.  I understand you must consider what happens
> when the dedicated SSD gets full.  The obvious choices would be either
> (a) dedup turns off whenever the metadata device is full or (b) it
> defaults to writing blocks in the main storage pool.  Maybe that could
> even be a configurable behavior.  Either way, there's a very realistic
> use case here.  For some people in some situations, it may be acceptable
> to say "I have a 32G mirrored metadata device; divided by 137 bytes per
> entry I can dedup up to a maximum of 218M unique blocks in the pool, and
> if I estimate a 100K average block size that means up to 20T of primary
> pool storage.  If I reach that limit, I'll add more metadata device."
> 
> Both of those options would also go a long way toward eliminating the
> "surprise" delete performance black hole.

Is anyone from Oracle reading this?  I understand if you can't say what
you're working on and stuff like that.  But I am merely hopeful this work
isn't going into a black hole...  

Anyway.  Thanks for listening (I hope.)   ttyl
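
As a side note on the sizing math quoted above, here is a
back-of-the-envelope sketch as a shell snippet. The constants are the
ones assumed in the quoted post (32G device, 137 bytes per DDT entry,
100K average block size); the real per-entry size of a DDT record varies
by implementation, so treat the result as a rough estimate rather than
an authoritative figure:

===
#!/bin/sh
# Rough dedup-capacity estimate for a dedicated metadata device.
# Assumed figures only; requires a shell with 64-bit arithmetic
# (e.g. ksh93 or bash).
DEVICE_BYTES=$((32 * 1024 * 1024 * 1024))   # 32G metadata device
ENTRY_BYTES=137                             # assumed bytes per DDT entry
AVG_BLOCK=$((100 * 1024))                   # assumed 100K average block size

MAX_ENTRIES=$(($DEVICE_BYTES / $ENTRY_BYTES))
MAX_POOL_BYTES=$(($MAX_ENTRIES * $AVG_BLOCK))
echo "Unique blocks that fit on the device: $MAX_ENTRIES"
echo "Approximate dedupable pool size (bytes): $MAX_POOL_BYTES"
===

This comes out around 250M entries and roughly 25T of pool data, a little
above the rounded 218M/20T figures in the post, but in the same ballpark.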



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-15 6:21, Daniel Carosone wrote:

um, this is what xargs -P is for ...


Thanks for the hint. True, I don't often use xargs.

However, from the man pages I don't see a "-P" option
on OpenSolaris boxes of various releases; there is
only a "-p" (prompt) mode. I am not eager to type
"yes" 40 times ;)

The way I had this script set up in practice, I could enter "RM"
once and it worked until the box hung. Even then, a watchdog
script could often reboot it without my interaction,
so it could continue in its next lifetime ;)





Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Daniel Carosone
On Fri, Jul 15, 2011 at 07:56:25AM +0400, Jim Klimov wrote:
> On 2011-07-15 6:21, Daniel Carosone wrote:
>> um, this is what xargs -P is for ...
>
> Thanks for the hint. True, I don't often use xargs.
>
> However, from the man pages I don't see a "-P" option
> on OpenSolaris boxes of various releases; there is
> only a "-p" (prompt) mode. I am not eager to type
> "yes" 40 times ;)

you want the /usr/gnu/{bin,share/man} version, at least in this case.
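
For reference, a minimal sketch of the xargs-based approach under
discussion, assuming the GNU find and xargs from /usr/gnu/bin (which
support -print0/-0 and -P); the path is reused from the RM script earlier
in the thread. It deletes in parallel, with up to 50 concurrent rm
invocations of 100 files each:

===
# Sketch only: parallel deletion with GNU xargs instead of the RM script.
# Assumes /usr/gnu/bin/find and /usr/gnu/bin/xargs are installed.
/usr/gnu/bin/find /export/OLD/PATH/TO/REMOVE -type f -print0 | \
    /usr/gnu/bin/xargs -0 -n 100 -P 50 rm -f
===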

--
Dan.




Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
On 15-07-11 04:27, Edward Ned Harvey wrote:
> Is anyone from Oracle reading this?  I understand if you can't say what
> you're working on and stuff like that.  But I am merely hopeful this work
> isn't going into a black hole...  
> 
> Anyway.  Thanks for listening (I hope.)   ttyl

If they aren't, maybe someone from one of the open-source Solaris derivatives is :)

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.