Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Gaëtan Lehmann


Hi,

Here is the result on a Dell Precision T5500 with 24 GB of RAM and two
HDs in a mirror (SATA, 7200 rpm, NCQ).


[glehm...@marvin2 tmp]$ uname -a
SunOS marvin2 5.11 snv_117 i86pc i386 i86pc Solaris
[glehm...@marvin2 tmp]$ pfexec ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    8m19,74s
user    0m6,47s
sys     0m25,32s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    10m42,68s
user    0m8,35s
sys     0m30,93s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

HTH,

Gaëtan



On 13 Jul 2009, at 01:15, Scott Lawson wrote:


Bob,

Output of my run for you. The system is an M3000 with 16 GB of RAM and one
zpool called test1, which is contained on a RAID 1 volume on a 6140 with
7.50.13.10 firmware on the RAID controllers. The RAID 1 volume is made up of
two 146GB 15K FC disks.

This machine is brand new with a clean install of S10 05/09. It is destined
to become an Oracle 10 server with ZFS filesystems for zones and DB volumes.

[r...@xxx /]#> uname -a
SunOS xxx 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise
[r...@xxx /]#> cat /etc/release
 Solaris 10 5/09 s10s_u7wos_08 SPARC
 Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
  Use is subject to license terms.
   Assembled 30 March 2009

[r...@xxx /]#> prtdiag -v | more
System Configuration:  Sun Microsystems  sun4u Sun SPARC Enterprise  
M3000 Server

System clock frequency: 1064 MHz
Memory size: 16384 Megabytes


Here is the run output for you.

[r...@xxx tmp]#> ./zfs-cache-test.ksh test1
zfs create test1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /test1/zfscachetest ...

Done!
zfs unmount test1/zfscachetest
zfs mount test1/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    4m48.94s
user    0m21.58s
sys     0m44.91s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    6m39.87s
user    0m21.62s
sys     0m46.20s

Feel free to clean up with 'zfs destroy test1/zfscachetest'.

Looks like a 25% performance loss for me. I was seeing around 80MB/s
sustained on the first run and around 60MB/s sustained on the 2nd.

/Scott.


Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance  
issue for a week now.  A 4X reduction in file read performance due  
to having read the file before is terrible, and of course the  
situation is considerably worse if the file was previously mmapped  
as well.  Many of us have sent a lot of money to Sun and were not  
aware that ZFS is sucking the life out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple  
machines. For example, I reproduced it on my Blade 2500 (SPARC)  
which uses a simple mirrored rpool.  On that system there is a 1.8X  
read slowdown from the file being accessed previously.


In order to raise visibility of this issue, I invite others to see if they
can reproduce it in their ZFS pools.  The script at

http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

implements a simple test.  It requires a fair amount of disk space to run,
but the main requirement is that the disk space consumed be more than
available memory so that file data gets purged from the ARC.  The script
needs to run as root since it creates a filesystem and uses mount/umount.
The script does not destroy any data.
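
For anyone who can't fetch the script, its core is roughly the following (a
minimal sketch reconstructed from the output above, not the actual script;
it assumes default mountpoints and the real script may generate the data
differently):

  #!/bin/ksh
  # Sketch only: pool name from $1, defaulting to rpool as in the real script.
  POOL=${1:-rpool}
  FS=$POOL/zfscachetest
  DIR=/$FS

  zfs create $FS
  print "Creating data file set (3000 files of 8192000 bytes) under $DIR ..."
  typeset -i i=0
  while (( i < 3000 )); do
      dd if=/dev/urandom of=$DIR/file.$i bs=8192000 count=1 2>/dev/null
      (( i += 1 ))
  done
  print "Done!"

  # Unmount and remount the filesystem between the create and the reads,
  # exactly as the output above shows, then time two cpio read passes.
  zfs unmount $FS
  zfs mount $FS
  cd $DIR

  print "Doing initial (unmount/mount) 'cpio -o > /dev/null'"
  time find . -type f | cpio -o > /dev/null

  print "Doing second 'cpio -o > /dev/null'"
  time find . -type f | cpio -o > /dev/null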


There are several adjustments which may be made at the front of the  
script.  The pool 'rpool' is used by default, but the name of the  
pool to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    2m54.17s
user    0m7.65s
sys     0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    11m54.65s
user    0m7.70s
sys     0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    13m3.91s
user    2m43.04s
sys     9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    23m50.27s
user    2m41.81s
sys     9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this bug.

Re: [zfs-discuss] deduplication

2009-07-13 Thread Cyril Plisko
Richard,

> Also, we now know the market value for dedupe intellectual property: $2.1
> Billion.
> Even though there may be open source, that does not mean there are not IP
> barriers.  $2.1 Billion attracts a lot of lawyers :-(

Indeed, good point.

-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Alexander Skwar
Bob,

On Sun, Jul 12, 2009 at 23:38, Bob
Friesenhahn wrote:
> There has been no forward progress on the ZFS read performance issue for a
> week now.  A 4X reduction in file read performance due to having read the
> file before is terrible, and of course the situation is considerably worse
> if the file was previously mmapped as well.  Many of us have sent a lot of
> money to Sun and were not aware that ZFS is sucking the life out of our
> expensive Sun hardware.
>
> It is trivially easy to reproduce this problem on multiple machines. For
> example, I reproduced it on my Blade 2500 (SPARC) which uses a simple
> mirrored rpool.  On that system there is a 1.8X read slowdown from the file
> being accessed previously.
>
> In order to raise visibility of this issue, I invite others to see if they
> can reproduce it in their ZFS pools.  The script at
>
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh
>
> Implements a simple test.

--($ ~)-- time sudo ksh zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 Blöcke

real    4m7.70s
user    0m24.10s
sys     1m5.99s

Doing second 'cpio -o > /dev/null'
48000247 Blöcke

real    1m44.88s
user    0m22.26s
sys     0m51.56s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

real    10m47.747s
user    0m54.189s
sys     3m22.039s

This is an M4000 with 32 GB RAM and two HDs in a mirror.

Alexander
-- 
[[ http://zensursula.net ]]
[ Soc. => http://twitter.com/alexs77 | http://www.plurk.com/alexs77 ]
[ Mehr => http://zyb.com/alexws77 ]
[ Chat => Jabber: alexw...@jabber80.com | Google Talk: a.sk...@gmail.com ]
[ Mehr => AIM: alexws77 ]
[ $[ $RANDOM % 6 ] = 0 ] && rm -rf / || echo 'CLICK!'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Hey Bob,

Here are my results on a dual 2.2GHz Opteron with 8GB of RAM and 16 SATA disks 
connected via a Supermicro AOC-SAT2-MV8 (albeit with one dead drive).

Looks like a 5x slowdown to me:

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    4m46.45s
user    0m10.29s
sys     0m58.27s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    15m50.62s
user    0m10.54s
sys     1m11.86s

Ross
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Daniel Rock

Hi,


Solaris 10U7, patched to the latest released patches two weeks ago.

Four ST31000340NS drives attached to two SI3132 SATA controllers, RAIDZ1.

Self-built system with 2GB RAM and an
  x86 (chipid 0x0 AuthenticAMD family 15 model 35 step 2 clock 2210 MHz)
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
processor.


On the first run throughput was ~110MB/s, on the second run only 80MB/s.

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 Blöcke

real    3m37.17s
user    0m11.15s
sys     0m47.74s

Doing second 'cpio -o > /dev/null'
48000247 Blöcke

real    4m55.69s
user    0m10.69s
sys     0m47.57s




Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can't offline a RAID-Z2 device: "no valid replica"

2009-07-13 Thread Ross
Yup, just hit exactly the same myself.  I have a feeling this faulted disk is 
affecting performance, so I tried to remove or offline it:

$ zpool iostat -v 30

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rc-pool     1.27T  1015G    682     71  84.0M  1.88M
  mirror     199G   265G      0      5      0  21.1K
    c4t1d0      -      -      0      2      0  21.1K
    c4t2d0      -      -      0      0      0      0
    c5t1d0      -      -      0      2      0  21.1K
  mirror     277G   187G    170      7  21.1M   322K
    c4t3d0      -      -     58      4  7.31M   322K
    c5t2d0      -      -     54      4  6.83M   322K
    c5t0d0      -      -     56      4  6.99M   322K
  mirror     276G   188G    171      6  21.1M   336K
    c5t3d0      -      -     56      4  7.03M   336K
    c4t5d0      -      -     56      3  7.03M   336K
    c4t4d0      -      -     56      3  7.04M   336K
  mirror     276G   188G    169      6  20.9M   353K
    c5t4d0      -      -     57      3  7.17M   353K
    c5t5d0      -      -     54      4  6.79M   353K
    c4t6d0      -      -     55      3  6.99M   353K
  mirror     277G   187G    171     10  20.9M   271K
    c4t7d0      -      -     56      4  7.11M   271K
    c5t6d0      -      -     55      5  6.93M   271K
    c5t7d0      -      -     55      5  6.88M   271K
  c6d1p0       32K   504M      0     34      0   620K
----------  -----  -----  -----  -----  -----  -----

20MB in 30 seconds across 3 disks is only about 220KB/s per disk.  Not healthy at all.
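
(The arithmetic behind that estimate, using the numbers above:)

  # 20 MB read in a 30-second iostat interval, shared across the 3 working disks
  echo "$(( 20 * 1024 / 30 / 3 ))KB/s per disk"     # prints 227KB/s per disk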

$ zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: scrub completed after 2h55m with 0 errors on Tue Jun 23 11:11:42 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rc-pool     DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  FAULTED   1.71M 23.3M    0  too many errors
            c5t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
        logs        DEGRADED     0     0     0
          c6d1p0    ONLINE       0     0     0

errors: No known data errors


# zpool offline rc-pool c4t2d0
cannot offline c4t2d0: no valid replicas

# zpool remove rc-pool c4t2d0
cannot remove c4t2d0: only inactive hot spares or cache devices can be removed
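
For reference, the route that usually does work for a faulted disk in a mirror
is replace-and-clear rather than offline/remove (general zpool usage, not
specific advice for this pool; c4t8d0 below is just a made-up spare):

  # swap the physical drive, then resilver onto the same target
  zpool replace rc-pool c4t2d0
  # or resilver onto a different disk entirely
  zpool replace rc-pool c4t2d0 c4t8d0
  # once the resilver finishes and the errors stop, reset the error counters
  zpool clear rc-pool c4t2d0
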
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Jorgen Lundman


x4540 running snv_117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /zpool1/zfscachetest ...

Done!
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    4m7.13s
user    0m9.27s
sys     0m49.09s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    4m52.52s
user    0m9.13s
sys     0m47.51s








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Ross
Just look at this.  I thought all the restarting resilver bugs were fixed, but 
it looks like something odd is still happening at the start:

Status immediately after starting resilver:

# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

        NAME            STATE     READ WRITE CKSUM
        rc-pool         DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c4t1d0      ONLINE       0     0     0  5.56M resilvered
            replacing   DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M    0  too many errors
              c4t2d0    ONLINE       0     0     0  5.43M resilvered
            c5t1d0      ONLINE       0     0     0  5.55M resilvered

And a few minutes later:

# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

        NAME            STATE     READ WRITE CKSUM
        rc-pool         DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c4t1d0      ONLINE       0     0     0  1.10M resilvered
            replacing   DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M    0  too many errors
              c4t2d0    ONLINE       0     0     0  824K resilvered
            c5t1d0      ONLINE       0     0     0  1.10M resilvered


It's gone from 5MB resilvered to 1MB, and increased the estimated time to 245 
hours.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Harry Putnam
Richard Elling  writes:

> You can only send/receive snapshots.  However, on the receiving end,
> there will also be a dataset of the name you choose.  Since you didn't
> share what commands you used, it is pretty impossible for us to
> speculate what you might have tried.

I thought I made it clear I had not used any commands but gave two
detailed examples of different ways to attempt the move.

I see now the main thing that confused me is that sending a
z1/projects@something to a new z2/projects@something would also result in
z2/projects being created.

That part was not at all clear to me from the man page.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Harry Putnam
Ross Walker  writes:

> Once the data is copied you can delete the snapshots that will then
> exist on both pools.

That's the part right there that wasn't apparent.

That

  zfs send z1/something@snap | zfs receive z2/something@snap

would also create z2/something.

> If you have mount options set use the -u option on the recv to have it
> defer attempting mounting the conflicting datasets.

That little item right there... has probably saved me some serious
headaches since in my scheme... the resulting newpool/fs would have
the same mount point. ... thanks
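
For anyone else following along, the whole sequence looks roughly like this
(a sketch using the dataset names from this thread; the snapshot name and
mountpoint are placeholders):

  # snapshot the source (send operates on snapshots)
  zfs snapshot z1/projects@move

  # send it across; -u leaves the new filesystem unmounted so a conflicting
  # mountpoint doesn't get mounted over the original
  zfs send z1/projects@move | zfs receive -u z2/projects

  # fix up properties before mounting it
  zfs set mountpoint=/projects-new z2/projects
  zfs mount z2/projects

  # when finished, the snapshots on both sides can be destroyed
  zfs destroy z1/projects@move
  zfs destroy z2/projects@move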

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Alexander Skwar
Here's a more useful result, with the number of files set to 6000 so that
the data set is larger than the amount of RAM.

--($ ~)-- time sudo ksh zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (6000 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
96000493 Blöcke

real    8m44.82s
user    0m46.85s
sys     2m15.01s

Doing second 'cpio -o > /dev/null'
96000493 Blöcke

real    29m15.81s
user    0m45.31s
sys     3m2.36s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

real    48m40.890s
user    1m47.192s
sys     8m2.165s

Still on S10 U7 Sparc M4000.

So I'm now in line with the other results: the 2nd run is WAY slower, 4x
as slow.

Alexander
-- 
[[ http://zensursula.net ]]
[ Soc. => http://twitter.com/alexs77 | http://www.plurk.com/alexs77 ]
[ Mehr => http://zyb.com/alexws77 ]
[ Chat => Jabber: alexw...@jabber80.com | Google Talk: a.sk...@gmail.com ]
[ Mehr => AIM: alexws77 ]
[ $[ $RANDOM % 6 ] = 0 ] && rm -rf / || echo 'CLICK!'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Dennis Clarke

> Richard Elling  writes:
>
>> You can only send/receive snapshots.  However, on the receiving end,
>> there will also be a dataset of the name you choose.  Since you didn't
>> share what commands you used, it is pretty impossible for us to
>> speculate what you might have tried.
>
> I thought I made it clear I had not used any commands but gave two
> detailed examples of different ways to attempt the move.
>
> I see now the main thing that confused me is that sending a
> z1/proje...@something
> to a new z2/proje...@something would also result in z2/projects being
> created.
>
> That part was not at all clear to me from the man page.

This will probably get me bombed with napalm but I often just
use star from Jörg Schilling because its dead easy :

  star -copy -p -acl -sparse -dump -C old_dir . new_dir

and you're done.[1]

So long as you have both the new and the old zfs/ufs/whatever[2]
filesystems mounted. It doesn't matter if they are static or not. If
anything changes on the filesystem then star will tell you about it.

-- 
Dennis

[1] -p means preserve meta-properties of the files/dirs etc.
-acl means what it says. Grabs ACL data also.
-sparse means what it says. Handles files with holes in them.
-dump means be super careful about everything ( read the manpage )

[2] star doesn't care if its zfs or ufs or a CDROM or a floppy.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs the entire server (please help; data included)

2009-07-13 Thread Jim Leonard
Thanks to the help of a ZFS/kernel developer at Sun who volunteered to help me, 
it turns out this was a bug in Solaris that needs to be fixed.  Bug report here 
for the curious:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6859446
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Galen

Ross,

I feel you here, but I don't have much of a solution.

The best I can suggest (and has been my solution) is to take out the  
problematic disk, copy it to a fresh disk (preferably using something  
like dd_rescue) and then re-install.


It seems the resilvering loop is generally a result of a faulty  
device, but even if it is taken offline, you still have issues. I have  
had so many zpool resilvering loops, it's not funny. I'm running  
2009.06 with all updates applied. I've had a very, very bad batch of  
disks.


I actually have a resilvering loop running right now, and I need to go  
copy off the offending device. Again.


I wish I had a better solution, because the zpool functions fine, no  
data errors, but resilvering loops forever. I love ZFS as an on-disk  
format. I increasingly hate the implementation of ZFS software.


-Galen

On Jul 13, 2009, at 5:34 AM, Ross wrote:

Just look at this.  I thought all the restarting resilver bugs were  
fixed, but it looks like something odd is still happening at the  
start:


Status immediately after starting resilver:

# zpool status
 pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.   
An
   attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear the  
errors

   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

   NAME  STATE READ WRITE CKSUM
   rc-pool   DEGRADED 0 0 0
 mirror  DEGRADED 0 0 0
   c4t1d0ONLINE   0 0 0  5.56M resilvered
   replacing DEGRADED 0 0 0
 c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
 c4t2d0  ONLINE   0 0 0  5.43M resilvered
   c5t1d0ONLINE   0 0 0  5.55M resilvered


And a few minutes later:

# zpool status
 pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.   
An
   attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear the  
errors

   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

   NAME  STATE READ WRITE CKSUM
   rc-pool   DEGRADED 0 0 0
 mirror  DEGRADED 0 0 0
   c4t1d0ONLINE   0 0 0  1.10M resilvered
   replacing DEGRADED 0 0 0
 c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
 c4t2d0  ONLINE   0 0 0  824K resilvered
   c5t1d0ONLINE   0 0 0  1.10M resilvered


It's gone from 5MB resilvered to 1MB, and increased the estimated  
time to 245 hours.

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Ross Walker


Maybe it's the disks' firmware that is bad, or maybe they're jumpered  
for 1.5Gbps on a 3.0Gbps-only bus? Or maybe it's a problem with the disk  
cable/bay/enclosure/slot?


It sounds like there is more than ZFS in the mix here. I wonder if the  
drive's status keeps flapping online/offline and either ZFS or FMA is  
too lax in marking a drive offline after recurring timeouts.


Take a look at your disk enclosure and at iostat -En for the number of  
timeouts happening.
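
Something along these lines (stock Solaris commands, nothing box-specific):

  # one line per device with the soft/hard/transport error counters
  iostat -En | grep "Errors:"

  # FMA's error telemetry; disk timeouts and retries show up as ereports
  fmdump -eV | grep class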


-Ross


On Jul 13, 2009, at 9:05 AM, Galen  wrote:


Ross,

I feel you here, but I don't have much of a solution.

The best I can suggest (and has been my solution) is to take out the  
problematic disk, copy it to a fresh disk (preferably using  
something like dd_rescue) and then re-install.


It seems the resilvering loop is generally a result of a faulty  
device, but even if it is taken offline, you still have issues. I  
have had so many zpool resilvering loops, it's not funny. I'm  
running 2009.06 with all updates applied. I've had a very, very bad  
batch of disks.


I actually have a resilvering loop running right now, and I need to  
go copy off the offending device. Again.


I wish I had a better solution, because the zpool functions fine, no  
data errors, but resilvering loops forever. I love ZFS as an on-disk  
format. I increasingly hate the implementation of ZFS software.


-Galen

On Jul 13, 2009, at 5:34 AM, Ross wrote:

Just look at this.  I thought all the restarting resilver bugs were  
fixed, but it looks like something odd is still happening at the  
start:


Status immediately after starting resilver:

# zpool status
pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable  
error.  An
  attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear the  
errors

  using 'zpool clear' or replace the device with 'zpool replace'.
 see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

  NAME  STATE READ WRITE CKSUM
  rc-pool   DEGRADED 0 0 0
mirror  DEGRADED 0 0 0
  c4t1d0ONLINE   0 0 0  5.56M resilvered
  replacing DEGRADED 0 0 0
c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
c4t2d0  ONLINE   0 0 0  5.43M resilvered
  c5t1d0ONLINE   0 0 0  5.55M resilvered


And a few minutes later:

# zpool status
pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable  
error.  An
  attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear the  
errors

  using 'zpool clear' or replace the device with 'zpool replace'.
 see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

  NAME  STATE READ WRITE CKSUM
  rc-pool   DEGRADED 0 0 0
mirror  DEGRADED 0 0 0
  c4t1d0ONLINE   0 0 0  1.10M resilvered
  replacing DEGRADED 0 0 0
c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
c4t2d0  ONLINE   0 0 0  824K resilvered
  c5t1d0ONLINE   0 0 0  1.10M resilvered


It's gone from 5MB resilvered to 1MB, and increased the estimated  
time to 245 hours.

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Alexander Skwar wrote:


This is a M4000 mit 32 GB RAM and two HDs in a mirror.


I think that you should edit the script to increase the file count 
since your RAM size is big enough to cache most of the data.
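
A rough way to size it (a back-of-the-envelope sketch, not part of the
script; the 8192000-byte file size is the one the script reports):

  # ksh: suggest a file count that is roughly twice the installed RAM
  ram_mb=$(prtconf | awk '/Memory size/ {print $3}')
  print "suggested file count: $(( ram_mb * 1024 * 1024 * 2 / 8192000 ))"
  # for the 32 GB M4000 above this works out to roughly 8400 files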


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Harry Putnam
Dennis Clarke  writes:

> This will probably get me bombed with napalm but I often just
> use star from Jörg Schilling because its dead easy :
>
>   star -copy -p -acl -sparse -dump -C old_dir . new_dir
>
> and you're done.[1]
>
> So long as you have both the new and the old zfs/ufs/whatever[2]
> filesystems mounted. It doesn't matter if they are static or not. If
> anything changes on the filesystem then star will tell you about it.

I'm not sure I see how that is easier.

The command itself may be but it requires other moves not shown in
your command.

1) zfs create z2/projects

2)  star -copy -p -acl -sparse -dump -C old_dir . new_dir

As a bare minimum would be required.

whereas

  zfs send z1/projects@snap | zfs receive z2/projects@snap

is all that is necessary using zfs send/receive, and the new
filesystem z2/projects is created and populated with data from
z1/projects, not to mention a snapshot at z2/projects/.zfs/snapshot.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Dennis Clarke

> Dennis Clarke  writes:
>
>> This will probably get me bombed with napalm but I often just
>> use star from Jörg Schilling because its dead easy :
>>
>>   star -copy -p -acl -sparse -dump -C old_dir . new_dir
>>
>> and you're done.[1]
>>
>> So long as you have both the new and the old zfs/ufs/whatever[2]
>> filesystems mounted. It doesn't matter if they are static or not. If
>> anything changes on the filesystem then star will tell you about it.
>
> I'm not sure I see how that is easier.
>
> The command itself may be but it requires other moves not shown in
> your command.
>
> 1) zfs create z2/projects
>
> 2)  star -copy -p -acl -sparse -dump -C old_dir . new_dir
>
> As a bare minimum would be required.
>
> whereas
> zfs send z1/proje...@snap |zfs receive z2/proje...@snap
>
> Is all that is necessary using zfs send receive, and the new
> filesystem z2/projects is created and populated with data from
> z1/projects, not to mention a snapshot at z2/projects/.zfs/snapshot

sort of depends on what you want to get done and both work.

dc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Alexander Skwar wrote:


Still on S10 U7 Sparc M4000.

So I'm now inline with the other results - the 2nd run is WAY slower. 4x
as slow.


It would be good to see results from a few OpenSolaris users running a 
recent 64-bit kernel, and with fast storage to see if this is an 
OpenSolaris issue as well.


It seems likely to be more evident with fast SAS disks or SAN devices 
rather than a few SATA disks since the SATA disks have more access 
latency.  Pools composed of mirrors should offer less read latency as 
well.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] questions regarding RFE 6334757 and CR 6322205 disk write cache. thanks (case 11356581)

2009-07-13 Thread Chunhuan . Shen

Hello experts,

I would like to ask you some questions regarding RFE 6334757
and CR 6322205 (disk write cache).

==
RFE 6334757
disk write cache should be enabled and should have a tool to
switch it on and off

CR 6322205
Enable disk write cache if ZFS owns the disk
==

The customer found that on a SPARC Enterprise T5140 the disk write cache
was disabled when it was shipped from the Sun factory, but after
installing ZFS the disk write cache turned out to be enabled.

My questions are:

1) When a SPARC Enterprise T5140 is shipped from the factory,
   is the disk write cache set to disabled?

   From RFE 6334757 it appears to be disabled as shipped from the
   factory, but I am not sure.

2) On Solaris 10, after installing ZFS and zones, will the disk
   write cache be set to enabled?  After which action does the
   value of the disk write cache change?

   From CR 6322205 we can see that as long as ZFS owns the disk,
   the disk write cache will be enabled, but what operation counts
   as "ZFS owns the disk"?

   CR 6322205
   Enable disk write cache if ZFS owns the disk

3) If the disk write cache is changed from enabled to disabled,
   is there any impact on or problem for the system?

4) For FRU parts, the write cache has been set to disabled, right?
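
One way to check what the current setting actually is on a given drive is
format's expert mode; roughly (this is the generic procedure for SCSI/SAS
disks, not anything T5140-specific):

  # format -e        (expert mode; then pick the disk from the menu)
  format> cache
  cache> write_cache
  write_cache> display      # reports whether the write cache is currently enabled
  write_cache> disable      # or 'enable'; be careful on disks that ZFS is using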

Thank you very much.
Best Regards
chunhuan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Ross
No, I don't think I need to take a disk out.  It's running ok now, it just 
seemed to get a bit confused at the start:

$ zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 2h11m, 31.09% done, 4h51m to go
config:

        NAME            STATE     READ WRITE CKSUM
        rc-pool         DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c4t1d0      ONLINE       0     0     0  101M resilvered
            replacing   DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M    0  too many errors
              c4t2d0    ONLINE       0     0     0  62.3G resilvered
            c5t1d0      ONLINE       0     0     0  101M resilvered
          mirror        ONLINE       0     0     0
            c4t3d0      ONLINE       0     0     0
            c5t2d0      ONLINE       0     0     0
            c5t0d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c5t3d0      ONLINE       0     0     0
            c4t5d0      ONLINE       0     0     0
            c4t4d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c5t4d0      ONLINE       0     0     0
            c5t5d0      ONLINE       0     0     0
            c4t6d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c4t7d0      ONLINE       0 13.0K     0
            c5t6d0      ONLINE       0     0     0
            c5t7d0      ONLINE       0     0     0
        logs            DEGRADED     0     0     0
          c6d1p0        ONLINE       0     0     0

errors: No known data errors
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Joerg Schilling
Harry Putnam  wrote:

> Dennis Clarke  writes:
>
> > This will probably get me bombed with napalm but I often just
> > use star from Jörg Schilling because its dead easy :
> >
> >   star -copy -p -acl -sparse -dump -C old_dir . new_dir
> >
> > and you're done.[1]
> >
> > So long as you have both the new and the old zfs/ufs/whatever[2]
> > filesystems mounted. It doesn't matter if they are static or not. If
> > anything changes on the filesystem then star will tell you about it.
>
> I'm not sure I see how that is easier.
>
> The command itself may be but it requires other moves not shown in
> your command.

Could you please explain your claims?

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Darren J Moffat

Joerg Schilling wrote:

Harry Putnam  wrote:


Dennis Clarke  writes:


This will probably get me bombed with napalm but I often just
use star from Jörg Schilling because its dead easy :

  star -copy -p -acl -sparse -dump -C old_dir . new_dir

and you're done.[1]

So long as you have both the new and the old zfs/ufs/whatever[2]
filesystems mounted. It doesn't matter if they are static or not. If
anything changes on the filesystem then star will tell you about it.

I'm not sure I see how that is easier.

The command itself may be but it requires other moves not shown in
your command.


Could you please explain your claims?


star doesn't (and shouldn't) create the destination ZFS filesystem the way 
the zfs recv would.  It also doesn't preserve dataset-level properties the 
way zfs recv would.


On the other hand, using star (or rsync, which is what I tend to do) 
gives more flexibility in that the source and destination filesystem 
types can be different, or even not a filesystem!


zfs send|recv and [g,s]tar exist for different purposes, but there are 
some overlapping use cases where either could do the job.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Ross
Gaaah, looks like I spoke too soon:

$ zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 2h59m, 77.89% done, 0h50m to go
config:

        NAME            STATE     READ WRITE CKSUM
        rc-pool         DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c4t1d0      ONLINE       0     0     0  218M resilvered
            replacing   UNAVAIL      0  963K     0  insufficient replicas
              c4t2d0s0/o  FAULTED  1.71M 23.4M    0  too many errors
              c4t2d0    REMOVED      0  964K     0  67.0G resilvered
            c5t1d0      ONLINE       0     0     0  218M resilvered
          mirror        ONLINE       0     0     0
            c4t3d0      ONLINE       0     0     0
            c5t2d0      ONLINE       0     0     0
            c5t0d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c5t3d0      ONLINE       0     0     0
            c4t5d0      ONLINE       0     0     0
            c4t4d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c5t4d0      ONLINE       0     0     0
            c5t5d0      ONLINE       0     0     0
            c4t6d0      ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c4t7d0      ONLINE       0 13.0K     0
            c5t6d0      ONLINE       0     0     0
            c5t7d0      ONLINE       0     0     0
        logs            DEGRADED     0     0     0
          c6d1p0        ONLINE       0     0     0

errors: No known data errors


There are a whole bunch of errors in /var/adm/messages:

Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.warning] WARNING: 
/p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:56:53 rob-036 Error for Command: write(10)   
Error Level: Retryable
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Requested Block: 
83778048  Error Block: 83778048
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Vendor: ATA 
   Serial Number: 
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Sense Key: 
Aborted_Command
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional 
sense info), ASCQ: 0x0, FRU: 0x0


Jul 13 15:57:31 rob-036 scsi: [ID 107833 kern.warning] WARNING: 
/p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:57:31 rob-036 Command failed to complete...Device is gone


Not what I would expect from a brand new drive!!

Does anybody have any tips on how I can work out where the fault lies here?  I 
wouldn't expect the controller to be at fault with so many other drives working, 
and what on earth is the proper technique for replacing a drive that failed part 
way through a resilver?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Galen

Ross,

The disks do have problems - that's why I'm resilvering.

I've seen zero read, write or checksum errors and had it loop. Now I  
do have a number of read errors on some of the disks, but I think  
resilvering is missing the point if it can't deal with corrupt data or  
disks with a small amount of unreadable data.


-Galen

On Jul 13, 2009, at 6:50 AM, Ross Walker wrote:



Maybe it's the disks firmware that is bad or maybe they're jumpered  
for 1.5Gbps on a 3.0 only bus? Or maybe it's a problem with the disk  
cable/bay/enclosure/slot?


It sounds like there is more then ZFS in the mix here. I wonder if  
the drive's status keeps flapping online/offline and either ZFS or  
FMA are too lax in marking a drive offline after recurring timeouts.


Take a look at your disk enclosure and iostat -En for the number of  
timeouts happenning.


-Ross


On Jul 13, 2009, at 9:05 AM, Galen  wrote:


Ross,

I feel you here, but I don't have much of a solution.

The best I can suggest (and has been my solution) is to take out  
the problematic disk, copy it to a fresh disk (preferably using  
something like dd_rescue) and then re-install.


It seems the resilvering loop is generally a result of a faulty  
device, but even if it is taken offline, you still have issues. I  
have had so many zpool resilvering loops, it's not funny. I'm  
running 2009.06 with all updates applied. I've had a very, very bad  
batch of disks.


I actually have a resilvering loop running right now, and I need to  
go copy off the offending device. Again.


I wish I had a better solution, because the zpool functions fine,  
no data errors, but resilvering loops forever. I love ZFS as an on- 
disk format. I increasingly hate the implementation of ZFS software.


-Galen

On Jul 13, 2009, at 5:34 AM, Ross wrote:

Just look at this.  I thought all the restarting resilver bugs  
were fixed, but it looks like something odd is still happening at  
the start:


Status immediately after starting resilver:

# zpool status
pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable  
error.  An
 attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear  
the errors

 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

 NAME  STATE READ WRITE CKSUM
 rc-pool   DEGRADED 0 0 0
   mirror  DEGRADED 0 0 0
 c4t1d0ONLINE   0 0 0  5.56M resilvered
 replacing DEGRADED 0 0 0
   c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
   c4t2d0  ONLINE   0 0 0  5.43M resilvered
 c5t1d0ONLINE   0 0 0  5.55M resilvered


And a few minutes later:

# zpool status
pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable  
error.  An
 attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear  
the errors

 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

 NAME  STATE READ WRITE CKSUM
 rc-pool   DEGRADED 0 0 0
   mirror  DEGRADED 0 0 0
 c4t1d0ONLINE   0 0 0  1.10M resilvered
 replacing DEGRADED 0 0 0
   c4t2d0s0/o  FAULTED  1.71M 23.3M 0  too many errors
   c4t2d0  ONLINE   0 0 0  824K resilvered
 c5t1d0ONLINE   0 0 0  1.10M resilvered


It's gone from 5MB resilvered to 1MB, and increased the estimated  
time to 245 hours.

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris 2008.11 - resilver still restarting

2009-07-13 Thread Ross Walker

On Jul 13, 2009, at 11:33 AM, Ross  wrote:


Gaaah, looks like I spoke too soon:

$ zpool status
 pool: rc-pool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.   
An
   attempt was made to correct the error.  Applications are  
unaffected.
action: Determine if the device needs to be replaced, and clear the  
errors

   using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress for 2h59m, 77.89% done, 0h50m to go
config:

   NAME  STATE READ WRITE CKSUM
   rc-pool   DEGRADED 0 0 0
 mirror  DEGRADED 0 0 0
   c4t1d0ONLINE   0 0 0  218M resilvered
   replacing UNAVAIL  0  963K 0  insufficient  
replicas

 c4t2d0s0/o  FAULTED  1.71M 23.4M 0  too many errors
 c4t2d0  REMOVED  0  964K 0  67.0G resilvered
   c5t1d0ONLINE   0 0 0  218M resilvered
 mirror  ONLINE   0 0 0
   c4t3d0ONLINE   0 0 0
   c5t2d0ONLINE   0 0 0
   c5t0d0ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c5t3d0ONLINE   0 0 0
   c4t5d0ONLINE   0 0 0
   c4t4d0ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c5t4d0ONLINE   0 0 0
   c5t5d0ONLINE   0 0 0
   c4t6d0ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c4t7d0ONLINE   0 13.0K 0
   c5t6d0ONLINE   0 0 0
   c5t7d0ONLINE   0 0 0
   logs  DEGRADED 0 0 0
 c6d1p0  ONLINE   0 0 0

errors: No known data errors


There are a whole bunch of errors in /var/adm/messages:

Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.warning] WARNING: / 
p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:56:53 rob-036 Error for Command: write 
(10)   Error Level: Retryable
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Requested  
Block: 83778048  Error Block: 83778048
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Vendor:  
ATASerial Number:
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Sense Key:  
Aborted_Command
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no  
additional sense info), ASCQ: 0x0, FRU: 0x0



Jul 13 15:57:31 rob-036 scsi: [ID 107833 kern.warning] WARNING: / 
p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:57:31 rob-036 Command failed to complete...Device  
is gone



Not what I would expect from a brand new drive!!

Does anybody have any tips on how i can work out where the fault  
lies here?  I wouldn't expect controller with so many other drives  
working, and what on earth is the proper technique for replacing a  
drive that failed part way through a resilver?


I really believe there is a problem with either the cabling or the  
enclosure's backplane here.


Two disks is statistical coincidence; three disks means it ain't the  
disks that are bad (if you checked and there was no recall and the  
firmware is correct and up to date).


Fix the real problem and the disks already in place should resilver  
without further interruption.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Joerg Schilling
Darren J Moffat  wrote:

> >>> use star from Jörg Schilling because its dead easy :
> >>>
> >>>   star -copy -p -acl -sparse -dump -C old_dir . new_dir
...

> star doesn't (and shouldn't) create the destination ZFS filesystem like 
> the zfs recv would.  It also doesn't preserve the dataset level would do.

As star is software that cleanly lives above the filesystem layer, this is
what people would expect ;-)

> One the other hand using star (or rsync which is what I tend to do) 
> gives more flexibility in that the source and destination filesystem 
> types can be different or even not a filesystem!

star is highly optimized and its built-in find(1) (using libfind)
gives you many interesting features.

zfs send seems to be tied to the zfs version and this is another reason
why zfs send | receive may not even work on a 100% zfs based playground.
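
(For reference, the versions in play on each side can be listed with the
stock commands, roughly:)

  zfs upgrade          # lists the ZFS filesystem version this system runs and any older filesystems
  zpool upgrade        # same for pool versions
  zpool upgrade -v     # shows every version this release supports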

> zfs send|recv and [g,s]tar exist for different purposes, but there are 
> some overlapping use cases either either could do the job.

It would be nice if there was a discussion that mentions features instead
of always proposing zfs send.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Interesting, I repeated the test on a few other machines running newer builds.  
First impressions are good:

snv_114, virtual machine, 1GB RAM, 30GB disk - 16% slowdown.
(Only 9GB free so I ran an 8GB test)

Doing initial (unmount/mount) 'cpio -o > /dev/null'
1683 blocks

real    3m4.85s
user    0m16.74s
sys     0m41.69s

Doing second 'cpio -o > /dev/null'
1683 blocks

real    3m34.58s
user    0m18.85s
sys     0m45.40s


And again on snv_117, Sun x2200, 40GB RAM, single 500GB sata disk:

First run (with the default 24GB set):

real    6m25.15s
user    0m11.93s
sys     0m54.93s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    1m9.97s
user    0m12.17s
sys     0m57.80s

... d'oh!  At least I know the ARC is working :-)


The second run, with a 98GB test is running now, I'll post the results in the 
morning.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Darren J Moffat

Joerg Schilling wrote:

Darren J Moffat  wrote:


use star from Jörg Schilling because its dead easy :

  star -copy -p -acl -sparse -dump -C old_dir . new_dir

...

star doesn't (and shouldn't) create the destination ZFS filesystem like 
the zfs recv would.  It also doesn't preserve the dataset level would do.


As star is software that cleany lives above the filesystem layer, this is
what people would expect ;-)


Indeed, but that is why the "extra steps" are needed to create the 
destination ZFS filesystem.  Not a bad thing or a criticism of star, just 
a fact (and in answer to the question you asked).


One the other hand using star (or rsync which is what I tend to do) 
gives more flexibility in that the source and destination filesystem 
types can be different or even not a filesystem!


star is highly optimized and it's build in find(1) (using libfind)
gives you many interesting features.


I'm sure the authors of rsync could make a similar statement :-)


zfs send seems to be tied to the zfs version and this is another reason
why zfs send | receive may not even work on a 100% zfs based playground.


Indeed, but it isn't an archiver like tar (and variants thereof); it is a 
means of providing replication of ZFS datasets based on ZFS snapshots, 
and it works at the ZFS DMU layer.


zfs send|recv and [g,s]tar exist for different purposes, but there are 
some overlapping use cases either either could do the job.


It would be nice if there was a discussion that does mention features instead
of always proposing zfs send


In general I completely agree, however this particular thread (given its 
title) is about zfs send|recv :-)


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Brad Diggs
You might want to have a look at my blog on filesystem cache tuning... It
will probably help you to avoid memory contention between the ARC and your
apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html
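
(The most common knob in that area is simply capping the ARC; a minimal
sketch for Solaris 10, independent of the blog post, with the 4 GB value
purely as an example:)

  * /etc/system entry: cap the ARC at 4 GB (value in bytes, reboot required)
  set zfs:zfs_arc_max = 0x100000000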

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 4, 2009, at 2:48 AM, Phil Harman wrote:

ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
instead of the Solaris page cache. But mmap() uses the latter. So if  
anyone maps a file, ZFS has to keep the two caches in sync.


cp(1) uses mmap(2). When you use cp(1) it brings pages of the files  
it copies into the Solaris page cache. As long as they remain there  
ZFS will be slow for those files, even if you subsequently use  
read(2) to access them.


If you reboot, your cpio(1) tests will probably go fast again, until  
someone uses mmap(2) on the files again. I think tar(1) uses  
read(2), but from my iPod I can't be sure. It would be interesting  
to see how tar(1) performs if you run that test before cp(1) on a  
freshly rebooted system.


I have done some work with the ZFS team towards a fix, but it is  
only currently in OpenSolaris.


The other thing that slows you down is that ZFS only flushes to disk  
every 5 seconds if there are no synchronous writes. It would be  
interesting to see iostat -xnz 1 while you are running your tests.  
You may find the disks are writing very efficiently for one second  
in every five.


Hope this helps,
Phil

blogs.sun.com/pgdh


Sent from my iPod

On 4 Jul 2009, at 05:26, Bob Friesenhahn  
 wrote:



On Fri, 3 Jul 2009, Bob Friesenhahn wrote:


Copy Method                            Data Rate
================================================
cpio -pdum                             75 MB/s
cp -r                                  32 MB/s
tar -cf - . | (cd dest && tar -xf -)   26 MB/s


It seems that the above should be ammended.  Running the cpio based  
copy again results in zpool iostat only reporting a read bandwidth  
of 33 MB/second.  The system seems to get slower and slower as it  
runs.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rpool boot failed

2009-07-13 Thread Lori Alt

On 07/11/09 05:15, iman habibi wrote:

Dear Admins
I had a Solaris 10u8 installation based on a ZFS (rpool) filesystem on two 
mirrored SCSI disks in a SunFire V880, but after some months, when I rebooted 
the server with the reboot command, it didn't boot from the disks and returned 
"can't boot from boot media".

How can I recover some data from my previous installation?
I also ran
>boot disk0 (failed)
>boot disk1 (failed)
I also ran >probe-scsi-all and then booted from each disk; it returned failed. Why?
Thanks for any guidance.
Regards
  
Someone with more knowledge of the boot proms might have to help you 
with the boot failures, but if you're looking for a way to recover data 
from the root pools, you could try booting from your installation medium 
(whether that's a local CD/DVD or a network installation image), 
escaping out of the install, and try importing the pool.
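
(A rough sketch of that procedure, assuming a SPARC box with local install 
media; details will vary:)

  ok boot cdrom -s                  # single-user shell from the install DVD (or: boot net -s)
  # zpool import                    # check that the pool is visible at all
  # zpool import -f -R /a rpool     # import it under an alternate root
  # ls /a                           # the old filesystems are now reachable for copying data off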


Lori



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first use send/receive... somewhat confused.

2009-07-13 Thread Harry Putnam
joerg.schill...@fokus.fraunhofer.de (Joerg Schilling) writes:

> Harry Putnam  wrote:
>
>> Dennis Clarke  writes:
>>
>> > This will probably get me bombed with napalm but I often just
>> > use star from Jörg Schilling because its dead easy :
>> >
>> >   star -copy -p -acl -sparse -dump -C old_dir . new_dir
>> >
>> > and you're done.[1]
>> >
>> > So long as you have both the new and the old zfs/ufs/whatever[2]
>> > filesystems mounted. It doesn't matter if they are static or not. If
>> > anything changes on the filesystem then star will tell you about it.
>>
>> I'm not sure I see how that is easier.
>>
>> The command itself may be but it requires other moves not shown in
>> your command.
>
> Could you please explain your claims?

Well, it may be a case of a newbie shooting off his mouth on the basis of
limited knowledge, but the setup you showed with star does not create a zfs
filesystem.  I guess that would have to be done externally.

Whereas send/receive does that part for you.

My first thought was rsync... as a long time linux user... thats where
I would usually turn... until other posters pointed out how
send/receive works.  i.e.  It creates a new zfs filesystem for you,
which was exactly what I was after.

So by using send/receive with -u I was able in one move to
1) create a zfs filesystem
2) mount it automatically
3) transfer the data to the new fs

`star' only does the last one right?

I then had a few external chores like setting options or changing
mountpoint.   Something that would have had to be done using `star'
too.  (That is, before using star)

That alone was the basis of what you call my `claims'.

Would `star' move or create the .zfs directory?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Brad Diggs wrote:

You might want to have a look at my blog on filesystem cache tuning...  It 
will probably help

you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html


Your post makes it sound like there is not a bug in the operating 
system.  It does not take long to see that there is a bug in the 
Solaris 10 operating system.  It is not clear if the same bug is 
shared by current OpenSolaris since it seems like it has not been 
tested.


Solaris 10 U7 reads files that it has not seen before at a constant 
rate regardless of the amount of file data it has already read.  When 
the file is read a second time, the read is 4X or more slower.  If 
reads were slowing down because the ARC was slow to expunge stale 
data, then that would be apparent on the first read pass.  However, 
the reads are not slowing down in the first read pass.  ZFS goes into 
the weeds if it has seen a file before but none of the file data is 
resident in the ARC.
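
(One way to watch this from the outside is the ARC kstats; a rough sketch, 
not part of the test script:)

  # snapshot the ARC hit/miss/size counters, re-run one cpio pass, then compare
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses zfs:0:arcstats:size > /tmp/arc.before
  ( cd /rpool/zfscachetest && find . -type f | cpio -o > /dev/null )
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses zfs:0:arcstats:size > /tmp/arc.after
  diff /tmp/arc.before /tmp/arc.after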


It is pathetic that a Sun RAID array that I paid $21K for out of my 
own life savings is not able to perform better than the cheapo 
portable USB drives that I use for backup because of ZFS.  This is 
making me madder and madder by the minute.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread sean walmsley
Sun X4500 (thumper) with 16 GB of memory running Solaris 10 U6 with patches 
current to the end of Feb 2009.

Current ARC size is ~6 GB.

ZFS filesystem created in a ~3.2 TB pool consisting of 7 sets of mirrored 500 GB 
SATA drives.

I used 4000 8 MB files for a total of 32 GB.

run 1: ~140 MB/s average according to zpool iostat
real4m1.11s
user0m10.44s
sys 0m50.76s

run 2: ~37 MB/s average according to zpool iostat
real13m53.43s
user0m10.62s
sys 0m55.80s

A zfs unmount followed by a mount of the filesystem returned the performance to 
the run 1 case.

real3m58.16s
user0m11.54s
sys 0m51.95s

In summary, the second run performance drops to about 30% of the original run.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 9:34 AM, Bob
Friesenhahn wrote:
> On Mon, 13 Jul 2009, Alexander Skwar wrote:
>>
>> Still on S10 U7 Sparc M4000.
>>
>> So I'm now inline with the other results - the 2nd run is WAY slower. 4x
>> as slow.
>
> It would be good to see results from a few OpenSolaris users running a
> recent 64-bit kernel, and with fast storage to see if this is an OpenSolaris
> issue as well.

Indeed it is.  Using ldoms with tmpfs as the backing store for virtual
disks, I see:

With S10u7:

# ./zfs-cache-test.ksh testpool
zfs create testpool/zfscachetest
Creating data file set (300 files of 8192000 bytes) under
/testpool/zfscachetest ...
Done!
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m30.35s
user0m9.90s
sys 0m19.81s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m43.95s
user0m9.67s
sys 0m17.96s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

# ./zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m31.14s
user0m10.09s
sys 0m20.47s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m40.24s
user0m9.68s
sys 0m17.86s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.


When I move the zpool to a 2009.06 ldom,

# /var/tmp/zfs-cache-test.ksh testpool
zfs create testpool/zfscachetest
Creating data file set (300 files of 8192000 bytes) under
/testpool/zfscachetest ...
Done!
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m30.09s
user0m9.58s
sys 0m19.83s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m44.21s
user0m9.47s
sys 0m18.18s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

# /var/tmp/zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real0m29.89s
user0m9.58s
sys 0m19.72s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real0m44.40s
user0m9.59s
sys 0m18.24s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.

Notice that in these runs the usr+sys time of the first pass adds up to
essentially the elapsed time - the rate was choked by CPU.  This is
verified by "prstat -mL".  The second run seemed to be slow due to a
lock, as we had just demonstrated that the I/O path can do more (not an
I/O bottleneck), and "prstat -mL" shows cpio sleeping for a
significant amount of time.
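
For reference, the microstate view referred to here is simply:

prstat -mL 5      # per-LWP microstate accounting, refreshed every 5 seconds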

FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!

# /var/tmp/zfs-cache-test.ksh testpool
zfs unmount testpool/zfscachetest
zfs mount testpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
4800025 blocks

real4m21.57s
user0m9.72s
sys 0m36.30s

Doing second 'cpio -o > /dev/null'
4800025 blocks

real4m21.56s
user0m9.72s
sys 0m36.19s

Feel free to clean up with 'zfs destroy testpool/zfscachetest'.


This bug report contains more detail of the configuration.  One thing
not covered in that bug report is that the S10u7 ldom has 2048 MB of
RAM and the 2009.06 ldom has 2024 MB of RAM.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions regarding RFE 6334757 and CR 6322205 disk write cache. thanks (case 11356581)

2009-07-13 Thread sean walmsley
1) Turning on write caching is potentially dangerous because the disk will 
indicate that data has been written (to cache) before it has actually been 
written to non-volatile storage (disk). Since the factory has no way of knowing 
how you'll use your T5140, I'm guessing that they set the disk write caches off 
by default.

2) Since ZFS "knows" about disk caches and ensures that it issues synchronous 
writes where required, it is safe to turn on write caching when the *ENTIRE* 
disk is used for ZFS. Accordingly, ZFS will attempt to turn on a disk's write 
cache whenever you add the *ENTIRE* disk to a zpool. If you add only a disk 
slice to a zpool, ZFS will not try to turn on write caching since it doesn't 
know whether other portions of the disk will be used for applications which are 
not write-cache safe.

zpool create pool01 c0t0d0      <- ZFS will try to turn on the disk write cache
                                   since it is using the entire disk

zpool create pool02 c0t0d0s1    <- ZFS will not try to turn on the disk write cache
                                   (only using 1 slice)

To avoid future disk replacement problems (e.g. if the replacement disk is 
slightly smaller), we generally create a single disk slice that takes up almost 
the entire disk and then build our pools on these slices. ZFS doesn't turn on 
the write cache in this case, but since we know that the disk is only being 
used for ZFS we can (and do!) safely turn on the write cache manually.

3) You can change the write (and read) cache settings using the "cache" submenu 
of the "format -e" command. If you disable the write cache where it could 
safely be enabled you will only reduce the performance of the system. If you 
enable the write cache where it should not be enabled, you run the risk of data 
loss and/or corruption in the event of a power loss.

4) I wouldn't assume any particular setting for FRU parts, although I believe 
that Sun parts generally ship with the write caches disabled. Better to 
explicitly check using "format -e".
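
As an illustration of point 3, the interactive session looks roughly like this
(treat it as a sketch rather than an exact transcript; the menu wording can
vary slightly between releases):

# format -e
(select the disk from the menu)
format> cache
cache> write_cache
write_cache> display     <- shows the current setting
write_cache> enable      <- turns the write cache on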
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions regarding RFE 6334757 and CR 6322205 disk write cache. thanks (case 11356581)

2009-07-13 Thread sean walmsley
Something caused my original message to get cut off. Here is the full post:

1) Turning on write caching is potentially dangerous because the disk will 
indicate that data has been written (to cache) before it has actually been 
written to non-volatile storage (disk). Since the factory has no way of knowing 
how you'll use your T5140, I'm guessing that they set the disk write caches off 
by default.

2) Since ZFS "knows" about disk caches and ensures that it issues synchronous 
writes where required, it is safe to turn on write caching when the *ENTIRE* 
disk is used for ZFS. Accordingly, ZFS will attempt to turn on a disk's write 
cache whenever you add the
*ENTIRE* disk to a zpool. If you add only a disk slice to a zpool, ZFS will not 
try to turn on write caching since it doesn't know whether other portions of 
the disk will be used for applications which are not write-cache safe.

zpool create pool01 c0t0d0      <- ZFS will try to turn on the disk write cache
                                   since it is using the entire disk

zpool create pool02 c0t0d0s1    <- ZFS will not try to turn on the disk write cache
                                   (only using 1 slice)

To avoid future disk replacement problems (e.g. if the replacement disk is 
slightly smaller), we generally create a single disk slice that takes up almost 
the entire disk and then build our pools on these slices. ZFS doesn't turn on 
the write cache in this case, but since we know that the disk is only being 
used for ZFS we can (and do!) safely turn on the write cache manually.

3) You can change the write (and read) cache settings using the "cache" submenu 
of the "format -e" command. If you disable the write cache where it could 
safely be enabled you will only reduce the performance of the system. If you 
enable the write cache where
it should not be enabled, you run the risk of data loss and/or corruption in 
the event of a power loss.

4) I wouldn't assume any particular setting for FRU parts, although I believe 
that Sun parts generally ship with the write caches
disabled. Better to explicitly check using "format -e".
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mike Gerdts wrote:


FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!


It is quite fascinating seeing the huge difference in I/O performance 
from these various reports.  The bug you reported seems likely to be 
that without at least a little bit of caching, it is necessary to 
re-request the underlying 128K ZFS block several times as the program 
does numerous smaller I/Os (cpio uses 10240 bytes?) across it. 
Totally disabling data caching seems best reserved for block-oriented 
databases which are looking for a substitute for directio(3C).


It is easily demonstrated that the problem seen in Solaris 10 (jury 
still out on OpenSolaris, although one report has been posted) is due 
to some sort of confusion.  It is not due to delays caused by purging 
old data from the ARC.  If these delays were caused by purging data 
from the ARC, then 'zpool iostat' would start showing lower read 
performance once the ARC becomes full, but that is not the case.
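
For anyone following along, this is easy to watch while the test script runs,
e.g. (substitute your own pool name):

zpool iostat mypool 1      # pool read/write bandwidth, printed once per second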


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Mike Gerdts wrote:
> >
> > FWIW, I hit another bug if I turn off primarycache.
> >
> > http://defect.opensolaris.org/bz/show_bug.cgi?id=10004
> >
> > This causes really abysmal performance - but equally so for repeat runs!
>
> It is quite fascinating seeing the huge difference in I/O performance 
> from these various reports.  The bug you reported seems likely to be 
> that without at least a little bit of caching, it is necessary to 
> re-request the underlying 128K ZFS block several times as the program 
> does numerous smaller I/Os (cpio uses 10240 bytes?) across it. 

cpio reads/writes in 8192 byte chunks from the filesystem.

BTW: star by default creates a shared memory based FIFO of 8 MB size and
reads in the biggest possible size that would currently fit into the FIFO.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Jim Mauro

Bob - Have you filed a bug on this issue?
I am not up to speed on this thread, so I can
not comment on whether or not there is a bug
here, but you seem to have a test case and supporting
data. Filing a bug will get the attention of ZFS
engineering.

Thanks,
/jim


Bob Friesenhahn wrote:

On Mon, 13 Jul 2009, Mike Gerdts wrote:


FWIW, I hit another bug if I turn off primarycache.

http://defect.opensolaris.org/bz/show_bug.cgi?id=10004

This causes really abysmal performance - but equally so for repeat runs!


It is quite fascinating seeing the huge difference in I/O performance 
from these various reports.  The bug you reported seems likely to be 
that without at least a little bit of caching, it is necessary to 
re-request the underlying 128K ZFS block several times as the program 
does numerous smaller I/Os (cpio uses 10240 bytes?) across it. Totally 
disabling data caching seems best reserved for block-oriented 
databases which are looking for a substitute for directio(3C).


It is easily demonstrated that the problem seen in Solaris 10 (jury 
still out on OpenSolaris, although one report has been posted) is due 
to some sort of confusion.  It is not due to delays caused by purging 
old data from the ARC.  If these delays were caused by purging data 
from the ARC, then 'zpool iostat' would start showing lower read 
performance once the ARC becomes full, but that is not the case.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication

2009-07-13 Thread Miles Nordin
> "jcm" == James C McPherson  writes:
> "dm" == David Magda  writes:

   jcm> What I can say, however, is that "open source" does not always
   jcm> equate to requiring "open development".

+1

To maintain what draws me to free software, you must 

 * release binaries and source at the same time

   that also means none of this bullshit where you send someone a
   binary of your work for ``testing''.  BSD developers do this all
   the time not really meaning anything bad by it, but for CDDL or GPL
   both by law and by custom, you do ``testing'' then you get to see
   source, period.

 * allow free enough access to the source that whoever gets it can
   fork and continue development under any organizing process they
   want.

The organizing process for development is also worth talking about,
but for me it isn't such a clear political movement.  Even the
projects that unlike Solaris have always been open, where openness is
their core goal above anything else, still benefit from openbsd
hackathons, the .nl HAR camp, and other meetings where insiders who
know each other personally sequester themselves in physical proximity
and privately work on something which they release all at once when
the camping trip is over.

Private development branches can be good, and certainly don't scare me
away from a project the same way as intentional GPL incompatibility,
closed-source stable branches, proprietary installer-maker build
scripts, scattering of binary blobs throughout the tree, selling
hardware as a VAR then dropping the ball getting free drivers out of
the OEM's, and so on.

There are other organizing things I absolutely do have a problem with.
For example, attracting discussion to censored web forums (which on
OpenSolaris we do NOT have because here the forums are just
extra-friendly mailing list archives plus a posting interface for
web20 idiots, but many Linux subprojects do have censored forums).
And PR-expurgated read-only bug databases (which OpenSolaris does have
while Ubuntu, Debian, Gentoo, and so on do not).  

There's a second problem with GPL at Akamai and Google.  Suppose
Greenbytes wrote dedup changes but didn't release their source, then
started selling deduplicated hosted storage over vlan in several major
telco hotels.  I'd have a political/community-advocacy problem with
that, and probably no legal remedy.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Joerg Schilling wrote:


cpio reads/writes in 8192 byte chunks from the filesystem.


Yes, I was just reading the cpio manual page and see that.  I think 
that re-reading the 128K zfs block 16 times to satisfy each request 
for 8192 bytes explains the 16X performance loss when caching is 
disabled.  I don't think that this is strictly a bug since it is what 
the database folks are looking for.
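
The arithmetic behind that 16X figure:

$ echo $((131072 / 8192))
16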


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 3:16 PM, Joerg
Schilling wrote:
> Bob Friesenhahn  wrote:
>
>> On Mon, 13 Jul 2009, Mike Gerdts wrote:
>> >
>> > FWIW, I hit another bug if I turn off primarycache.
>> >
>> > http://defect.opensolaris.org/bz/show_bug.cgi?id=10004
>> >
>> > This causes really abysmal performance - but equally so for repeat runs!
>>
>> It is quite fascinating seeing the huge difference in I/O performance
>> from these various reports.  The bug you reported seems likely to be
>> that without at least a little bit of caching, it is necessary to
>> re-request the underlying 128K ZFS block several times as the program
>> does numerous smaller I/Os (cpio uses 10240 bytes?) across it.
>
> cpio reads/writes in 8192 byte chunks from the filesystem.
>
> BTW: star by default creates a shared memory based FIFO of 8 MB size and
> reads in the biggest possible size that would currently fit into the FIFO.
>
> Jörg

Using cpio's -C option seems to not change the behavior for this bug,
but I did see a performance difference with the case where I hadn't
modified the zfs caching behavior.  That is, the performance of the
tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
>/dev/null".  At this point cpio was spending roughly 13% usr and 87%
sys.

I haven't tried star, but I did see that I could also reproduce it with
"cat $file | cat > /dev/null".  This seems like a worthless use of cat,
but it forces cat to actually copy data from input to output, unlike
when cat can mmap its input and output.  When it does that and the
output is /dev/null, Solaris is smart enough to avoid any reads.
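
A minimal sketch of that reproducer, run against the test file set created by
the script:

for f in /testpool/zfscachetest/*; do
    cat "$f" | cat > /dev/null
done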

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 3:23 PM, Bob
Friesenhahn wrote:
> On Mon, 13 Jul 2009, Joerg Schilling wrote:
>>
>> cpio reads/writes in 8192 byte chunks from the filesystem.
>
> Yes, I was just reading the cpio manual page and see that.  I think that
> re-reading the 128K zfs block 16 times to satisfy each request for 8192
> bytes explains the 16X performance loss when caching is disabled.  I don't
> think that this is strictly a bug since it is what the database folks are
> looking for.
>
> Bob

I did other tests with "dd bs=128k" and verified via truss that each
read(2) was returning 128K.  I thought I had seen excessive reads
there too, but now I can't reproduce that.  Creating another fs with
recordsize=8k seems to make this behavior go away - things seem to be
working as designed. I'll go update the (nota-)bug.
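
Roughly what I mean, with hypothetical dataset and file names:

zfs create -o recordsize=8k testpool/rs8k
cp /testpool/zfscachetest/file-0001 /testpool/rs8k/        # hypothetical file name
truss -t read dd if=/testpool/rs8k/file-0001 of=/dev/null bs=128k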

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross Walker
On Jul 13, 2009, at 2:54 PM, Bob Friesenhahn wrote:



On Mon, 13 Jul 2009, Brad Diggs wrote:

You might want to have a look at my blog on filesystem cache tuning...  It 
will probably help you to avoid memory contention between the ARC and your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html


Your post makes it sound like there is not a bug in the operating  
system.  It does not take long to see that there is a bug in the  
Solaris 10 operating system.  It is not clear if the same bug is  
shared by current OpenSolaris since it seems like it has not been  
tested.


Solaris 10 U7 reads files that it has not seen before at a constant  
rate regardless of the amount of file data it has already read.   
When the file is read a second time, the read is 4X or more slower.   
If reads were slowing down because the ARC was slow to expunge stale  
data, then that would be apparent on the first read pass.  However,  
the reads are not slowing down in the first read pass.  ZFS goes  
into the weeds if it has seen a file before but none of the file  
data is resident in the ARC.


It is pathetic that a Sun RAID array that I paid $21K for out of my  
own life savings is not able to perform better than the cheapo  
portable USB drives that I use for backup because of ZFS.  This is  
making me madder and madder by the minute.


Have you tried limiting the ARC so it doesn't squash the page cache?

Make sure page cache has enough for mmap plus buffers for bouncing  
between it and the ARC. I would say 1GB minimum, 2 to be safe.
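
On Solaris 10 that cap goes in /etc/system and takes effect after a reboot;
a purely illustrative value (size it for your own workload):

* example only: cap the ARC at 8 GB (0x200000000 bytes)
set zfs:zfs_arc_max = 0x200000000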


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mike Gerdts wrote:


Using cpio's -C option seems to not change the behavior for this bug,
but I did see a performance difference with the case where I hadn't
modified the zfs caching behavior.  That is, the performance of the
tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
>/dev/null".  At this point cpio was spending roughly 13% usr and 87%
sys.


Interesting.  I just updated zfs-cache-test.ksh on my web site so that 
it uses 131072-byte blocks.  I see a tiny improvement in performance 
from doing this, and a bit less CPU consumption, so cpio's CPU 
consumption is now essentially zero.  The bug remains.  It seems best to 
use ZFS's ideal block size so that the issues don't get confused.
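
The read pass in the updated script now looks roughly like this (the same
find | cpio pipeline as before, with the cpio block size matched to the 128K
recordsize):

find . -type f | cpio -C 131072 -o > /dev/null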


Using an ARC monitoring script called 'arcstat.pl' I see a huge number 
of 'dmis' events when performance is poor.  The ARC size is 7GB, which 
is less than its prescribed cap of 10GB.
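
The samples below were collected at a one-second interval, i.e. roughly:

./arcstat.pl 1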


Better:

Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
15:39:37   20K1K  65801K  10019  100 7G   10G
15:39:38   19K1K  55701K  10019  100 7G   10G
15:39:39   19K1K  65401K  10018  100 7G   10G
15:39:40   17K1K  65101K  10017  100 7G   10G

Worse:

Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
15:43:24    4K   280      6   280    6     0    0     4  100    9G   10G
15:43:25    4K   277      6   277    6     0    0     4  100    9G   10G
15:43:26    4K   268      6   268    6     0    0     5  100    9G   10G
15:43:27    4K   259      6   259    6     0    0     4  100    9G   10G

An ARC stats summary from a tool called 'arc_summary.pl' is appended 
to this message.


Operation is quite consistent across the full span of files.  Since 
'dmis' is still low when things are "good" (and even when the ARC has 
surely cycled already), this leads me to believe that prefetch is 
mostly working and is usually satisfying read requests.  When things 
go bad I see that 'dmis' becomes 100% of the misses.  A hypothesis is 
that if zfs thinks that the data might be in the ARC (due to having 
seen the file before), it disables file prefetch entirely, 
assuming that it can retrieve the data from its cache.  Then, once it 
finally determines that there is no cached data after all, it issues a 
read request.


Even the "better" read performance is 1/2 of what I would expect from 
my hardware and based on prior test results from 'iozone'.  More 
prefetch would surely help.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

System Memory:
 Physical RAM:  20470 MB
 Free Memory :  2511 MB
 LotsFree:  312 MB

ZFS Tunables (/etc/system):
 * set zfs:zfs_arc_max = 0x3
 set zfs:zfs_arc_max = 0x28000
 * set zfs:zfs_arc_max = 0x2
 set zfs:zfs_write_limit_override = 0xea60
 * set zfs:zfs_write_limit_override = 0xa000
 set zfs:zfs_vdev_max_pending = 5

ARC Size:
 Current Size: 8735 MB (arcsize)
 Target Size (Adaptive):   10240 MB (c)
 Min Size (Hard Limit):1280 MB (zfs_arc_min)
 Max Size (Hard Limit):10240 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:    95%    9791 MB (p)
 Most Frequently Used Cache Size:   4%    448 MB (c-p)

ARC Efficency:
 Cache Access Total:           827767314
 Cache Hit Ratio:      96%     800123657      [Defined State for buffer]
 Cache Miss Ratio:      3%     27643657       [Undefined State for Buffer]
 REAL Hit Ratio:       89%     743665046      [MRU/MFU Hits Only]

 Data Demand   Efficiency:    99%
 Data Prefetch Efficiency:    61%

CACHE HITS BY CACHE LIST:
  Anon:                          5%    47497010               [ New Customer, First Cache Hit ]
  Most Recently Used:           33%    271365449 (mru)        [ Return Customer ]
  Most Frequently Used:         59%    472299597 (mfu)        [ Frequent Customer ]
  Most Recently Used Ghost:      0%    1700764 (mru_ghost)    [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:    0%    7260837 (mfu_ghost)    [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data:                  73%    589582518
  Prefetch Data:                 2%    20424879
  Demand Metadata:              17%    139111510
  Prefetch Metadata:             6%    51004750
CACHE MISSES BY DATA TYPE:
  Demand Data:                  21%    5814459
  Prefetch Data:                46%    12788265
  Demand Metadata:              27%    7700169
  Prefetch Metada

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Ross Walker wrote:


Have you tried limiting the ARC so it doesn't squash the page cache?


Yes, the ARC is limited to 10GB, leaving another 10GB for the OS and 
applications.  Resource limits are not the problem.  There is a ton of 
memory and CPU to go around.


Current /etc/system tunables:

set maxphys = 0x2
set zfs:zfs_arc_max = 0x28000
set zfs:zfs_write_limit_override = 0xea60
set zfs:zfs_vdev_max_pending = 5

Make sure page cache has enough for mmap plus buffers for bouncing between it 
and the ARC. I would say 1GB minimum, 2 to be safe.


In this testing mmap is not being used (cpio does not use mmap), so the 
page cache is not an issue.  It does become an issue for 'cp -r', 
though, where we see the I/O for impacted files reduced even more, 
substantially and essentially permanently, until the filesystem 
is unmounted.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mark Shellenbaum

Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue for 
a week now.  A 4X reduction in file read performance due to having read 
the file before is terrible, and of course the situation is considerably 
worse if the file was previously mmapped as well.  Many of us have sent 
a lot of money to Sun and were not aware that ZFS is sucking the life 
out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. For 
example, I reproduced it on my Blade 2500 (SPARC) which uses a simple 
mirrored rpool.  On that system there is a 1.8X read slowdown from the 
file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest



I've opened the following bug to track this issue:

6859997 zfs caching performance problem

We need to track down if/when this problem was introduced or if it has 
always been there.



   -Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Joerg Schilling wrote:
> >
> > cpio reads/writes in 8192 byte chunks from the filesystem.
>
> Yes, I was just reading the cpio manual page and see that.  I think 
> that re-reading the 128K zfs block 16 times to satisfy each request 
> for 8192 bytes explains the 16X performance loss when caching is 
> disabled.  I don't think that this is strictly a bug since it is what 
> the database folks are looking for.

cpio spends 1.6x more system CPU time than star. This may mainly be a result 
of the fact that cpio (when using the cpio archive format) reads/writes 512-byte 
blocks from/to the archive file.

cpio by default spends 19x more user CPU time than star. This seems to be a 
result of the inappropriate header structure of the cpio archive format and the 
reblocking, and it cannot be easily changed (well, you could use "scpio" - in 
other words the "cpio" CLI personality of star - but this reduces the user CPU 
time only by 10%-50% compared to Sun cpio).

cpio is a program from the past that does not fit well in our current world.
Its internal limits cannot be lifted without creating a new, incompatible 
archive format.

In other words: if you use cpio for your work, you have to live with its 
problems ;-)

If you like to play with different parameter values (e.g. read sizes), cpio 
is unsuitable for tests. Star allows you to use big filesystem read sizes by 
using the FIFO and playing with the FIFO size, and small filesystem read sizes 
by switching off the FIFO and playing with the archive block size.
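
A hedged sketch of the two variants (option spellings as I recall them from
star's documentation; the archive goes to /dev/null as in the cpio test, and
the path is the test file set):

cd /testpool/zfscachetest
star -c f=/dev/null fs=64m .             # big filesystem reads via a large FIFO
star -c f=/dev/null -no-fifo bs=8k .     # small filesystem reads with the FIFO switched off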

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Mike Gerdts  wrote:

> Using cpio's -C option seems to not change the behavior for this bug,
> but I did see a performance difference with the case where I hadn't
> modified the zfs caching behavior.  That is, the performance of the
> tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
> >/dev/null".  At this point cpio was spending roughly 13% usr and 87%
> sys.

As mentioned before, a lot of the user CPU time from cpio is spent creating 
cpio archive headers, or is caused by the fact that cpio archives copy the file 
content to unaligned archive locations, while the "tar" archive format starts 
each new file at a modulo-512 offset in the archive. This requires a lot of 
unneeded copying of file data. You can of course slightly modify parameters 
even with cpio. I am not sure what you mean by "13% usr and 87% sys" as star 
typically spends 6% of the wall-clock time in user+sys CPU, where the user 
CPU time is typically only 1.5% of the system CPU time.

In the "cached" case, it is obviously ZFS that's responsible for the slow down, 
regardless what cpio did in the other case.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Joerg Schilling
Bob Friesenhahn  wrote:

> On Mon, 13 Jul 2009, Mike Gerdts wrote:
> >
> > Using cpio's -C option seems to not change the behavior for this bug,
> > but I did see a performance difference with the case where I hadn't
> > modified the zfs caching behavior.  That is, the performance of the
> > tmpfs backed vdisk more than doubled with "cpio -o -C $((1024 * 1024))
> >> /dev/null".  At this point cpio was spending roughly 13% usr and 87%
> > sys.
>
> Interesting.  I just updated zfs-cache-test.ksh on my web site so that 
> it uses 131072 byte blocks.  I see a tiny improvement in performance 
> from doing this, but I do see a bit less CPU consumption so the CPU 
> consumption is essentially zero.  The bug remains. It seems best to 
> use ZFS's ideal block size so that issues don't get confused.

If you continue to use cpio and the cpio archive format, you force copying a 
lot of data as the cpio archive format does use odd header sizes and starts
new files "unaligned" directly after the archive header.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Jim Mauro wrote:

Bob - Have you filed a bug on this issue? I am not up to speed on 
this thread, so I can not comment on whether or not there is a bug 
here, but you seem to have a test case and supporting data. Filing a 
bug will get the attention of ZFS engineering.


No, I have not filed a bug report yet.  Any problem report to Sun's 
Service department seems to require at least one day's time.


I was curious to see if recent OpenSolaris suffers from the same 
problem, but posted results (thus far) are not as conclusive as they 
are for Solaris 10.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Mike Gerdts
On Mon, Jul 13, 2009 at 4:41 PM, Bob
Friesenhahn wrote:
> On Mon, 13 Jul 2009, Jim Mauro wrote:
>
>> Bob - Have you filed a bug on this issue? I am not up to speed on this
>> thread, so I can not comment on whether or not there is a bug here, but you
>> seem to have a test case and supporting data. Filing a bug will get the
>> attention of ZFS engineering.
>
> No, I have not filed a bug report yet.  Any problem report to Sun's Service
> department seems to require at least one day's time.
>
> I was curious to see if recent OpenSolaris suffers from the same problem,
> but posted results (thus far) are not as conclusive as they are for Solaris
> 10.

It doesn't seem to be quite as bad as S10, but there is certainly a hit.

# /var/tmp/zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (400 files of 8192000 bytes) under
/rpool/zfscachetest ...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
6400033 blocks

real1m26.16s
user0m12.83s
sys 0m25.88s

Doing second 'cpio -o > /dev/null'
6400033 blocks

real2m44.46s
user0m12.59s
sys 0m24.34s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

# cat /etc/release
OpenSolaris 2009.06 snv_111b SPARC
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009

# uname -srvp
SunOS 5.11 snv_111b sparc

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Mark Shellenbaum wrote:


I've opened the following bug to track this issue:

6859997 zfs caching performance problem

We need to track down if/when this problem was introduced or if it 
has always been there.


I think that it has always been there as long as I have been using ZFS 
(1-3/4 years).  Sometimes it takes a while for me to wake up and smell 
the coffee.


Meanwhile I have opened a formal service request (IBIS 71326296) with 
Sun Support.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Bob Friesenhahn

On Mon, 13 Jul 2009, Joerg Schilling wrote:


If you continue to use cpio and the cpio archive format, you force copying a
lot of data as the cpio archive format does use odd header sizes and starts
new files "unaligned" directly after the archive header.


Note that the output of cpio is sent to /dev/null in this test so it 
is only the reading part which is significant as long as cpio's CPU 
use is low.  Sun Service won't have a clue about 'star' since it is 
not part of Solaris 10.  It is best to stick with what they know so 
the problem report won't be rejected.


If star is truly more efficient than cpio, it may make the difference 
even more obvious.  What did you discover when you modified my test 
script to use 'star' instead?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Randy Jones
Bob: Sun V490, 4x 1.35 GHz processors, 32 GB RAM, Solaris 10u7, working with a raidz1 
zpool made up of 6x 146 GB SAS drives on a J4200. Results of running your 
script:

# zfs-cache-test.ksh pool2
zfs create pool2/zfscachetest
Creating data file set (6000 files of 8192000 bytes) under /pool2/zfscachetest 
...
Done!
zfs unmount pool2/zfscachetest
zfs mount pool2/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
96000512 blocks

real5m32.58s
user0m12.75s
sys 2m56.58s

Doing second 'cpio -C 131072 -o > /dev/null'
96000512 blocks

real17m26.68s
user0m12.97s
sys 4m34.33s

Feel free to clean up with 'zfs destroy pool2/zfscachetest'.
#

Same results as you are seeing.

Thanks Randy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?

2009-07-13 Thread Carl Brewer
last question! I promise :)

Google's not helped me much, but that's probably my keyword-ignorance.

I have two USB HDDs. I want to swap them over so there's one off-site and one 
plugged in, and they get swapped over weekly.  Not perfect, but sufficient for this 
site's risk assessment. They'll have the relevant ZFS snapshots sent to them.  
I assume that if I plug one of these drives into another box that groks 
ZFS it'll see a filesystem and be able to access the files etc.

I formatted two drives, 1TB drives.

The tricky bit, I think, is swapping them.  I can mount one and then send/recv 
to it, but what's the best way to automate the process of swapping the drives? 
A human has to physically switch them on and off and plug them in etc., but what's 
the process to do it in ZFS?

Does each drive need a separate mountpoint?  In the old UFS days I'd have just 
mounted them from an entry in (v)fstab in a cronjob and they'd be the same as 
far as everything was concerned, but with ZFS I'm a little confused.

Can anyone here outline the procedure to do this assuming that the USB drives 
will be plugged into the same USB port (the server will be in a cabinet, the 
backup drives outside of it so they don't have to open the cabinet, and thus, 
bump things that don't like to be bumped!).

Thankyou again for everyone's help.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Ross
Ok, build 117 does seem a lot better.  The second run is slower, but not by 
such a huge margin. This was the end of the 98GB test:

Creating data file set (12000 files of 8192000 bytes) under /rpool/zfscachetest 
...
Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
192000985 blocks

real26m17.80s
user0m47.55s
sys 3m56.94s

Doing second 'cpio -o > /dev/null'
192000985 blocks

real27m14.35s
user0m46.84s
sys 4m39.85s
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss