[zfs-discuss] zpool replace lockup / replace process now stalled, how to fix?

2010-05-17 Thread Michael Donaghy
Hi,

I recently moved to a FreeBSD/ZFS system for the sake of data integrity, after 
losing my data on Linux. I've now had my first hard disk failure; the BIOS 
refused to even boot with the failed drive (ad18) connected, so I removed it.
I have another drive, ad16, which had enough space to replace the failed one, 
so I partitioned it and attempted to use "zpool replace" to replace the failed 
partitions with new ones, i.e. "zpool replace tank ad18s1d ad16s4d". This 
seemed to simply hang, with no processor or disk activity; any "zpool status" 
commands also hung. Eventually I attempted to reboot the system, which also 
eventually hung; after waiting a while, having no other option, rightly or 
wrongly, I hard-rebooted. Exactly the same behaviour happened with the other 
zpool replace.

Now, my zpool status looks like:
arcueid ~ $ zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            ad4s1d     ONLINE       0     0     0
            ad6s1d     ONLINE       0     0     0
            ad9s1d     ONLINE       0     0     0
            ad17s1d    ONLINE       0     0     0
            replacing  DEGRADED     0     0     0
              ad18s1d  UNAVAIL      0 9.62K     0  cannot open
              ad16s4d  ONLINE       0     0     0
            ad20s1d    ONLINE       0     0     0
          raidz2       DEGRADED     0     0     0
            ad4s1e     ONLINE       0     0     0
            ad6s1e     ONLINE       0     0     0
            ad17s1e    ONLINE       0     0     0
            replacing  DEGRADED     0     0     0
              ad18s1e  UNAVAIL      0 11.2K     0  cannot open
              ad16s4e  ONLINE       0     0     0
            ad20s1e    ONLINE       0     0     0

errors: No known data errors

It looks like the replace has taken in some sense, but ZFS doesn't seem to be 
resilvering as it should. Attempting to zpool offline doesn't work:
arcueid ~ # zpool offline tank ad18s1d
cannot offline ad18s1d: no valid replicas
Attempting to scrub causes a similar hang to before. Data is still readable 
(from the zvol which is the only thing actually on this filesystem), although 
slowly.

What should I do to recover this / trigger a proper replace of the failed 
partitions?

Many thanks,
Michael
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-17 Thread Richard Skelton
Hi Geoff,
I also tested a RAM disk as a ZIL and found I could recover the pool:

ramdiskadm -a zil 1g                     # create a 1 GB ramdisk
zpool create -f tank c1t3d0 c1t4d0 log /dev/ramdisk/zil
zpool status tank

reboot                                   # the ramdisk, and with it the log device, is gone

zpool status tank                        # check the pool state after losing the log
ramdiskadm -a zil 1g                     # recreate the ramdisk
zpool replace -f tank /dev/ramdisk/zil   # replace the lost log with the new ramdisk
zpool status tank


Cheers
Richard.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] I/O statistics for each file system

2010-05-17 Thread eXeC001er
Hi.

I know that I can view statistics for the pool (zpool iostat).
I want to view statistics for each file system in the pool. Is that possible?

Thanks.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread Darren J Moffat

On 17/05/2010 12:41, eXeC001er wrote:

I know that I can view statistics for the pool (zpool iostat).
I want to view statistics for each file system in the pool. Is that possible?


See fsstat(1M)
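
A minimal sketch of what that gives you (the mount points below are just
placeholders):

  fsstat zfs 5                      # aggregate activity for all mounted ZFS file systems, every 5s
  fsstat /tank/home /tank/media 5   # the same, broken out per file system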

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread eXeC001er
Good,
but this utility is used to view statistics for mounted file systems.
How can I view statistics for an iSCSI-shared FS?

Thanks.

2010/5/17 Darren J Moffat 

> On 17/05/2010 12:41, eXeC001er wrote:
>
>> I known that i can view statistics for the pool (zpool iostat).
>> I want to view statistics for each file system on pool. Is it possible?
>>
>
> See fsstat(1M)
>
> --
> Darren J Moffat
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread Henrik Johansen

Hi,

On 05/17/10 01:57 PM, eXeC001er wrote:

Good,
but this utility is used to view statistics for mounted file systems.
How can I view statistics for an iSCSI-shared FS?


fsstat(1M) relies on certain kstat counters for its operation -
last I checked, I/O against zvols does not update those counters.

If you are using newer builds and COMSTAR, you can use the stmf kstat 
counters to get I/O details per target and per LUN.
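
A hedged sketch of how to find those counters (this assumes COMSTAR is
configured):

  kstat -m stmf -c io      # every per-target / per-LU I/O kstat COMSTAR exposes
  kstat -p -m stmf -c io   # the same data, one statistic per line (parseable)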



Thanks.

2010/5/17 Darren J Moffat <darr...@opensolaris.org>

On 17/05/2010 12:41, eXeC001er wrote:

I know that I can view statistics for the pool (zpool iostat).
I want to view statistics for each file system in the pool. Is that
possible?


See fsstat(1M)

--
Darren J Moffat





--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Odd dump volume panic

2010-05-17 Thread Robert Milkowski

On 12/05/2010 22:19, Ian Collins wrote:

On 05/13/10 03:27 AM, Lori Alt wrote:

On 05/12/10 04:29 AM, Ian Collins wrote:
I just tried moving a dump volume from rpool into another pool, so I 
used zfs send/receive to copy the volume (to keep some older dumps), 
then ran dumpadm -d to use the new location.  This caused a panic.  
Nothing ended up in messages and, needless to say, there isn't a dump!


Creating a new volume and using that worked fine.

This was on Solaris 10 update 8.

Has anyone else seen anything like this?



The fact that a panic occurred is some kind of bug, but I'm also not 
surprised that this didn't work.  Dump volumes have specialized 
behavior and characteristics and using send/receive to move them (or 
any other way to move them) is probably not going to work.  You need 
to extract the dump from the dump zvol using savecore and then move 
the resulting file.


I'm surprised.  I thought the volume used for dump is just a normal 
zvol or other block device.  I didn't realise there was any 
relationship between a zvol and its contents.


One odd thing I did notice was that the device size was reported 
differently on the new pool:


zfs get all space/dump
NAME        PROPERTY              VALUE                  SOURCE
space/dump  type                  volume                 -
space/dump  creation              Wed May 12 20:56 2010  -
space/dump  used                  12.9G                  -
space/dump  available             201G                   -
space/dump  referenced            12.9G                  -
space/dump  compressratio         1.01x                  -
space/dump  reservation           none                   default
space/dump  volsize               16G                    -
space/dump  volblocksize          128K                   -
space/dump  checksum              on                     default
space/dump  compression           on                     inherited from space
space/dump  readonly              off                    default
space/dump  shareiscsi            off                    default
space/dump  copies                1                      default
space/dump  refreservation        none                   default
space/dump  primarycache          all                    default
space/dump  secondarycache        all                    default
space/dump  usedbysnapshots       0                      -
space/dump  usedbydataset         12.9G                  -
space/dump  usedbychildren        0                      -
space/dump  usedbyrefreservation  0                      -

zfs get all rpool/dump
NAME        PROPERTY              VALUE                  SOURCE
rpool/dump  type                  volume                 -
rpool/dump  creation              Thu Jun 25 19:40 2009  -
rpool/dump  used                  16.0G                  -
rpool/dump  available             10.4G                  -
rpool/dump  referenced            16K                    -
rpool/dump  compressratio         1.00x                  -
rpool/dump  reservation           none                   default
rpool/dump  volsize               16G                    -
rpool/dump  volblocksize          8K                     -
rpool/dump  checksum              off                    local
rpool/dump  compression           off                    local
rpool/dump  readonly              off                    default
rpool/dump  shareiscsi            off                    default
rpool/dump  copies                1                      default
rpool/dump  refreservation        none                   default
rpool/dump  primarycache          all                    default
rpool/dump  secondarycache        all                    default



A zvol used as a dump device has some constraints in regard to its 
settings, like checksum, compression, etc. For more details see:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zvol.c#1683


Note that space/dump has checksum turned on, compression turned on, etc., 
while rpool/dump doesn't.


Additionally all blocks need to be pre-allocated 
(http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zvol.c#1785) 
- but zfs send|recv should replicate it I think.
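
A hedged sketch of the safer path Lori describes (dataset names and size are
placeholders; per the zvol.c code above, the dump-specific settings are applied
once the zvol is claimed as the dump device):

  savecore                               # extract any pending dump from the current dump device
  zfs create -V 16g space/dump2          # fresh zvol in the destination pool
  dumpadm -d /dev/zvol/dsk/space/dump2   # make it the dump device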


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread eXeC001er
perfect!

I found info about kstat for Perl.

Where can I find the meaning of each field?

r...@atom:~# kstat stmf:0:stmf_lu_io_ff00d1c2a8f8
1274100947
module: stmf                            instance: 0
name:   stmf_lu_io_ff00d1c2a8f8         class:    io
        crtime                          2333040.65018394
        nread                           9954962
        nwritten                        5780992
        rcnt                            0
        reads                           599
        rlastupdate                     2334856.48028583
        rlentime                        2.792307252
        rtime                           2.453258966
        snaptime                        2335022.3396771
        wcnt                            0
        wlastupdate                     2334856.43951113
        wlentime                        0.103487047
        writes                          510
        wtime                           0.069508209

2010/5/17 Henrik Johansen 

> Hi,
>
>
> On 05/17/10 01:57 PM, eXeC001er wrote:
>
>> good.
>> but this utility is used to view statistics for mounted FS.
>> How can i view statistics for iSCSI shared FS?
>>
>
> fsstat(1M) relies on certain kstat counters for it's operation -
> last I checked I/O against zvols does not update those counters.
>
> It your are using newer builds and COMSTAR you can use the stmf kstat
> counters to get I/O details per target and per LUN.
>
>  Thanks.
>>
>> 2010/5/17 Darren J Moffat <darr...@opensolaris.org>
>>
>>
>>On 17/05/2010 12:41, eXeC001er wrote:
>>
>>I known that i can view statistics for the pool (zpool iostat).
>>I want to view statistics for each file system on pool. Is it
>>possible?
>>
>>
>>See fsstat(1M)
>>
>>--
>>Darren J Moffat
>>
>>
>>
>
> --
> Med venlig hilsen / Best Regards
>
> Henrik Johansen
> hen...@scannet.dk
> Tlf. 75 53 35 00
>
> ScanNet Group
> A/S ScanNet
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread Henrik Johansen

On 05/17/10 03:05 PM, eXeC001er wrote:

perfect!

I found info about kstat for Perl.

Where can I find the meaning of each field?


Most of them can be found here under the section "I/O kstat":

http://docs.sun.com/app/docs/doc/819-2246/kstat-3kstat?a=view
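
A hedged sketch of how those fields are usually combined (take two samples,
delta = second sample minus first; the LU name is the one from your output):

  kstat -p stmf:0:stmf_lu_io_ff00d1c2a8f8 ; sleep 10
  kstat -p stmf:0:stmf_lu_io_ff00d1c2a8f8
  # busy fraction    = delta(rtime)    / delta(snaptime)
  # avg queue length = delta(rlentime) / delta(snaptime)
  # avg bytes read   = delta(nread)    / delta(reads)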



r...@atom:~# kstat stmf:0:stmf_lu_io_ff00d1c2a8f8
1274100947
module: stmf                            instance: 0
name:   stmf_lu_io_ff00d1c2a8f8         class:    io
 crtime  2333040.65018394
 nread   9954962
 nwritten5780992
 rcnt0
 reads   599
 rlastupdate 2334856.48028583
 rlentime2.792307252
 rtime   2.453258966
 snaptime2335022.3396771
 wcnt0
 wlastupdate 2334856.43951113
 wlentime0.103487047
 writes  510
 wtime   0.069508209

2010/5/17 Henrik Johansen <hen...@scannet.dk>

Hi,


On 05/17/10 01:57 PM, eXeC001er wrote:

Good,
but this utility is used to view statistics for mounted file systems.
How can I view statistics for an iSCSI-shared FS?


fsstat(1M) relies on certain kstat counters for its operation -
last I checked, I/O against zvols does not update those counters.

If you are using newer builds and COMSTAR, you can use the stmf
kstat counters to get I/O details per target and per LUN.

Thanks.

2010/5/17 Darren J Moffat <darr...@opensolaris.org>


On 17/05/2010 12:41, eXeC001er wrote:

I know that I can view statistics for the pool (zpool iostat).
I want to view statistics for each file system in the pool.
Is that possible?


See fsstat(1M)

--
Darren J Moffat




--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk 
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org 
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O statistics for each file system

2010-05-17 Thread eXeC001er
Good! I found all the necessary information.
Thanks.

2010/5/17 Henrik Johansen 

> On 05/17/10 03:05 PM, eXeC001er wrote:
>
>> perfect!
>>
>> I found info about kstat for Perl.
>>
>> Where can I find the meaning of each field?
>>
>
> Most of them can be found here under the section "I/O kstat" :
>
> http://docs.sun.com/app/docs/doc/819-2246/kstat-3kstat?a=view
>
>
>  r...@atom:~# kstat stmf:0:stmf_lu_io_ff00d1c2a8f8
>> 1274100947
>> module: stmf                            instance: 0
>> name:   stmf_lu_io_ff00d1c2a8f8         class:    io
>> crtime  2333040.65018394
>> nread   9954962
>> nwritten5780992
>> rcnt0
>> reads   599
>> rlastupdate 2334856.48028583
>> rlentime2.792307252
>> rtime   2.453258966
>> snaptime2335022.3396771
>> wcnt0
>> wlastupdate 2334856.43951113
>> wlentime0.103487047
>> writes  510
>> wtime   0.069508209
>>
>> 2010/5/17 Henrik Johansen <hen...@scannet.dk>
>>
>>
>>Hi,
>>
>>
>>On 05/17/10 01:57 PM, eXeC001er wrote:
>>
>>good.
>>but this utility is used to view statistics for mounted FS.
>>How can i view statistics for iSCSI shared FS?
>>
>>
>>fsstat(1M) relies on certain kstat counters for it's operation -
>>last I checked I/O against zvols does not update those counters.
>>
>>It your are using newer builds and COMSTAR you can use the stmf
>>kstat counters to get I/O details per target and per LUN.
>>
>>Thanks.
>>
>>2010/5/17 Darren J Moffat <darr...@opensolaris.org>
>>
>>
>>
>>
>>On 17/05/2010 12:41, eXeC001er wrote:
>>
>>I known that i can view statistics for the pool (zpool
>>iostat).
>>I want to view statistics for each file system on pool.
>>Is it
>>possible?
>>
>>
>>See fsstat(1M)
>>
>>--
>>Darren J Moffat
>>
>>
>>
>>
>>--
>>Med venlig hilsen / Best Regards
>>
>>Henrik Johansen
>>hen...@scannet.dk 
>>
>>Tlf. 75 53 35 00
>>
>>ScanNet Group
>>A/S ScanNet
>>___
>>zfs-discuss mailing list
>>zfs-discuss@opensolaris.org 
>>
>>http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
>>
>
> --
> Med venlig hilsen / Best Regards
>
> Henrik Johansen
> hen...@scannet.dk
> Tlf. 75 53 35 00
>
> ScanNet Group
> A/S ScanNet
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Strategies for expanding storage area of home storage-server

2010-05-17 Thread Andreas Gunnarsson
Hello.
I've got a home storage server running OpenSolaris (currently dev build 134) 
that is quickly running out of storage space, and I'm looking through what kind 
of options I have for expanding it.

I currently have my "storage-pool" in a 4x 1TB drive setup in RAIDZ1, and have 
room for 8-9 more drives in the case/controllers.
Preferably I'd like to change it all to a RAIDZ2 with 12 drives and 1 hot 
spare, but that would require me to transfer all the data out to external 
storage and then recreate the pool, which would mean buying some additional 
external storage that will not be used after I'm done with the transfer.

I could also add 2 more 4 drive vdevs to the current pool, but then I would 
have 3 RAIDZ1 vdevs striped, and I'm not entirely sure that I'm comfortable 
with that level of protection on the data.

Another option would be creating a 6 drive RAIDZ2 pool, moving the data to 
that one, then destroying the old pool and adding another 6 drive vdev to the 
new pool (striped).

So the question is what would you recommend for growing my storage space:
1. Buying extra hardware to copy the data to, and rebuild the pool as a 12 
drive RAIDZ2.
2. Move data to a 6 drive RAIDZ2, then destroy the old pool and stripe in an 
additional RAIDZ2 vdev.
3. Stripe 2 additional RAIDZ1 4 drive vdevs.
4. Something else.

Easiest would of course be adding new 4-drive vdevs to the existing pool, but 
I'm unsure how much I'd be able to trust more than 1 drive not failing in that 
setup. Am I worried needlessly? (Imagine 10% or so of the data as 
vacation footage or something like that and you'll be rather close to how I 
value the data. I have some backups of the most important stuff, but I know 
myself well enough to know I will not back up everything I should as well as I 
should.) I guess a hot spare on top of the stripe would give some extra buffer 
if the drives just resilver fast enough in case of a failure, so that would 
make it a bit more "safe".

The hardware setup, if anyone's interested, is OpenSolaris running in a VM on 
an ESXi server; the storage-pool hard drives are placed in an iSCSI target that 
I connect to using ESXi's iSCSI initiator (using MPIO) and attached to the VM 
as raw devices.

Hope someone has some ideas or opinions.

Regards
Andreas Gunnarsson
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-17 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Geoff Nordli
> 
> I was messing around with a ramdisk on a pool and I forgot to remove it
> before I shut down the server.  Now I am not able to mount the pool.  I
> am
> not concerned with the data in this pool, but I would like to try to
> figure
> out how to recover it.
> 
> I am running Nexenta 3.0 NCP (b134+).

Try this:
zpool upgrade
By default, it will just tell you the current versions of zpools, without
actually doing any upgrades.  If your zpool is 19 or greater, then the loss
of a ZIL is not fatal to the pool.  You should be able to "zpool import" and
then you'll see a message about "zpool import -F"

If you have zpool < 19, then it's lost.
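
A minimal sketch of that sequence (the pool name is a placeholder, and the
recovery import only applies if the version check passes):

  zpool upgrade          # prints the on-disk version of each imported pool
  zpool upgrade -v       # lists the versions this build supports
  zpool import           # lists importable pools and any suggested recovery action
  zpool import -F tank   # recovery-mode import, as suggested by the message above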

BTW, just to make sure you know ... Having a ZIL in RAM makes no sense
whatsoever, except for academic purposes.  For a system in actual usage, you
should either implement a nonvolatile ZIL device, or disable the ZIL (to be
used with caution.)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-17 Thread Gary Mills
On Sun, May 16, 2010 at 01:14:24PM -0700, Charles Hedrick wrote:
> We use this configuration. It works fine. However I don't know
> enough about the details to answer all of your questions.
> 
> The disks are accessible from both systems at the same time. Of
> course with ZFS you had better not actually use them from both
> systems.

That's what I wanted to know.  I'm not familiar with SAS fabrics, so
it's good to know that they operate similarly to multi-initiator SCSI
in a cluster.

> Actually, let me be clear about what we do. We have two J4200's and
> one J4400. One J4200 uses SAS disks, the others SATA. The two with
> SATA disks are used in Sun cluster configurations as NFS
> servers. They fail over just fine, losing no state. The one with SAS
> is not used with Sun Cluster. Rather, it's a Mysql server with two
> systems, one of them as a hot spare. (It also acts as a mysql slave
> server, but it uses different storage for that.) That means that our
> actual failover experience is with the SATA configuration. I will
> say from experience that in the SAS configuration both systems see
> the disks at the same time. I even managed to get ZFS to mount the
> same pool from both systems, which shouldn't be possible. Behavior
> was very strange until we realized what was going on.

Our situation is that we only need a small amount of shared storage
in the cluster.  It's intended for high availability of core services,
such as DNS and NIS, rather than as a NAS server.

> I get the impression that they have special hardware in the SATA
> version that simulates SAS dual interface drives. That's what lets
> you use SATA drives in a two-node configuration. There's also some
> additional software setup for that configuration.

That would be the SATA interposer that does that.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-17 Thread Victor Latushkin

On May 17, 2010, at 5:29 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Geoff Nordli
>> 
>> I was messing around with a ramdisk on a pool and I forgot to remove it
>> before I shut down the server.  Now I am not able to mount the pool.  I
>> am
>> not concerned with the data in this pool, but I would like to try to
>> figure
>> out how to recover it.
>> 
>> I am running Nexenta 3.0 NCP (b134+).
> 
> Try this:
>   zpool upgrade
> By default, it will just tell you the current versions of zpools, without
> actually doing any upgrades.  If your zpool is 19 or greater, then the loss
> of a ZIL is not fatal to the pool.  You should be able to "zpool import" and
> then you'll see a message about "zpool import -F"
> 
> If you have zpool < 19, then it's lost.

If you have zpool.cache, then it is not lost; if you do not have it, there's 
still a chance, as ZIL device details are reflected in the in-pool config, and 
it may be possible to extract a copy of the config out of the pool.
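
A hedged sketch of where that information can be inspected (pool name and
device path below are placeholders):

  zdb -C tank                 # dump the pool config cached in /etc/zfs/zpool.cache
  zdb -l /dev/rdsk/c1t3d0s0   # dump the four on-disk labels of one pool device,
                              # which embed a copy of the pool config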


> BTW, just to make sure you know ... Having a ZIL in RAM makes no sense
> whatsoever, except for academic purposes.  For a system in actual usage, you
> should either implement nonvolatile ZIL device, or disable ZIL (to be used
> with caution.)

Be aware that zil_disable has been removed recently and replaced with the 
'sync' dataset property.
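
A hedged sketch of the replacement, assuming a build recent enough to carry
the property (the dataset name is a placeholder):

  zfs set sync=disabled tank/scratch   # per-dataset successor to the old zil_disable tunable
  zfs get sync tank/scratch            # values are standard | always | disabled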

regards
victor

> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-17 Thread Geoff Nordli


>-Original Message-
>From: Edward Ned Harvey [mailto:solar...@nedharvey.com]
>Sent: Monday, May 17, 2010 6:29 AM
>>
>> I was messing around with a ramdisk on a pool and I forgot to remove
>> it before I shut down the server.  Now I am not able to mount the
>> pool.  I am not concerned with the data in this pool, but I would like
>> to try to figure out how to recover it.
>>
>> I am running Nexenta 3.0 NCP (b134+).
>
>Try this:
>   zpool upgrade
>By default, it will just tell you the current versions of zpools, without
>actually doing any upgrades.  If your zpool is 19 or greater, then the loss
>of a ZIL is not fatal to the pool.  You should be able to "zpool import" and
>then you'll see a message about "zpool import -F"
>
>If you have zpool < 19, then it's lost.
>
>BTW, just to make sure you know ... Having a ZIL in RAM makes no sense
>whatsoever, except for academic purposes.  For a system in actual usage, you
>should either implement nonvolatile ZIL device, or disable ZIL (to be used
>with caution.)
>

Thanks Edward.

The syspool is sitting at level 18, so I assume the old pool is toast.  I was
more curious about why nothing was working, because there are reports that you
can do it, but it wasn't working for me.

This system isn't in production; I was just testing to see whether the ZIL was
being used or not.

Have a great day!

Geoff 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strategies for expanding storage area of home storage-server

2010-05-17 Thread Freddie Cash
On Mon, May 17, 2010 at 6:25 AM, Andreas Gunnarsson wrote:

> I've got a home-storage-server setup with Opensolaris (currently dev build
> 134) that is quickly running out of storage space, and I'm looking through
> what kind of options I have for expanding it.
>
> I currently have my "storage-pool" in a 4x 1TB drive setup in RAIDZ1, and
> have room for 8-9 more drives in the case/controllers.
> Preferably I'd like to change it all to a RAIDZ2 with 12 drives, and 1
> hotspare, but that would require me to transfer out all the data to an
> external storage, and then recreating a new pool, which would require me
> buying some additional external storage that will not be used after I'm done
> with the transfer.
>
> I could also add 2 more 4 drive vdevs to the current pool, but then I would
> have 3 RAIDZ1 vdevs striped, and I'm not entirely sure that I'm comfortable
> with that level of protection on the data.
>
> Another version would be creating a 6 drive RAIDZ2 pool, moving the data to
> that one and the destroying the old pool and adding another 6 drive vdev to
> the new pool (striped).
>
> So the question is what would you recommend for growing my storage space:
> 1. Buying extra hardware to copy the data to, and rebuild the pool as a 12
> drive RAIDZ2.
> 2. Move data to a 6 drive RAIDZ2 and then destroy the old pool and stripe
> an additional RAIDZ2 vdevs.
> 3. Stripe 2 additional RAIDZ1 4 drive vdevs.
> 4. Something else.


I'd go with option 2.

Create a 6-drive raidz2 vdev in a separate pool.  Migrate the data from the
old pool to the new pool.  Destroy the old pool.  Create a second 6-drive
raidz2 vdev in the new pool.  Voila!  You'll have a lot of extra space, be
able to withstand up to 4 drive failures (2 per vdev), and it should be
faster as well (even with the added overhead of raidz2).
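
A hedged sketch of that sequence (pool and device names are placeholders):

  zpool create tank2 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
  zfs snapshot -r tank@migrate
  zfs send -R tank@migrate | zfs recv -duF tank2   # replicate every dataset and its properties
  zpool destroy tank
  zpool add tank2 raidz2 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0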

Option 3 would give the best performance, but you don't have much leeway in
terms of resilver time if using 1 TB+ drives, and if a second drive fails
while the first is resilvering ...

Option 1 would be horrible in terms of performance.  Especially resilver
times, as you'll be thrashing 12 drives.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Dan Pritts
On Thu, May 13, 2010 at 06:09:55PM +0200, Roy Sigurd Karlsbakk wrote:
> 1. even though they're 5900, not 7200, benchmarks I've seen show they are 
> quite good 

Minor correction, they are 5400rpm.  Seagate makes some 5900rpm drives.

The "green" drives have reasonable raw throughput rate, due to the
extremely high platter density nowadays.  however, due to their low
spin speed, their average-access time is significantly slower than
7200rpm drives.

For bulk archive data containing large files, this is less of a concern.

Regarding slow resilvering times, in the absence of other disk activity,
I think that should really be limited by the throughput rate, not the
relatively slow random I/O performance... again assuming large files
(and low fragmentation, which is what I'd expect if the archive is
write-and-never-delete).

One test I saw suggests 60MB/sec avg throughput on the 2TB drives.
That works out to 9.25 hours to read the entire 2TB.  At a conservative
50MB/sec it's 11 hours.  This assumes that you have enough I/O bandwidth
and CPU on the system to saturate all your disks.
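
(A quick check of that arithmetic, treating 2 TB as 2,000,000 MB:)

  echo "scale=2; 2000000/60/3600" | bc    # prints 9.25  (hours at 60 MB/s)
  echo "scale=2; 2000000/50/3600" | bc    # prints 11.11 (hours at 50 MB/s)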

If there's other disk activity during a resilver, though, it turns into
random I/O, which is slow on these drives.

danno
--
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Visit our website: www.internet2.edu
Follow us on Twitter: www.twitter.com/internet2
Become a Fan on Facebook: www.internet2.edu/facebook
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Casper . Dik

>On Thu, May 13, 2010 at 06:09:55PM +0200, Roy Sigurd Karlsbakk wrote:
>> 1. even though they're 5900, not 7200, benchmarks I've seen show they are 
>> quite good 
>
>Minor correction, they are 5400rpm.  Seagate makes some 5900rpm drives.
>
>The "green" drives have reasonable raw throughput rate, due to the
>extremely high platter density nowadays.  however, due to their low
>spin speed, their average-access time is significantly slower than
>7200rpm drives.
>
>For bulk archive data containing large files, this is less of a concern.
>
>Regarding slow reslivering times, in the absence of other disk activity,
>I think that should really be limited by the throughput rate, not the
>relatively slow random i/o performance...again assuming large files
>(and low fragmentation, which if the archive is write-and-never-delete
>is what i'd expect).

My experience is that they resilver fairly quickly and scrubs aren't slow
either (300GB in 2hrs).

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Tomas Ögren
On 17 May, 2010 - Dan Pritts sent me these 1,6K bytes:

> On Thu, May 13, 2010 at 06:09:55PM +0200, Roy Sigurd Karlsbakk wrote:
> > 1. even though they're 5900, not 7200, benchmarks I've seen show they are 
> > quite good 
> 
> Minor correction, they are 5400rpm.  Seagate makes some 5900rpm drives.
> 
> The "green" drives have reasonable raw throughput rate, due to the
> extremely high platter density nowadays.  however, due to their low
> spin speed, their average-access time is significantly slower than
> 7200rpm drives.
> 
> For bulk archive data containing large files, this is less of a concern.
> 
> Regarding slow reslivering times, in the absence of other disk activity,
> I think that should really be limited by the throughput rate, not the
> relatively slow random i/o performance...again assuming large files
> (and low fragmentation, which if the archive is write-and-never-delete
> is what i'd expect).
> 
> One test i saw suggests 60MB/sec avg throughput on the 2TB drives.
> That works out to 9.25 hours to read the entire 2TB.  At a conservative
> 50MB/sec it's 11 hours.  This assumes that you have enough I/O bandwidth
> and CPU on the system to saturate all your disks.
> 
> if there's other disk activity during a resilver, though, it turns into
> random i/o.  Which is slow on these drives.

Resilver does a whole lot of random I/O itself, not bulk reads.  It reads
the filesystem tree, not "block 0, block 1, block 2...".  You won't get
60MB/s sustained, not even close.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Freddie Cash
On Mon, May 17, 2010 at 9:25 AM, Tomas Ögren  wrote:

> On 17 May, 2010 - Dan Pritts sent me these 1,6K bytes:
>
> > On Thu, May 13, 2010 at 06:09:55PM +0200, Roy Sigurd Karlsbakk wrote:
> > > 1. even though they're 5900, not 7200, benchmarks I've seen show they
> are quite good
> >
> > Minor correction, they are 5400rpm.  Seagate makes some 5900rpm drives.
> >
> > The "green" drives have reasonable raw throughput rate, due to the
> > extremely high platter density nowadays.  however, due to their low
> > spin speed, their average-access time is significantly slower than
> > 7200rpm drives.
> >
> > For bulk archive data containing large files, this is less of a concern.
> >
> > Regarding slow reslivering times, in the absence of other disk activity,
> > I think that should really be limited by the throughput rate, not the
> > relatively slow random i/o performance...again assuming large files
> > (and low fragmentation, which if the archive is write-and-never-delete
> > is what i'd expect).
> >
> > One test i saw suggests 60MB/sec avg throughput on the 2TB drives.
> > That works out to 9.25 hours to read the entire 2TB.  At a conservative
> > 50MB/sec it's 11 hours.  This assumes that you have enough I/O bandwidth
> > and CPU on the system to saturate all your disks.
> >
> > if there's other disk activity during a resilver, though, it turns into
> > random i/o.  Which is slow on these drives.
>
> Resilver does a whole lot of random io itself, not bulk reads.. It reads
> the filesystem tree, not "block 0, block 1, block 2..". You won't get
> 60MB/s sustained, not even close.
>
Resilver time for a 1.5 TB WD Green drive, with wdidle3 setting "disabled",
in an 8-drive raidz2 vdev, is over 65 hours, with ~500 GB of data per drive.
We just replaced 8 WD 500 GB RE Black drives with 8 1.5 TB WD Green drives,
not realising just how horrible these drives are.  :(  So much for the
$100 CDN bargain price.

Resilver time for a 1.5 TB Seagate 7200.11 drive, in an 8-drive raidz2 vdev,
is about 35 hours, with ~ 500 GB of data per drive.  We just replaced 8 WD
Black 500 GB drives and Seagate 7200.11 500 GB drives with 8 1.5 TB Seagate
7200.11.  Much nicer drives, and way better performance than the WD Greens.

Both servers are identical hardware (motherboard, CPU, RAM, RAID
controllers, etc), both are using ZFSv14.  The first is 64-bit FreeBSD
7.3-RELEASE, the second is FreeBSD 8-STABLE.

For a home media server, the WD Greens may be okay.

For anything else, they're crap.  Plain and simple.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Dan Pritts
On Mon, May 17, 2010 at 06:25:18PM +0200, Tomas Ögren wrote:
> Resilver does a whole lot of random io itself, not bulk reads.. It reads
> the filesystem tree, not "block 0, block 1, block 2..". You won't get
> 60MB/s sustained, not even close.

Even with large, unfragmented files?  

danno
--
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Visit our website: www.internet2.edu
Follow us on Twitter: www.twitter.com/internet2
Become a Fan on Facebook: www.internet2.edu/facebook
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Thomas Burgess
Hey, when I do this single user boot, is there any way to capture what pops
up on the screen?  It's a LOT of stuff.


Anyways, it seems to work fine when I do singleuser -srv

cpustat -h lists exactly what you said it should plus a lot more (though the
"more" is above, so like you said, it shows what it should show at the
bottom)


I'll capture all that later and post it.


On Sat, May 15, 2010 at 8:35 PM, Dennis Clarke wrote:

> - Original Message -
> From: Thomas Burgess 
> Date: Saturday, May 15, 2010 8:09 pm
> Subject: Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
> To: Orvar Korvar 
> Cc: zfs-discuss@opensolaris.org
>
>
> > Well i just wanted to let everyone know that preliminary results are
> good.
> >  The livecd booted, all important things seem to be recognized. It
> > sees all
> > 16 gb of ram i installed and all 8 cores of my opteron 6128
> >
> > The only real shocker is how loud the norco RPC-4220 fans are (i have
> > another machine with a norco 4020 case so i assumed the fans would be
> > similar.this was a BAD assumption)  This thing sounds like a hair
> > dryer
> >
> > Anyways, I'm running the install now so we'll see how that goes. It
> > did take
> > about 10 minutes to "find a disk" durring the installer, but if i
> remember
> > right, this happened on other machines as well.
> >
>
> Once you have the install done could you post ( somewhere ) what you see
> during a single user mode boot with options -srv ?
>
> I would like to see all the gory details.
>
> Also, could you run "cpustat -h" ?
>
> At the bottom, according to usr/src/uts/intel/pcbe/opteron_pcbe.c you shoud
> see :
>
> See "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h
> Processors" (AMD publication 31116)
>
> The following registers should be listed :
>
>  #defineAMD_FAMILY_10h_generic_events
> \
>{ "PAPI_tlb_dm","DC_dtlb_L1_miss_L2_miss",  0x7 },  \
>{ "PAPI_tlb_im","IC_itlb_L1_miss_L2_miss",  0x3 },  \
>{ "PAPI_l3_dcr","L3_read_req",  0xf1 }, \
>{ "PAPI_l3_icr","L3_read_req",  0xf2 }, \
>{ "PAPI_l3_tcr","L3_read_req",  0xf7 }, \
>{ "PAPI_l3_stm","L3_miss",  0xf4 }, \
>{ "PAPI_l3_ldm","L3_miss",  0xf3 }, \
>{ "PAPI_l3_tcm","L3_miss",  0xf7 }
>
>
> You should NOT see anything like this :
>
> r...@aequitas:/root# uname -a
> SunOS aequitas 5.11 snv_139 i86pc i386 i86pc Solaris
> r...@aequitas:/root# cpustat -h
> cpustat: cannot access performance counters - Operation not applicable
>
>
> ... as well as psrinfo -pv please ?
>
>
> When I get my HP Proliant with the 6174 procs I'll be sure to post whatever
> I see.
>
> Dennis
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Thomas Burgess
psrinfo -pv shows:


The physical processor has 8 virtual processors (0-7)
x86  (AuthenticAMD 100F91 family 16 model 9 step 1 clock 200 MHz)
   AMD Opteron(tm) Processor 6128   [  Socket: G34 ]




On Sat, May 15, 2010 at 8:35 PM, Dennis Clarke wrote:

> - Original Message -
> From: Thomas Burgess 
> Date: Saturday, May 15, 2010 8:09 pm
> Subject: Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
> To: Orvar Korvar 
> Cc: zfs-discuss@opensolaris.org
>
>
> > Well i just wanted to let everyone know that preliminary results are
> good.
> >  The livecd booted, all important things seem to be recognized. It
> > sees all
> > 16 gb of ram i installed and all 8 cores of my opteron 6128
> >
> > The only real shocker is how loud the norco RPC-4220 fans are (i have
> > another machine with a norco 4020 case so i assumed the fans would be
> > similar.this was a BAD assumption)  This thing sounds like a hair
> > dryer
> >
> > Anyways, I'm running the install now so we'll see how that goes. It
> > did take
> > about 10 minutes to "find a disk" durring the installer, but if i
> remember
> > right, this happened on other machines as well.
> >
>
> Once you have the install done could you post ( somewhere ) what you see
> during a single user mode boot with options -srv ?
>
> I would like to see all the gory details.
>
> Also, could you run "cpustat -h" ?
>
> At the bottom, according to usr/src/uts/intel/pcbe/opteron_pcbe.c you shoud
> see :
>
> See "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h
> Processors" (AMD publication 31116)
>
> The following registers should be listed :
>
>  #defineAMD_FAMILY_10h_generic_events
> \
>{ "PAPI_tlb_dm","DC_dtlb_L1_miss_L2_miss",  0x7 },  \
>{ "PAPI_tlb_im","IC_itlb_L1_miss_L2_miss",  0x3 },  \
>{ "PAPI_l3_dcr","L3_read_req",  0xf1 }, \
>{ "PAPI_l3_icr","L3_read_req",  0xf2 }, \
>{ "PAPI_l3_tcr","L3_read_req",  0xf7 }, \
>{ "PAPI_l3_stm","L3_miss",  0xf4 }, \
>{ "PAPI_l3_ldm","L3_miss",  0xf3 }, \
>{ "PAPI_l3_tcm","L3_miss",  0xf7 }
>
>
> You should NOT see anything like this :
>
> r...@aequitas:/root# uname -a
> SunOS aequitas 5.11 snv_139 i86pc i386 i86pc Solaris
> r...@aequitas:/root# cpustat -h
> cpustat: cannot access performance counters - Operation not applicable
>
>
> ... as well as psrinfo -pv please ?
>
>
> When I get my HP Proliant with the 6174 procs I'll be sure to post whatever
> I see.
>
> Dennis
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Thomas Burgess
No... it doesn't.  The only SATA ports that show up are the ones connected
to the backplane via the reverse breakout SAS cable... and they show as
empty... so I'm thinking that OpenSolaris isn't working with the chipset
SATA.

In the BIOS I can select from:

Native IDE
AMD_AHCI
RAID
Legacy IDE


I have it set to AMD_AHCI... but my board also has an IDE slot which I was
using for the CDROM drive (this is what I used to load OpenSolaris in the
first place).

I also have an option called "Sata IDE combined mode".

I think this may be my problem... I had this enabled because I thought I
needed it in order to use both SATA and IDE... I think now it's something
else.


I'm going to try to boot without it on; if that doesn't work, I'll try to
reinstall with it disabled.



On Sun, May 16, 2010 at 8:18 PM, Ian Collins  wrote:

> On 05/17/10 12:08 PM, Thomas Burgess wrote:
>
>> well, i haven't had a lot of time to work with this...but i'm having
>> trouble getting the onboard sata to work in anything but NATIVE IDE mode.
>>
>>
>> I'm not sure exactly what the problem isi'm wondering if i bought the
>> wrong cable (i have a norco 4220 case so the drives connect via a sas
>> sff-8087 on the backpane)
>>
>> I thought this required a "reverse breakout cable" but maybe i was
>> wrongthis is the first time i've worked with sas
>>
>> on the otherhand, I was able to flash my intel Intel SASUC8I cards with
>> the LSI SAS3081E IT firmware from the LSI site.  These seem to work fine.  I
>> think i'm just going to order a 3rd card and put it in the pci-e x4 slot.  I
>> don't want 16 drives running as sata and 4 running in IDE mode.Is there
>> any way i can tell if the drive i installed opensolaris to is in IDE or SATA
>> mode?
>>
>>  Does it show up in cfgadm?
>
> --
> Ian.
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Brandon High
On Mon, May 17, 2010 at 12:51 PM, Thomas Burgess  wrote:
> In the bios i can select from:
> Native IDE
> AMD_AHCI

This is probably what you want. AHCI is supposed to be chipset agnostic.

> I also have an option called  "Sate IDE combined mode"

See if there's anything in the docs about what this actually does. You
might need it to use the PATA port, but it could be what's messing
things up. If you can't use the cdrom, maybe install from a thumb
drive or usb cdrom. (My ASUS M2N-LR board refuses to boot from a
thumb drive. Likewise with a friend's Supermicro Intel board. Both
work fine from a usb cdrom.)

> I think this may be my problem...i had this enabled, because i thought i
> needed it in order to use both sata and idei think now it's something
> else.

I think so. It makes the first 4 ports look like IDE drives (two
channels, two drives per channel) and the remaining BIOS RAID or AHCI.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mirror resilver @500k/s

2010-05-17 Thread Oliver Seidel
Hello Everybody,

Thank you for your support.  I have been able to see a sustained 50-70 MB/s 
resilver rate with the "iostat -x 10" command, on one out of 3 discs.  The other 
two discs are now on their way back to the vendor and I hope to be able to 
report better success when I get them back.

Thanks again,

Oliver
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Thomas Burgess
OK, well, this was part of the problem.

I disabled the Sata IDE combined mode and reinstalled OpenSolaris (I tried
to just disable it, but osol wouldn't boot).


Now the drive connected to the SSD DOES show up in cfgadm, so it seems to be
in SATA mode... but the drives connected to the reverse breakout cable still
don't show up.

On the bright side, the drives connected to my SAS cards (through the same
backplane, with a standard sff-8087 to sff-8087 cable) DO show up.


So, now I just need to figure out why these 4 drives aren't showing up.

(My case is the norco RPC-4220; I thought I'd be OK with 2 SAS cards (8 SATA
ports each) and then 4 of the onboard ports using the reverse breakout
cable... something must be wrong with the cable... I'll test the 2 drives
connected directly in a bit... I have to take everything apart to do that.)

On Mon, May 17, 2010 at 4:04 PM, Brandon High  wrote:

> On Mon, May 17, 2010 at 12:51 PM, Thomas Burgess 
> wrote:
> > In the bios i can select from:
> > Native IDE
> > AMD_AHCI
>
> This is probably what you want. AHCI is supposed to be chipset agnostic.
>
> > I also have an option called  "Sate IDE combined mode"
>
> See if there's anything in the docs about what this actually does. You
> might need it to use the PATA port, but it could be what's messing
> things up. If you can't use the cdrom, maybe install from a thumb
> drive or usb crdrom. (My ASUS M2N-LR board refuses to boot from a
> thumb drive. Likewise with a friend's Supermicro Intel board. Both
> work fine from a usb cdrom.)
>
> > I think this may be my problem...i had this enabled, because i thought i
> > needed it in order to use both sata and idei think now it's something
> > else.
>
> I think so. It makes the first 4 ports look like IDE drives (two
> channels, two drives per channel) and the remaining BIOS RAID or AHCI.
>
> -B
>
> --
> Brandon High : bh...@freaks.com
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strategies for expanding storage area of home storage-server

2010-05-17 Thread Thomas Burgess
I'd have to agree.  Option 2 is probably the best.

I recently found myself in need of more space... I had to build an entirely
new server... my first one was close to full (it has 20 1TB drives in 3
raidz2 groups, 7/7/6, and I was down to 3 TB).  I ended up going with a whole
new server... with 2TB drives this time... I considered replacing the drives
in my current server with new 2 TB drives, but for the money it made more
sense to keep that server online and build a second.

That's where I am now... If I could have done what you are looking to do, it
would have been a lot easier.

On Mon, May 17, 2010 at 11:29 AM, Freddie Cash  wrote:

> On Mon, May 17, 2010 at 6:25 AM, Andreas Gunnarsson wrote:
>
>> I've got a home-storage-server setup with Opensolaris (currently dev build
>> 134) that is quickly running out of storage space, and I'm looking through
>> what kind of options I have for expanding it.
>>
>> I currently have my "storage-pool" in a 4x 1TB drive setup in RAIDZ1, and
>> have room for 8-9 more drives in the case/controllers.
>> Preferably I'd like to change it all to a RAIDZ2 with 12 drives, and 1
>> hotspare, but that would require me to transfer out all the data to an
>> external storage, and then recreating a new pool, which would require me
>> buying some additional external storage that will not be used after I'm done
>> with the transfer.
>>
>> I could also add 2 more 4 drive vdevs to the current pool, but then I
>> would have 3 RAIDZ1 vdevs striped, and I'm not entirely sure that I'm
>> comfortable with that level of protection on the data.
>>
>> Another version would be creating a 6 drive RAIDZ2 pool, moving the data
>> to that one and the destroying the old pool and adding another 6 drive vdev
>> to the new pool (striped).
>>
>> So the question is what would you recommend for growing my storage space:
>> 1. Buying extra hardware to copy the data to, and rebuild the pool as a 12
>> drive RAIDZ2.
>> 2. Move data to a 6 drive RAIDZ2 and then destroy the old pool and stripe
>> an additional RAIDZ2 vdevs.
>> 3. Stripe 2 additional RAIDZ1 4 drive vdevs.
>> 4. Something else.
>
>
> I'd go with option 2.
>
> Create a 6-drive raidz2 vdev in a separate pool.  Migrate the data from the
> old pool to the new pool.  Destroy the old pool.  Create a second 6-drive
> raidz2 vdev in the new pool.  Voila!  You'll have a lot of extra space, be
> able to withstand up to 4 drive failures (2 per vdev), and it should be
> faster as well (even with the added overhead of raidz2).
>
> Option 3 would give the best performance, but you don't have much leeway in
> terms of resilver time if using 1 TB+ drives, and if a second drive fails
> while the first is resilvering ...
>
> Option 1 would be horrible in terms of performance.  Especially resilver
> times, as you'll be thrashing 12 drives.
>
> --
> Freddie Cash
> fjwc...@gmail.com
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strategies for expanding storage area of home storage-server

2010-05-17 Thread Travis Tabbal
When I did a similar upgrade a while back I did #2. Create a new pool raidz2 
with 6 drives, copy the data to it, verify the data, delete the old pool, add 
old drives + some new drives to another 6 disk raidz2 in the new pool. 
Performance has been quite good, and the migration was very smooth. 

The other nice thing about this arrangement for a home user is that I now only 
need to upgrade 6 drives to get more space, rather than 12 per option #1. To be 
clear, this is my current config. 

        NAME         STATE     READ WRITE CKSUM
        raid         ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            c9t4d0   ONLINE       0     0     0
            c9t5d0   ONLINE       0     0     0
            c9t6d0   ONLINE       0     0     0
            c9t7d0   ONLINE       0     0     0
            c10t5d0  ONLINE       0     0     0
            c10t4d0  ONLINE       0     0     0
          raidz2-1   ONLINE       0     0     0
            c9t0d0   ONLINE       0     0     0
            c9t1d0   ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t1d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strategies for expanding storage area of home storage-server

2010-05-17 Thread Andreas Gunnarsson
Thanks for the tips guys, I'll go with 2x 6drive raidz2 vdevs then.

Regards
Andreas Gunnarsson
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-17 Thread Dennis Clarke

>On 05-17-10, Thomas Burgess  wrote: 
>psrinfo -pv shows:
>
>The physical processor has 8 virtual processors (0-7)
>    x86  (AuthenticAMD 100F91 family 16 model 9 step 1 clock 200 MHz)
>               AMD Opteron(tm) Processor 6128   [  Socket: G34 ]
>

That's odd.

Please try this : 

# kstat -m cpu_info -c misc
module: cpu_infoinstance: 0
name:   cpu_info0   class:misc
brand   VIA Esther processor 1200MHz
cache_id0
chip_id 0
clock_MHz   1200
clog_id 0
core_id 0
cpu_typei386
crtime  3288.24125364
current_clock_Hz1199974847
current_cstate  0
family  6
fpu_typei387 compatible
implementation  x86 (CentaurHauls 6A9 family 6 model 10 
step 9 clock 1200 MHz)
model   10
ncore_per_chip  1
ncpu_per_chip   1
pg_id   -1
pkg_core_id 0
snaptime1526742.97169617
socket_type Unknown
state   on-line
state_begin 1272610247
stepping9
supported_frequencies_Hz1199974847
supported_max_cstates   0
vendor_id   CentaurHauls

You should get a LOT more data.

Dennis 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Erik Trimble
On Mon, 2010-05-17 at 12:54 -0400, Dan Pritts wrote:
> On Mon, May 17, 2010 at 06:25:18PM +0200, Tomas Ögren wrote:
> > Resilver does a whole lot of random io itself, not bulk reads.. It reads
> > the filesystem tree, not "block 0, block 1, block 2..". You won't get
> > 60MB/s sustained, not even close.
> 
> Even with large, unfragmented files?  
> 
> danno
> --
> Dan Pritts, Sr. Systems Engineer
> Internet2
> office: +1-734-352-4953 | mobile: +1-734-834-7224

Having large, unfragmented files will certainly help keep sustained
throughput.  But, also, you have to consider the amount of deletions
done on the pool.

For instance, let's say you wrote files A, B, and C one right after
another, and they're all big files.  Doing a re-silver, you'd be pretty
well off on getting reasonable throughput reading A, then B, then C,
since they're going to be contiguous on the drive (both internally, and
across the three files).  However, if you have deleted B at some time,
and say wrote a file D (where D < B in size) into B's old space, then,
well, you seek to A, read A, seek forward to C, read C, seek back to D,
etc.

Thus, you'll get good throughput for resilver on these drives pretty
much in just ONE case:  large files with NO deletions.  If you're using
them for write-once/read-many/no-delete archives, then you're OK.
Anything else is going to suck.

:-)



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hashing files rapidly on ZFS

2010-05-17 Thread Daniel Carosone
On Tue, May 11, 2010 at 04:15:24AM -0700, Bertrand Augereau wrote:
> Is there a O(nb_blocks_for_the_file) solution, then?
> 
> I know O(nb_blocks_for_the_file) == O(nb_bytes_in_the_file), from Mr. 
> Landau's POV, but I'm quite interested in a good constant factor.

If you were considering the hashes of each zfs block as a precomputed
value, it might be tempting to think of getting all of these and
hashing them together.  You could thereby avoid reading the file data,
while the file metadata holding the hashes you'd have needed to read
anyway. This would seem to be appealing, eliminating seeks and cpu
work.

However, there are some issues that make the approach basically
infeasible and unreliable for comparing the results of two otherwise
identical files.

First, you're assuming there's an easy interface to get the stored
hashes of a block, which there isn't.  Even if we ignore that for a
moment, the hashes zfs records depend on factors other than just the
file content, including the way the file has been written over time.  

The blocks of the file may not be constant size; a file that grew
slowly may have different hashes to a copy of it or one extracted
from an archive in a fast stream.  Filesystem properties, including
checksum (obvious), dedup (which implies checksum), compress (which
changes written data and can make holes), blocksize and maybe others
may be different between filesystems or even change over the time a
file has been written, and again change results and defeat
comparisons.

These things can defeat zfs's dedup too, even though it does have
access to the block level checksums.

If you're going to do an application-level dedup, you want to utilise
the advantage of being independent of these things - or even of the
underlying filesystem at all (e.g. dedup between two NAS shares).

Something similar would be useful, and much more readily achievable,
from ZFS for such an application, and many others.  Rather than a way
to compare reliably between two files for identity, I'd like a way to
compare the identity of a single file between two points in time.  If my
application can tell quickly that the file content is unaltered since
last time I saw the file, I can avoid rehashing the content and use a
stored value. If I can achieve this result for a whole directory
tree, even better.

--
Dan.





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS

2010-05-17 Thread Marc Bevand
The LSI SAS1064E slipped through the cracks when I built the list.
This is a 4-port PCIe x8 HBA with very good Solaris (and Linux)
support. I don't remember having seen it mentioned on zfs-discuss@
before, even though many were looking for 4-port controllers. Perhaps
the fact that it is priced too close to 8-port models explains why it is
relatively unnoticed. That said, the wide x8 PCIe link makes it the
*cheapest* controller able to feed 300-350MB/s to at least 4 ports
concurrently. Now added to my list.
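
(Rough arithmetic behind that last claim, assuming PCIe 1.x signaling at
roughly 250 MB/s of usable bandwidth per lane: 8 lanes is about 2 GB/s to the
host, comfortably above the 4 ports x 350 MB/s = 1.4 GB/s needed.)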

-mrb

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-17 Thread Pasi Kärkkäinen
On Mon, May 17, 2010 at 03:12:44PM -0700, Erik Trimble wrote:
> On Mon, 2010-05-17 at 12:54 -0400, Dan Pritts wrote:
> > On Mon, May 17, 2010 at 06:25:18PM +0200, Tomas Ögren wrote:
> > > Resilver does a whole lot of random io itself, not bulk reads.. It reads
> > > the filesystem tree, not "block 0, block 1, block 2..". You won't get
> > > 60MB/s sustained, not even close.
> > 
> > Even with large, unfragmented files?  
> > 
> > danno
> > --
> > Dan Pritts, Sr. Systems Engineer
> > Internet2
> > office: +1-734-352-4953 | mobile: +1-734-834-7224
> 
> Having large, unfragmented files will certainly help keep sustained
> throughput.  But, also, you have to consider the amount of deletions
> done on the pool.
> 
> For instance, let's say you wrote files A, B, and C one right after
> another, and they're all big files.  Doing a re-silver, you'd be pretty
> well off on getting reasonable throughput reading A, then B, then C,
> since they're going to be contiguous on the drive (both internally, and
> across the three files).  However, if you have deleted B at some time,
> and say wrote a file D (where D < B in size) into B's old space, then,
> well, you seek to A, read A, seek forward to C, read C, seek back to D,
> etc.
> 
> Thus, you'll get good throughput for resilver on these drives pretty
> much in just ONE case:  large files with NO deletions.  If you're using
> them for write-once/read-many/no-delete archives, then you're OK.
> Anything else is going to suck.
> 
> :-)
> 

So basically, if you have a lot of small files with a lot of changes
and deletions, resilvering is going to be really slow.

Sounds like traditional RAID would be better/faster to rebuild in this 
case.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss