[zfs-discuss] Checksum errors in storage pool

2007-03-07 Thread H.-J. Schnitzer
Hi,

I am using ZFS under Solaris 10u3.

After the failure of a 3510 RAID controller, I have several storage pools
with damaged objects. "zpool status -xv" prints a long list:

  DATASET  OBJECT  RANGE
  4c0c     5dd     lvl=0 blkid=2
  28       b346    lvl=0 blkid=9
  3b31     15d     lvl=0 blkid=1
  3b31     15d     lvl=0 blkid=2
  3b31     15d     lvl=0 blkid=2727
  3b31     190     lvl=0 blkid=0
  ...

I know that the number in the "OBJECT" column identifies the inode number
of the affected file.
However, I have more than 1000 filesystems in each of the
affected storage pools. So how do I identify the correct filesystem?
According to 
http://blogs.sun.com/erickustarz/entry/damaged_files_and_zpool_status
I have to use zdb. But I can't figure out how to use it. Can you help?
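
For reference, here is a rough sketch of the kind of procedure I assume is
needed (untested; the DATASET and OBJECT values look like hex, and "mypool"
stands in for the real pool name):

  # DATASET 3b31 (hex) = 15153 (decimal); list the pool's datasets and look
  # for the one reported with that ID (exact zdb output format may differ)
  zdb -d mypool | grep 15153

  # OBJECT 15d (hex) = 349 (decimal); once the filesystem is known, search
  # its mountpoint for that inode number
  find /mountpoint/of/that/filesystem -inum 349 -print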

Hans Schnitzer
 
 


[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Matt B
Thanks for the responses. There is a lot there that I am looking forward to
digesting. Right off the bat, though, I wanted to bring up something I found just
before reading this reply, because the answer to this question would automatically
answer some other questions.

There is a ZFS best practices wiki at
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
that makes a couple of points:

*Swap space - Because ZFS caches data in kernel addressable memory, the kernel 
sizes will likely be larger than with other file systems. Configure additional 
disk-based swap to account for this difference. You can use the size of 
physical memory as an upper bound to the extra amount of swap space that might 
be required. Do not use slices on the same disk for both swap space and ZFS 
file systems. Keep the swap areas separate from the ZFS file systems.

*Do not use slices for storage pools that are intended for production use.

So after reading this, it seems that with only 4 disks to work with, and given that
UFS for root (initial install) is still required, my only option to conform to the
best practice is to use two disks for UFS/RAID 1, leaving only the remaining two
disks entirely for ZFS. Additionally, the swap partition would have to go on the
UFS pair of disks to keep it separate from the ZFS pair.

If I am misinterpreting the wiki please let me know.

There are some tradeoffs here. I would prefer to use a 4-way mirrored slice for a
UFS root and a 4-way mirrored slice for UFS swap, and then leave equal slices free
for ZFS, but then it sounds like I would risk not following the best practice and
would have to mess with SVM. The nice thing is I could be looking at a 128GB yield
with a decent level of fault tolerance.
With ZFS pooling, could I take each of the two 64GB slices, place them into two ZFS
stripes (RAID 0), and then join those two stripes into a ZFS mirror? It seems like I
could then get a 128GB yield without having to use RaidZ/Hot or RaidZ2, which,
according to the links I just skimmed, performs/lasts below mirroring.

As for the other option I described above, I could just put the first two disks into
a HW RAID 1 (the X4200s support 2-disk RAID 1) and then put the remaining 2 disks
into ZFS; this (I think) would not violate the best practice?
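
A rough sketch of what the ZFS half of that option might look like (the device
names here are just hypothetical placeholders for the two remaining disks):

  zpool create datapool mirror c1t2d0 c1t3d0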

Any thoughts on the best practice points I am raising? It disturbs me that it 
would make a statement like "don't use slices for production".
 
 


Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Frank Cusack

On March 7, 2007 8:50:53 AM -0800 Matt B <[EMAIL PROTECTED]> wrote:

Any thoughts on the best practice points I am raising? It disturbs me
that it would make a statement like "don't use slices for production".


I think that's just a performance thing.

-frank


Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Richard Elling

Frank Cusack wrote:

On March 7, 2007 8:50:53 AM -0800 Matt B <[EMAIL PROTECTED]> wrote:

Any thoughts on the best practice points I am raising? It disturbs me
that it would make a statement like "don't use slices for production".


I think that's just a performance thing.


yep, for those systems with lots of disks.
 -- richard


Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Richard Elling

Matt B wrote:

Thanks for the responses. There is a lot there that I am looking forward to
digesting. Right off the bat, though, I wanted to bring up something I found just
before reading this reply, because the answer to this question would automatically
answer some other questions.


There is a ZFS best practices wiki at
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
that makes a couple of points:

*Swap space - Because ZFS caches data in kernel addressable memory, the kernel 
sizes will likely be larger than with other file systems. Configure additional 
disk-based swap to account for this difference. You can use the size of physical 
memory as an upper bound to the extra amount of swap space that might be required. 
Do not use slices on the same disk for both swap space and ZFS file systems. 
Keep the swap areas separate from the ZFS file systems.


This recommendation is only suitable for low memory systems with lots of disks.
Clearly, it would be impractical for a system with a single disk.


*Do not use slices for storage pools that are intended for production use.

So after reading this, it seems that with only 4 disks to work with, and given that
UFS for root (initial install) is still required, my only option to conform to the
best practice is to use two disks for UFS/RAID 1, leaving only the remaining two
disks entirely for ZFS. Additionally, the swap partition would have to go on the
UFS pair of disks to keep it separate from the ZFS pair.


If I am misinterpreting the wiki please let me know.


The best thing about best practices is that there are so many of them :-/
I'll see if I can clarify in the wiki.

There are some tradeoffs here. I would prefer to use a 4-way mirrored slice for a
UFS root and a 4-way mirrored slice for UFS swap, and then leave equal slices free
for ZFS, but then it sounds like I would risk not following the best practice and
would have to mess with SVM. The nice thing is I could be looking at a 128GB yield
with a decent level of fault tolerance.


With ZFS pooling, could I take each of the two 64GB slices, place them into two ZFS
stripes (RAID 0), and then join those two stripes into a ZFS mirror? It seems like I
could then get a 128GB yield without having to use RaidZ/Hot or RaidZ2, which,
according to the links I just skimmed, performs/lasts below mirroring.


Be careful: with ZFS you don't take a stripe and mirror it (RAID-0+1), you take a
mirror and stripe it (RAID-1+0).  For example, you would do:
    zpool create mycoolpool mirror c_t_d_s_ c_t_d_s_ mirror c_t_d_s_ c_t_d_s_

As for the other option I described above, I could just put the first two disks into
a HW RAID 1 (the X4200s support 2-disk RAID 1) and then put the remaining 2 disks
into ZFS; this (I think) would not violate the best practice?


Yes, this would work fine.  It would simplify your boot and OS install/upgrade.
I'd still recommend planning on using LiveUpgrade -- leave a spare slice for an
alternate boot environment.
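
A minimal sketch of creating such an alternate boot environment with Live
Upgrade, assuming a hypothetical spare slice c0t1d0s3:

    lucreate -c currentBE -n altBE -m /:/dev/dsk/c0t1d0s3:ufs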

Any thoughts on the best practice points I am raising? It disturbs me that it would 
make a statement like "don't use slices for production".


Sometimes it is not what you say, it is how you say it.
 -- richard


[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Matt B
So it sounds like the consensus is that I should not worry about using slices
with ZFS, and the swap best practice doesn't really apply to my situation of a
4-disk X4200.

So, in summary (please confirm), is this what we are saying is a safe bet for
use in a highly available production environment?

With 4 x 73GB disks yielding 70GB each:

5GB for root, which is UFS and mirrored 4 ways using SVM.
8GB for swap, which is raw and mirrored across the first two disks (optional:
skip LiveUpgrade and 4-way mirror this swap partition instead).
8GB for LiveUpgrade, which is mirrored across the third and fourth disks.
This leaves 57GB of free space on each of the 4 disks, in slices.
One ZFS pool will be created containing the 4 slices:
the first two slices will be used in a ZFS mirror yielding 57GB,
the last two slices will be used in a ZFS mirror yielding 57GB,
and then a stripe (RAID 0) will be laid over the two mirrors (see the sketch
below), yielding 114GB of usable space while able to sustain any 2 drives
failing without a loss of data.
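
A rough sketch of the ZFS part of that layout (assuming the 57GB slices sit on
a hypothetical slice 7 of each disk; ZFS stripes across the two mirror vdevs
automatically):

  zpool create datapool mirror c0t0d0s7 c0t1d0s7 mirror c0t2d0s7 c0t3d0s7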

Thanks

P.S.
Availability is determined by using a synthetic SLA monitor that operates on 2 
minute cycles evaluating against a VIP by an external third party. If there are 
no errors in the report for the month we hit 100%, I think even one error (due 
to the 2 minute window) puts us below 6 9's..so we basically have a zero 
tolerance standard to hit the sla and not get penalized monetarily
 
 


Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Wade . Stuart





[EMAIL PROTECTED] wrote on 03/07/2007 12:31:14 PM:

> So it sounds like the consensus is that I should not worry about
> using slices with ZFS
> and the swap best practice doesn't really apply to my situation of a
> 4 disk x4200.
>
> So in summary(please confirm) this is what we are saying is a safe
> bet for using in a highly available production environment?
>
> With 4x73 gig disks yielding 70GB each:
>
> 5GB for root which is UFS and mirrored 4 ways using SVM.
> 8GB for swap which is raw and mirrored across first two disks
> (optional: or no liveupgrade and 4 way mirror this swap partition)
> 8GB for LiveUpgrade which is mirrored across the third and fourth two
disks
> This leaves 57GB of free space on each of the 4 disks in slices
> One zfs pool will be created containing the 4 slices
> the first two slices will be used in a zmirror yielding 57GB
> The last two slices will be used in a zmirror yielding 57GB
> Then a zstripe (raid0) will be layed over the two zmirrors yielding
> 114GB usable space while able to sustain any 2 drives failing
> without a loss in data

No, you will be able to sustain up to one disk in each of the two mirrored
pairs failing at any time with no data loss. Lose both disks in one mirrored
pair and you lose data (and the system panics) -- slightly different than
"any two disks".

>
> Thanks
>
> P.S.
> Availability is determined by using a synthetic SLA monitor that
> operates on 2 minute cycles evaluating against a VIP by an external
> third party. If there are no errors in the report for the month we
> hit 100%, I think even one error (due to the 2 minute window) puts
> us below 6 9's..so we basically have a zero tolerance standard to
> hit the sla and not get penalized monetarily



Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers

2007-03-07 Thread Manoj Joseph

Matt B wrote:

Any thoughts on the best practice points I am raising? It disturbs me
that it would make a statement like "don't use slices for
production".


ZFS turns on the write cache on the disk if you give it the entire disk to
manage. This is good for performance. So, you should use whole disks whenever
possible.

Slices work too, but the disk's write cache will not be turned on by ZFS.
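
A minimal illustration of the difference (device names are hypothetical):

  zpool create wholepool c1t0d0      # whole disk: ZFS can enable the write cache
  zpool create slicepool c1t1d0s0    # slice: ZFS leaves the write cache as it is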

Cheers
Manoj


[zfs-discuss] writes lost with zfs !

2007-03-07 Thread Ayaz Anjum
HI !

I have tested the following scenario

I created a ZFS filesystem as part of HAStoragePlus in Sun Cluster 3.2 on
Solaris 11/06.

Currently I have only one FC HBA per server.

1. There is no I/O to the ZFS mountpoint. I disconnected the FC cable. The
filesystem on ZFS still shows as mounted (because there is no I/O to the
filesystem). I touch a file: still OK. I did a "sync", and only then did the
node panic and the ZFS filesystem fail over to the other cluster node.
However, the file that I touched is lost.

2. With ZFS mounted on one cluster node, I created a file and kept updating it
every second. Then I removed the FC cable, and the writes still continued to
the filesystem. After 10 seconds I put the FC cable back and my writes
continued; no failover of ZFS happened.

It seems that all I/O is going to some cache. Any suggestions on what is going
wrong here and what the solution is?

thanks


Ayaz Anjum










Re: [zfs-discuss] writes lost with zfs !

2007-03-07 Thread Manoj Joseph

Ayaz Anjum wrote:


HI !

I have tested the following scenario

I created a ZFS filesystem as part of HAStoragePlus in Sun Cluster 3.2 on
Solaris 11/06.


Currently I have only one FC HBA per server.

1. There is no I/O to the ZFS mountpoint. I disconnected the FC cable. The
filesystem on ZFS still shows as mounted (because there is no I/O to the
filesystem). I touch a file: still OK. I did a "sync", and only then did the
node panic and the ZFS filesystem fail over to the other cluster node.
However, the file that I touched is lost.


This is to be expected, I'd say.

HAStoragePlus is primarily a wrapper over ZFS that manages the import/export
and mount/unmount. It cannot and does not provide for a retry of pending I/Os.


The 'touch' would have been part of a ZFS transaction group that never got
committed, and it stays lost when the pool is imported on the other node.


In other words, it does not provide the same kind of high availability that,
say, PxFS provides.


2. With ZFS mounted on one cluster node, I created a file and kept updating it
every second. Then I removed the FC cable, and the writes still continued to
the filesystem. After 10 seconds I put the FC cable back and my writes
continued; no failover of ZFS happened.


It seems that all I/O is going to some cache. Any suggestions on what is going
wrong here and what the solution is?


I don't know for sure. But my guess is that if you do an fsync after the
writes and wait for the fsync to complete, then you might get some action:
the fsync should fail, and ZFS could panic the node. If it does, you will
see a failover.


Hope that helps.

-Manoj

