[zfs-discuss] Checksum errors in storage pool
Hi,

I am using ZFS under Solaris 10u3. After the failure of a 3510 RAID controller, I have several storage pools with damaged objects. "zpool status -xv" prints a long list:

    DATASET  OBJECT  RANGE
    4c0c     5dd     lvl=0 blkid=2
    28       b346    lvl=0 blkid=9
    3b31     15d     lvl=0 blkid=1
    3b31     15d     lvl=0 blkid=2
    3b31     15d     lvl=0 blkid=2727
    3b31     190     lvl=0 blkid=0
    ...

I know that the number in the OBJECT column identifies the inode number of the affected file. However, I have more than 1000 filesystems in each of the affected storage pools, so how do I identify the correct filesystem? According to
http://blogs.sun.com/erickustarz/entry/damaged_files_and_zpool_status
I have to use zdb, but I can't figure out how to use it. Can you help?

Hans Schnitzer
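For what it's worth, a rough sketch of the zdb approach that blog entry describes (the pool and filesystem names below are placeholders, and zdb's output format varies somewhat between releases): the DATASET and OBJECT columns are hexadecimal, so convert them to decimal, look the dataset ID up with "zdb -d", and then map the object number to a path.

    # DATASET/OBJECT in "zpool status -v" are hex; convert to decimal first
    # (bash/ksh93 arithmetic; "mypool" and "somefs" are made-up names)
    echo $((16#4c0c))                  # dataset ID, 19468
    echo $((16#5dd))                   # object (inode) number, 1501

    # list every dataset in the pool with its ID and look for the decimal value
    zdb -d mypool | grep 'ID 19468'

    # once the filesystem is known, map the object number to a path,
    # either with find on the mountpoint ...
    find /mypool/somefs -xdev -inum 1501
    # ... or by dumping that object with zdb (the path should appear in the
    # output; add more d's if it does not at this verbosity)
    zdb -dddd mypool/somefs 1501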
[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
Thanks for the responses. There is a lot there I am looking forward to digesting. Right off the bat, though, I wanted to bring up something I found just before reading this reply, as the answer to this question would automatically answer some other questions.

There is a ZFS best practices wiki at
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
that makes a couple of points:

* Swap space - Because ZFS caches data in kernel addressable memory, the kernel sizes will likely be larger than with other file systems. Configure additional disk-based swap to account for this difference. You can use the size of physical memory as an upper bound to the extra amount of swap space that might be required. Do not use slices on the same disk for both swap space and ZFS file systems. Keep the swap areas separate from the ZFS file systems.

* Do not use slices for storage pools that are intended for production use.

After reading this, it seems that with only 4 disks to work with, and the fact that UFS for root (initial install) is still required, my only option to conform to the best practice is to use two disks with UFS/RAID 1, leaving only the remaining two disks for 100% ZFS. Additionally, the swap partition would have to go on the UFS set of disks to keep it separate from the ZFS set of disks. If I am misinterpreting the wiki, please let me know.

There are some tradeoffs here. I would prefer to use a 4-way mirrored slice for a UFS root and a 4-way mirrored slice for UFS swap, and then leave equal slices free for ZFS, but then it sounds like I would be risking not following the best practice and would have to mess with SVM. The nice thing is I could be looking at a 128GB yield with a decent level of fault tolerance. With zpools, could I take each of the two 64GB slices and place them into two ZFS stripes (RAID 0) and then join those two into a ZFS mirror? It seems like I could then get a 128GB yield without having to use RAID-Z or RAID-Z2, which, according to the links I just skimmed, perform worse than mirroring.

The other option I described above: I could just put the first two disks into a HW RAID 1, as the x4200s support 2-disk RAID 1, and then put the remaining 2 disks into ZFS, and this (I think) would not be violating the best practice?

Any thoughts on the best practice points I am raising? It disturbs me that it would make a statement like "don't use slices for production".
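For reference, a minimal sketch of that second option (hardware RAID 1 on the first pair carrying UFS root and swap, the second pair given whole to ZFS); the device and pool names here are placeholders:

    # first two disks: 2-disk hardware RAID 1 on the x4200 controller,
    # holding the UFS root and swap slices (done in the controller BIOS)
    # remaining two disks (names made up) handed to ZFS whole, so it can
    # enable the write cache and no slices are involved
    zpool create datapool mirror c1t2d0 c1t3d0

    # carve filesystems out of the pool as needed, e.g.
    zfs create datapool/export
    zfs set mountpoint=/export datapool/export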
Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
On March 7, 2007 8:50:53 AM -0800 Matt B <[EMAIL PROTECTED]> wrote:
> Any thoughts on the best practice points I am raising? It disturbs me
> that it would make a statement like "don't use slices for production".

I think that's just a performance thing.

-frank
Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
Frank Cusack wrote:
> On March 7, 2007 8:50:53 AM -0800 Matt B <[EMAIL PROTECTED]> wrote:
>> Any thoughts on the best practice points I am raising? It disturbs me
>> that it would make a statement like "don't use slices for production".
>
> I think that's just a performance thing.

Yep, for those systems with lots of disks.
-- richard
Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
Matt B wrote:
> Thanks for the responses. There is a lot there I am looking forward to
> digesting. Right off the bat, though, I wanted to bring up something I
> found just before reading this reply, as the answer to this question
> would automatically answer some other questions.
>
> There is a ZFS best practices wiki at
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#General_Storage_Pool_Performance_Considerations
> that makes a couple of points:
>
> * Swap space - Because ZFS caches data in kernel addressable memory, the
> kernel sizes will likely be larger than with other file systems.
> Configure additional disk-based swap to account for this difference.
> You can use the size of physical memory as an upper bound to the extra
> amount of swap space that might be required. Do not use slices on the
> same disk for both swap space and ZFS file systems. Keep the swap areas
> separate from the ZFS file systems.

This recommendation is only suitable for low-memory systems with lots of disks. Clearly, it would be impractical for a system with a single disk.

> * Do not use slices for storage pools that are intended for production use.
>
> After reading this, it seems that with only 4 disks to work with, and the
> fact that UFS for root (initial install) is still required, my only
> option to conform to the best practice is to use two disks with
> UFS/RAID 1, leaving only the remaining two disks for 100% ZFS.
> Additionally, the swap partition would have to go on the UFS set of
> disks to keep it separate from the ZFS set of disks. If I am
> misinterpreting the wiki, please let me know.

The best thing about best practices is that there are so many of them :-/
I'll see if I can clarify in the wiki.

> There are some tradeoffs here. I would prefer to use a 4-way mirrored
> slice for a UFS root and a 4-way mirrored slice for UFS swap, and then
> leave equal slices free for ZFS, but then it sounds like I would be
> risking not following the best practice and would have to mess with SVM.
> The nice thing is I could be looking at a 128GB yield with a decent
> level of fault tolerance. With zpools, could I take each of the two 64GB
> slices and place them into two ZFS stripes (RAID 0) and then join those
> two into a ZFS mirror? It seems like I could then get a 128GB yield
> without having to use RAID-Z or RAID-Z2, which, according to the links I
> just skimmed, perform worse than mirroring.

Be careful: with ZFS you don't take a stripe and mirror it (RAID-0+1), you take mirrors and stripe across them (RAID-1+0). For example, you would do:

    zpool create mycoolpool mirror c_d_t_s_ c_d_t_s_ mirror c_d_t_s_ c_d_t_s_

> The other option I described above: I could just put the first two disks
> into a HW RAID 1, as the x4200s support 2-disk RAID 1, and then put the
> remaining 2 disks into ZFS, and this (I think) would not be violating
> the best practice?

Yes, this would work fine. It would simplify your boot and OS install/upgrade. I'd still recommend planning on using LiveUpgrade -- leave a spare slice for an alternate boot environment.

> Any thoughts on the best practice points I am raising? It disturbs me
> that it would make a statement like "don't use slices for production".

Sometimes it is not what you say, it is how you say it.
-- richard
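To make that placeholder command concrete (the slice names here are hypothetical; s7 stands in for whatever slice is left free for ZFS on each disk):

    # each "mirror" group becomes one vdev; ZFS then stripes writes across
    # the two mirror vdevs, i.e. RAID-1+0 rather than RAID-0+1
    zpool create mycoolpool \
        mirror c0t0d0s7 c0t1d0s7 \
        mirror c0t2d0s7 c0t3d0s7

    zpool status mycoolpool    # should show the pool with two mirror vdevs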
[zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
So it sounds like the consensus is that I should not worry about using slices with ZFS, and the swap best practice doesn't really apply to my situation of a 4-disk x4200.

So, in summary (please confirm), this is what we are saying is a safe bet for use in a highly available production environment. With 4x73GB disks yielding 70GB usable each:

* 5GB for root, which is UFS and mirrored 4 ways using SVM.
* 8GB for swap, which is raw and mirrored across the first two disks (optional: skip LiveUpgrade and 4-way mirror this swap partition).
* 8GB for LiveUpgrade, which is mirrored across the third and fourth disks.
* This leaves 57GB of free space on each of the 4 disks in slices.
* One ZFS pool will be created containing the 4 slices.
* The first two slices will be used in a ZFS mirror yielding 57GB.
* The last two slices will be used in a ZFS mirror yielding 57GB.
* Then a stripe (RAID 0) will be laid over the two mirrors, yielding 114GB of usable space while being able to sustain any 2 drives failing without a loss of data.

Thanks

P.S.
Availability is determined by using a synthetic SLA monitor that operates on 2-minute cycles, evaluating against a VIP by an external third party. If there are no errors in the report for the month we hit 100%; I think even one error (due to the 2-minute window) puts us below 6 9's, so we basically have a zero-tolerance standard to hit the SLA and not get penalized monetarily.
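A rough sketch of the SVM side of that layout (the 4-way root mirror), with hypothetical device and metadevice names; the ZFS pool over the four 57GB slices would then be the mirror-plus-mirror zpool create shown earlier in the thread:

    # state database replicas spread across all four disks (s6 here is a
    # small slice set aside for them; adjust to taste)
    metadb -a -f -c 2 c0t0d0s6 c0t1d0s6 c0t2d0s6 c0t3d0s6

    # one-way submirrors on the four 5GB root slices, then a 4-way mirror d0
    metainit -f d10 1 1 c0t0d0s0
    metainit d11 1 1 c0t1d0s0
    metainit d12 1 1 c0t2d0s0
    metainit d13 1 1 c0t3d0s0
    metainit d0 -m d10
    metaroot d0                 # updates /etc/vfstab and /etc/system
    # reboot, then attach the remaining submirrors
    metattach d0 d11
    metattach d0 d12
    metattach d0 d13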
Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
[EMAIL PROTECTED] wrote on 03/07/2007 12:31:14 PM:

> So it sounds like the consensus is that I should not worry about using
> slices with ZFS, and the swap best practice doesn't really apply to my
> situation of a 4-disk x4200.
>
> So, in summary (please confirm), this is what we are saying is a safe
> bet for use in a highly available production environment. With 4x73GB
> disks yielding 70GB usable each:
>
> * 5GB for root, which is UFS and mirrored 4 ways using SVM.
> * 8GB for swap, which is raw and mirrored across the first two disks
>   (optional: skip LiveUpgrade and 4-way mirror this swap partition).
> * 8GB for LiveUpgrade, which is mirrored across the third and fourth disks.
> * This leaves 57GB of free space on each of the 4 disks in slices.
> * One ZFS pool will be created containing the 4 slices.
> * The first two slices will be used in a ZFS mirror yielding 57GB.
> * The last two slices will be used in a ZFS mirror yielding 57GB.
> * Then a stripe (RAID 0) will be laid over the two mirrors, yielding
>   114GB of usable space while being able to sustain any 2 drives failing
>   without a loss of data.

No, you will be able to sustain up to one disk in each of the two disk pairs failing at any time with no data loss. Lose both disks in one mirror pair and you lose data (and the system panics) -- slightly different than "any two disks".

> Thanks
>
> P.S.
> Availability is determined by using a synthetic SLA monitor that operates
> on 2-minute cycles, evaluating against a VIP by an external third party.
> If there are no errors in the report for the month we hit 100%; I think
> even one error (due to the 2-minute window) puts us below 6 9's, so we
> basically have a zero-tolerance standard to hit the SLA and not get
> penalized monetarily.
Re: [zfs-discuss] Re: ZFS/UFS layout for 4 disk servers
Matt B wrote:
> Any thoughts on the best practice points I am raising? It disturbs me
> that it would make a statement like "don't use slices for production".

ZFS turns on the write cache on the disk if you give it the entire disk to manage, which is good for performance. So you should use whole disks whenever possible. Slices work too, but the disk's write cache will not be turned on by ZFS.

Cheers
Manoj
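If you want to see which way the cache is currently set on a given disk, the expert mode of format(1M) exposes a cache menu on SCSI/SAS disks (menu entries can vary a bit by release); roughly:

    # expert mode; pick the disk from the menu, then drill into the cache menus
    format -e
    #   format> cache
    #   cache> write_cache
    #   write_cache> display      # reports whether the write cache is enabled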
[zfs-discuss] writes lost with zfs !
Hi!

I have tested the following scenario: I created a ZFS filesystem as part of HAStoragePlus in Sun Cluster 3.2, Solaris 11/06. Currently I have only one FC HBA per server.

1. There is no I/O to the ZFS mountpoint. I disconnected the FC cable. The filesystem on ZFS still shows as mounted (because of no I/O to the filesystem). I touch a file. Still OK. I did a "sync" and only then did the node panic and the ZFS filesystem fail over to the other cluster node. However, the file I touched is lost.

2. With ZFS mounted on one cluster node, I created a file and kept updating it every second, then I removed the FC cable. The writes still continued to the filesystem. After 10 seconds I put the FC cable back and my writes continued; no failover of ZFS happened. It seems that all I/O is going to some cache.

Any suggestions on what's going wrong here and what the solution is?

thanks

Ayaz Anjum
Re: [zfs-discuss] writes lost with zfs !
Ayaz Anjum wrote:
> Hi!
>
> I have tested the following scenario: I created a ZFS filesystem as part
> of HAStoragePlus in Sun Cluster 3.2, Solaris 11/06. Currently I have
> only one FC HBA per server.
>
> 1. There is no I/O to the ZFS mountpoint. I disconnected the FC cable.
> The filesystem on ZFS still shows as mounted (because of no I/O to the
> filesystem). I touch a file. Still OK. I did a "sync" and only then did
> the node panic and the ZFS filesystem fail over to the other cluster
> node. However, the file I touched is lost.

This is to be expected, I'd say. HAStoragePlus is primarily a wrapper over ZFS that manages the import/export and mount/unmount. It cannot and does not provide for a retry of pending I/Os. The 'touch' would have been part of a ZFS transaction group that never got committed, and it stays lost when the pool is imported on the other node. In other words, it does not provide the same kind of high availability that, say, PxFS provides.

> 2. With ZFS mounted on one cluster node, I created a file and kept
> updating it every second, then I removed the FC cable. The writes still
> continued to the filesystem. After 10 seconds I put the FC cable back
> and my writes continued; no failover of ZFS happened. It seems that all
> I/O is going to some cache.
>
> Any suggestions on what's going wrong here and what the solution is?

I don't know for sure, but my guess is that if you do an fsync after the writes and wait for the fsync to complete, then you might get some action. The fsync should fail, and ZFS could panic the node. If it does, you will see a failover.

Hope that helps.

-Manoj