[zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi All, We recently upgraded our Solaris 10 servers from ESX 3.5 to vSphere and in the process the zpools appeared to become FAULTED even though we did not touch the OS. We detached the physical RDM (1TB) from the virtual machine and attached it to another identical virtual machine to see if that fixed the problem, but unfortunately running "zpool status" and "zpool import" finds nothing, even though "format" and "format -e" display the 1TB volume. Are there any known problems, or ways to reimport a supposedly lost/confused zpool on a new host? Thanks Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
OK, the fault appears to have occurred independently of the attempt to move to vSphere, as we've now moved the host back to ESX 3.5 and the problem still exists. It looks to me like the fault occurred as a result of a reboot. Any help and advice would be greatly appreciated. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi Ross, Thanks for your advice. I've tried presenting the RDM as both Virtual and Physical, but sadly to no avail. I'm guessing that if it was going to work, then a quick "zpool import" or "zpool status" should at the very least show me the "data" pool that's gone missing. The RDM is from an FC SAN, so unfortunately I can't rely on connecting with an iSCSI initiator within the OS to attach the volume, so I guess I have to dive straight into checking the MBR at this stage. I'll no doubt need some help here, so please forgive me if I fall at the first hurdle. Kind Regards Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi Ross, OK - as a Solaris newbie I'm going to need your help. format produces the following:

c8t4d0 (VMware-Virtualdisk-1.0 cyl 65268 alt 2 hd 255 sec 126) /p...@0,0/pci15ad,1...@10/s...@4,0

What dd command do I need to run to reference this disk? I've tried /dev/rdsk/c8t4d0 and /dev/dsk/c8t4d0 but neither is valid. Kind Regards Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi again, Out of interest, could this problem have been avoided if the ZFS configuration didn't rely on a single disk, i.e. RAIDZ etc.? Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS - VMware ESX --> vSphere Upgrade : Zpool Faulted
Hi all, Great news - by attaching an identically sized RDM to the server and then grabbing its first 128K using the command you specified, Ross:

dd if=/dev/rdsk/c8t4d0p0 of=~/disk.out bs=512 count=256

we then proceeded to inject this into the faulted RDM, and lo and behold the volume recovered!

dd if=~/disk.out of=/dev/rdsk/c8t5d0p0 bs=512 count=256

Thanks for your help! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
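For reference, the whole recovery as a minimal sketch, using the device names from this thread (c8t4d0 is the healthy, identically sized RDM; c8t5d0 is the faulted one). These names are specific to this setup, and the p0 node refers to the whole disk, so double-check the target before writing:

  # save the first 128K (MBR/label area) of the healthy disk
  dd if=/dev/rdsk/c8t4d0p0 of=~/disk.out bs=512 count=256
  # write that label area onto the faulted disk
  dd if=~/disk.out of=/dev/rdsk/c8t5d0p0 bs=512 count=256
  # then see whether the pool is visible again
  zpool import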
Re: [zfs-discuss] Native ZFS for Linux
> On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote:
> > On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal wrote:
> >> We at KQInfotech initially started on an independent port of ZFS to Linux.
> >> When we posted our progress about the port last year, we came to know about
> >> the work on the LLNL port. Since then we started working on re-basing our
> >> changes on top of Brian's changes.
> >>
> >> We are working on porting the ZPL on that code. Our current status is that
> >> mount/unmount is working. Most of the directory operations and read/write are
> >> also working. There is still a lot more development work and testing that
> >> needs to go into this. But we are committed to making this happen, so
> >> please stay tuned.
> >
> > Good times ahead!
>
> I don't mean to be a PITA, but I'm assuming that someone lawyerly has had the appropriate discussions with the porting team about how linking against the GPL'd Linux kernel means your kernel module has to be GPL-compatible. It doesn't matter if you distribute it outside the general kernel source tarball; what matters is that you're linking against a GPL program, and the old GPL v2 doesn't allow a non-GPL-compatibly-licensed module to do that.

This is incorrect. The viral effects of the GPL only take effect at the point of distribution. If ZFS is distributed separately from the Linux kernel as a module, then the person doing the combining is the user. It would be different if a Linux distro wanted to include it on a live CD, for example. The GPL is not concerned with what code is linked with what. Cheers Andrew. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help! System panic when pool imported
I'm getting the same thing now. I tried moving my 5-disk RAIDZ and 2-disk mirror over to another machine, but that machine kept panicking (not ZFS-related panics). When I brought the array back over, I started getting this as well. My mirror array is unaffected. snv_111b (2009.06 release) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help! System panic when pool imported
This is what my /var/adm/messages looks like:

Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
Sep 27 12:46:29 solaria unix: [ID 10 kern.notice]
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a97a0 genunix:assfail+7e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9830 zfs:space_map_add+292 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a98e0 zfs:space_map_load+3a7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9920 zfs:metaslab_activate+64 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b60 zfs:metaslab_alloc+9b ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b90 zfs:zio_dva_allocate+3e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9bc0 zfs:zio_execute+a0 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c40 genunix:taskq_thread+193 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c50 unix:thread_start+8 ()

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Expanding a root pool
I'm attempting to expand a root pool for a VMware VM that is on an 8GB virtual disk. I mirrored it to a 20GB disk and detached the 8GB disk. I did "installgrub" to install grub onto the second virtual disk, but I get a kernel panic when booting. Is there an extra step I need to perform to get this working? Basically I did this:

1. Created a new 20GB virtual disk.
2. Booted into the VM.
3. Created a Solaris partition covering the whole virtual disk.
4. Created a slice 0 covering cylinders 1 to 2607 (i.e. the whole disk but cylinder 0).
5. Attached the slice 0 to the root pool using "zpool attach rpool /dev/dsk/c3d0s0 /dev/dsk/c5t0d0s0".
6. Installed grub using "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t0d0s0".
7. Detached the old 8GB virtual disk using "zpool detach rpool /dev/dsk/c3d0s0".
8. init 6.
9. Reconfigured the VM BIOS to boot from the 2nd HDD first.

When attempting to boot the VM I now get a big warning that the root pool cannot be mounted - "This device is not bootable! It is either offlined or detached or faulted. Please try to boot from another device." - and a nice kernel panic, followed by the inevitable reboot. How can I get this working? I'm using OpenSolaris 2008.05 upgraded to build 93. Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
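For reference, steps 5-9 above boil down to the following commands (device names are the poster's own; this is a sketch of the sequence as described, not a known-good recipe, and it is worth letting the resilver finish before detaching the old disk):

  zpool attach rpool c3d0s0 c5t0d0s0
  zpool status rpool    # wait until the resilver has completed
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t0d0s0
  zpool detach rpool c3d0s0
  init 6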
Re: [zfs-discuss] ZFS boot mirror
I ran into this as well. For some reason installgrub needs slice 2 to be the special "backup" slice that covers the whole disk, as on a standard Solaris disk label. You actually specify s0 on the command line, since this is the location of the ZFS root, but installgrub will go away and try to access the whole disk using slice 2. What I did to solve it was to use format to select the disk, then the "partition" option to create a slice 2 that started on cylinder 0 and ended on the final cylinder of the disk. Once I did that, installgrub worked OK. You might also need to issue the command "disks" to get Solaris to update the disk links under /dev before you use installgrub. Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot mirror
The second disk doesn't have the root pool on slice 2 - it is on slice 0, as with the first disk. All I did differently was to create a slice 2 covering the whole Solaris FDISK primary partition. If you then issue this command as before (note: slice ZERO):

installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/dsk/c5t1d0s0

then it will install grub onto that disk. You would need to ask someone else why it needs a slice 2 - I suspect that stage1 actually gets written to the first sector of the Solaris primary FDISK partition, hence it needs access to the "special" slice 2 to do that. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot mirror
OK, I've put up some screenshots and a copy of my menu.lst to clarify my setup: http://sites.google.com/site/solarium/zfs-screenshots Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot mirror
Sounds like you've got an EFI label on the second disk. Can you run "format", select the second disk, then enter "fdisk" then "print" and post the output here? Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool upgrade wrecked GRUB
> so finally, I gathered up some courage and "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0" seemed to write out what I assume is a new MBR.

Not the MBR - the stage1 and stage2 files are written to the boot area of the Solaris FDISK partition.

> tried to also installgrub on the other disk in the mirror c3d0 and failed over several permuations "cannot open/stat /dev/rdsk/c3d0s2" was the error msg.

This is because installgrub needs the "overlap" slice to be present as slice 2 for some reason. The overlap slice, also called the "backup" slice, covers the whole of the Solaris FDISK partition. If you don't have one on your second disk, just create one.

> however a reboot from dsk/c2d0s0 gave me a healthy and unchanged grub stage2 menu and functioning system again . whew
>
> Although I cannot prove causality here, I still think that the zpool upgrade ver.10 -> ver.11 borked the MBR. indeed, probably the stage2 sectors, i guess.

No - upgrading a ZFS pool doesn't touch the MBR or the stage2. The problem is that the grub ZFS filesystem reader needs to be updated to understand the version 11 pool. This doesn't (yet) happen automatically.

> I also seem to also only have single MBR between the two disks in the mirror. is this normal?

Not really normal, but at present manually creating a ZFS boot mirror in this way does not set the 2nd disk up correctly, as you've discovered. To write a new Solaris grub MBR to the second disk, do this:

installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d0s0

The -m flag tells installgrub to put the grub stage1 into the MBR. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
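Putting the advice in this message together, a sketch for the second disk of the mirror (c3d0, per the quoted error) would be: confirm an overlap slice 2 exists, then write grub to both the partition boot area and the MBR. prtvtoc is only used here as a quick way to check the slice is there:

  prtvtoc /dev/rdsk/c3d0s2    # fails if there is no slice 2 / overlap slice
  installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d0s0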
Re: [zfs-discuss] Shared ZFS in Multi-boot?
There are two versions at play here: the pool version and the filesystem version. See here for information about ZFS *filesystem* versions: http://www.opensolaris.org/os/community/arc/caselog/2007/328/onepager/ (CIFS support was integrated in build 77, so that is when the filesystem version was bumped to 3). What that document doesn't explain is what happens if you try to access a version 2 or 3 filesystem using the older code from before build 69 - i.e. from before filesystem versioning was added. I have a feeling that before build 69 there was no check done on the filesystem version. I've no idea how this is handled on Linux. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
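For anyone wanting to check what they are running, the stock commands below report the supported and current pool/filesystem versions ('tank' and 'tank/home' are placeholder names; "zfs upgrade" only exists on builds new enough to have filesystem versioning):

  zpool upgrade -v            # pool versions this build supports
  zfs upgrade -v              # filesystem (ZPL) versions this build supports
  zpool get version tank      # version of a particular pool
  zfs get version tank/home   # version of a particular filesystem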
[zfs-discuss] Moving a ZFS root to another target
I've got an OpenSolaris system rooted on a SCSI disk at /dev/dsk/c4t1d0s0. I would like to reconfigure my VM so that this is on c4t0d0s0. Unfortunately OpenSolaris panics on boot when I do this. It seems that vfs_mountroot is trying to mount the root pool at its old device path (/[EMAIL PROTECTED],0/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a) which corresponds to /dev/dsk/c4t1d0s0. Where is this location hardcoded, and how do I change it? Also, is there any way to set up OpenSolaris so that this location is not hardcoded? I took a screenshot of the panic: http://sites.google.com/site/solarium/_/rsrc/1218841252931/zfs-screenshots/paniconboot.gif Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Moving a ZFS root to another target
Hmm... Just tried the same thing on SXCE build 95 and it works fine. Strange. Anyone know what's up with OpenSolaris (the distro)? I'm using the ISO of OpenSolaris 2008.11 snv_93, image-updated to build 95, if that makes a difference. I've not tried this on 2008.05. Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OpenSolaris installer can't be run, if target ZFS pool exists.
Perhaps user properties on pools would be useful here? At present only ZFS filesystems can have user properties - not pools. Not really an immediate solution to your problem though. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Moving a ZFS root to another target
Just tried with a fresh install from the OpenSolaris 2008.11 snv_95 CD and it works fine. Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Dumb Error - Please Help
Hi there! I made a dumb but serious error: I have a 500G external USB disk with a zfs pool containing only this one disk. I then installed a new OS on my host computer (Debian with zfs-fuse) and wanted to access my USB drive again. I know now that ZFS saves the pool data on the disks themselves - but at the time I thought it was saved on the host. So I tried to create a new pool (zpool create datahaven /dev/sda1) ... you know the rest - the existing pool is now overwritten with a new, empty pool. Now my question: is there any way to get my (really important and of course not backed up ^^) files back? Hope you can help an insightful dork, Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] c1t0d0 to c3t1d0
Inserting the drive does not automatically mount the ZFS filesystem on it. You need to use the "zpool import" command, which lists any pools available to import, then "zpool import {name of pool}" to import one, adding -f to force the import if you haven't exported the pool first. Cheers Andrew. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
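A minimal example of that sequence, with a made-up pool name:

  zpool import          # scan attached devices and list importable pools
  zpool import tank     # import it, if it was cleanly exported
  zpool import -f tank  # force the import if it was not exported first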
Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable
I came across this bug in a similar way myself. The explanation given by Stephen Hahn is this: -- For a while, the boot-archive on 2008.nn systems included a copy of zpool.cache. Recent versions do not make this mistake. Delete and regenerate your boot archive, and you should be able to make the transfer. See http://mail.opensolaris.org/pipermail/indiana-discuss/2008-August/008341.html and following. --- If you install a system from scratch with the latest test build of OpenSolaris 2008.11 (which you can get from genunix.org) then you won't have this problem. Solaris Express Community Edition (SXCE) is also not affected by this bug. Cheers Andrew. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import of bootable root pool renders it unbootable
> I've upgraded to b98, checked if zpool.cache is not > being added to > boot archive and tried to boot from VB by presenting > a prtition to it. > It didn't. I got it working by installing a new build of OpenSolaris 2008.11 from scratch rather than upgrading, but deleting zpool.cache, deleting both boot archives, then doing a "bootadm update-archive" should work. Cheers Andrew. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
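As a sketch, the delete-and-regenerate route described above looks something like this on x86 (the boot archive paths shown are the usual ones but may differ by build, and SPARC uses a different layout):

  rm /etc/zfs/zpool.cache
  rm /platform/i86pc/boot_archive
  rm /platform/i86pc/amd64/boot_archive
  bootadm update-archive
  init 6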
Re: [zfs-discuss] ZFS, Kernel Panic on import
I woke up yesterday morning, only to discover my system kept rebooting. It had been running fine for a long while. I upgraded to snv_98 a couple of weeks back (from 95), and had upgraded my RAIDZ zpool from version 11 to 13 for improved scrub performance. After some research it turned out that, on bootup, importing my 4TB RAIDZ array was causing the system to panic (similar to this OP's error). I got that bypassed, and can now at least boot the system. However, when I try anything (like mdb -kw), it advises me that there is no command line editing because: "mdb: no terminal data available for TERM=vt320. term init failed: command-line editing and prompt will not be available". This means I can't really try what aldredmr had done in mdb, and I really don't have any experience with it. I upgraded to snv_100 (November), but am experiencing the exact same issues. If anyone has some insight, it would be greatly appreciated. Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Kernel Panic on import
Thanks a lot! Google didn't seem to cooperate as well as I had hoped. Still no dice on the import. I only have shell access on my Blackberry Pearl from where I am, so it's kind of hard, but I'm managing.. I've tried the OP's exact commands, and even trying to import array as ro, yet the system still wants to panic.. I really hope I don't have to redo my array, and lose everything as I still have faith in ZFS... -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic at zpool import
Do you guys have any more information about this? I've tried the offset methods, zfs_recover, aok=1, mounting read only, yada yada, with still 0 luck. I have about 3TBs of data on my array, and I would REALLY hate to lose it. Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Kernel Panic on import
Hey Victor, Where would I find that? I'm still somewhat getting used to the Solaris environment. /var/adm/messages doesn't seem to show any panic info. I only have remote access via SSH, so I hope I can do something with dtrace to pull it. Thanks, Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Kernel Panic on import
Not too sure if it's much help. I enabled kernel pages and curproc.. Let me know if I need to enable "all" then.

solaria crash # echo "::status" | mdb -k
debugging live kernel (64-bit) on solaria
operating system: 5.11 snv_98 (i86pc)
solaria crash # echo "::stack" | mdb -k
solaria crash # echo "::msgbuf -v" | mdb -k
TIMESTAMP LOGCTL MESSAGE
2008 Nov 7 18:53:55 ff01c901dcf0 capacity = 1953525168 sectors
2008 Nov 7 18:53:55 ff01c901db70 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED] :
2008 Nov 7 18:53:55 ff01c901d9f0 SATA disk device at port 0
2008 Nov 7 18:53:55 ff01c901d870 model ST31000340AS
2008 Nov 7 18:53:55 ff01c901d6f0 firmware SD15
2008 Nov 7 18:53:55 ff01c901d570 serial number
2008 Nov 7 18:53:55 ff01c901d3f0 supported features:
2008 Nov 7 18:53:55 ff01c901d270 48-bit LBA, DMA, Native Command Queueing, SMART self-test
2008 Nov 7 18:53:55 ff01c901d0f0 SATA Gen1 signaling speed (1.5Gbps)
2008 Nov 7 18:53:55 ff01c901adf0 Supported queue depth 32, limited to 31
2008 Nov 7 18:53:55 ff01c901ac70 capacity = 1953525168 sectors
2008 Nov 7 18:53:55 ff01c901aaf0 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED] :
2008 Nov 7 18:53:55 ff01c901a970 SATA disk device at port 0
2008 Nov 7 18:53:55 ff01c901a7f0 model Maxtor 6L250S0
2008 Nov 7 18:53:55 ff01c901a670 firmware BANC1G10
2008 Nov 7 18:53:55 ff01c901a4f0 serial number
2008 Nov 7 18:53:55 ff01c901a370 supported features:
2008 Nov 7 18:53:55 ff01c901a2b0 48-bit LBA, DMA, Native Command Queueing, SMART self-test
2008 Nov 7 18:53:55 ff01c901a130 SATA Gen1 signaling speed (1.5Gbps)
2008 Nov 7 18:53:55 ff01c901a070 Supported queue depth 32, limited to 31
2008 Nov 7 18:53:55 ff01c9017ef0 capacity = 490234752 sectors
2008 Nov 7 18:53:55 ff01c9017d70 pseudo-device: ramdisk1024
2008 Nov 7 18:53:55 ff01c9017bf0 ramdisk1024 is /pseudo/[EMAIL PROTECTED]
2008 Nov 7 18:53:55 ff01c9017a70 NOTICE: e1000g0 registered
2008 Nov 7 18:53:55 ff01c90179b0 pcplusmp: pci8086,100e (e1000g) instance 0 vector 0x14 ioapic 0x2 intin 0x14 is bound to cpu 0
2008 Nov 7 18:53:55 ff01c90178f0 Intel(R) PRO/1000 Network Connection, Driver Ver. 5.2.12
2008 Nov 7 18:53:56 ff01c9017830 pseudo-device: lockstat0
2008 Nov 7 18:53:56 ff01c9017770 lockstat0 is /pseudo/[EMAIL PROTECTED]
2008 Nov 7 18:53:56 ff01c90176b0 sd6 at si31240: target 0 lun 0
2008 Nov 7 18:53:56 ff01c90175f0 sd6 is /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
2008 Nov 7 18:53:56 ff01c9017530 sd5 at si31242: target 0 lun 0
2008 Nov 7 18:53:56 ff01c9017470 sd5 is /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
2008 Nov 7 18:53:56 ff01c90173b0 sd4 at si31241: target 0 lun 0
2008 Nov 7 18:53:56 ff01c90172f0 sd4 is /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
2008 Nov 7 18:53:56 ff01c9017230 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd4) online
2008 Nov 7 18:53:56 ff01c9017170 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED] :
2008 Nov 7 18:53:56 ff01c90170b0 SATA disk device at port 1
2008 Nov 7 18:53:56 ff01c9087f30 model ST31000340AS
2008 Nov 7 18:53:56 ff01c9087e70 firmware SD15
2008 Nov 7 18:53:56 ff01c9087db0 serial number
2008 Nov 7 18:53:56 ff01c9087cf0 supported features:
2008 Nov 7 18:53:56 ff01c9087c30 48-bit LBA, DMA, Native Command Queueing, SMART self-test
2008 Nov 7 18:53:56 ff01c9087b70 SATA Gen1 signaling speed (1.5Gbps)
2008 Nov 7 18:53:56 ff01c9087ab0 Supported queue depth 32, limited to 31
2008 Nov 7 18:53:56 ff01c90879f0 capacity = 1953525168 sectors
2008 Nov 7 18:53:56 ff01c9087930 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd6) online
2008 Nov 7 18:53:56 ff01c9087870 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd5) online
2008 Nov 7 18:53:56 ff01c90877b0 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1095,[EMAIL PROTECTED] :
2008 Nov 7 18:53:56 ff01c90876f0 SATA disk device at port 1
2008 Nov 7 18:53:56 ff01c9087630 model ST31000340AS
2008 Nov 7 18:53:56 ff01c9087570 firmware SD15
2008 Nov 7 18:53:56 ff01c90874b0 serial number
2008 Nov 7 18:53:56 ff01c90873f0 supported features:
2008 Nov 7 18:53:56 ff01c9087330 48-bit LBA, DMA, Native Command Queueing, SMART self-test
2008 Nov 7 18:53:56 ff01c9087270 SATA Gen1 signaling speed (1.5Gbps)
2008 Nov 7 18:53:56 ff01c90871b0 Supported queu
Re: [zfs-discuss] ZFS, Kernel Panic on import
So I tried a few more things.. I think the combination of the following in /etc/system made a difference:

set pcplusmp:apic_use_acpi=0
set sata:sata_max_queue_depth = 0x1
set zfs:zfs_recover=1 <<< I had this before
set aok=1 <<< I had this before too

I crossed my fingers, and it actually imported this time.. Somehow ..

solaria ~ # zpool status
  pool: itank
 state: ONLINE
 scrub: scrub in progress for 0h7m, 2.76% done, 4h33m to go
config:

        NAME         STATE     READ WRITE CKSUM
        itank        ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c12t1d0  ONLINE       0     0     0
            c13t0d0  ONLINE       0     0     0
            c11t0d0  ONLINE       0     0     0
            c13t1d0  ONLINE       0     0     0
            c11t1d0  ONLINE       0     0     0

Running some scrubs on it now, and I HOPE everything is okay... Anything else you suggest I try before it's considered stable? Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
> I am [trying to] perform a test prior to moving my > data to solaris and zfs. Things are going very > poorly. Please suggest what I might do to understand > what is going on, report a meaningful bug report, fix > it, whatever! > > Both to learn what the compression could be, and to > induce a heavy load to expose issues, I am running > with compress=gzip-9. > > I have two machines, both identical 800MHz P3 with > 768MB memory. The disk complement and OS is > different. My current host is Suse Linux 10.2 > (2.6.18 kernel) running two 120GB drives under LVM. > My test machine is 2008.11 B2 with two 200GB drives > on the motherboard secondary IDE, zfs mirroring > them, NFS exported. > > My "test" is to simply run "cp -rp * /testhome" on > the Linux machine, where /testhome is the NFS mounted > zfs file system on the Solaris system. > > It starts out with "reasonable" throughput. Although > the heavy load makes the Solaris system pretty jerky > and unresponsive, it does work. The Linux system is > a little jerky and unresponsive, I assume due to > waiting for sluggish network responses. > > After about 12 hours, the throughput has slowed to a > crawl. The Solaris machine takes a minute or more to > respond to every character typed and mouse click. > The Linux machines is no longer jerky, which makes > sense since it has to wait alot for Solaris. Stuff > is flowing, but throughput is in the range of 100K > bytes/second. > > The Linux machine (available for tests) "gzip -9"ing > a few multi-GB files seems to get 3MB/sec +/- 5% > pretty consistently. Being the exact same CPU, RAM > (Including brand and model), Chipset, etc. I would > expect should have similar throughput from ZFS. This > is in the right ballpark of what I saw when the copy > first started. In an hour or two it moved about > 17GB. > > I am also running a "vmstat" and a "top" to a log > file. Top reports total swap size as 512MB, 510 > available. vmstat for the first few hours reported > something reasonable (it never seems to agree with > top), but now is reporting around 570~580MB, and for > a while was reporting well over 600MB free swap out > of the 512M total! > > I have gotten past a top memory leak (opensolaris.com > bug 5482) and so am now running top only one > iteration, in a shell for loop with a sleep instead > of letting it repeat. This was to be my test run to > see it work. > > What information can I capture and how can I capture > it to figure this out? > > My goal is to gain confidence in this system. The > idea is that Solaris and ZFS should be more reliable > than Linux and LVM. Although I have never lost data > due to Linux problems, I have lost it due to disk > failure, and zfs should cover that! > > Thank you ahead for any ideas or suggestions. Solaris reports "virtual memory" as the sum of physical memory and page file - so this is where your strange vmstat output comes from. Running ZFS stress tests on a system with only 768MB of memory is not a good idea since ZFS uses large amounts of memory for its cache. You can limit the size of the ARC (Adaptive Replacement Cache) using the details here: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache Try limiting the ARC size then run the test again - if this works then memory contention is the cause of the slowdown. Also, NFS to ZFS filesystems will run slowly under certain conditions -including with the default configuration. 
See this link for more information: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes Cheers Andrew. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
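As a concrete example of the ARC limit mentioned above, a single line in /etc/system followed by a reboot does it; the value here (0x10000000 = 256MB) is just a guess at something sensible for a 768MB machine, not a recommendation:

  set zfs:zfs_arc_max = 0x10000000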
[zfs-discuss] Bitrot and panics
IIRC, uncorrectable bitrot even in a nonessential file detected by ZFS used to cause a kernel panic. Bug ID 4924238 was closed with the claim that bitrot-induced panics is not a bug, but the description did mention an open bug ID 4879357, which suggests that it's considered a bug after all. Can somebody clarify the intended behavior? For example, if I'm running Solaris in a VM, then I shut down the VM, flip a bit in the file which hosts the disk for the VM such that a nonessential file on that disk is corrupted, and then power up the VM and try to read that file so that ZFS detects bitrot and there's no mirror available to correct the bitrot, then what is supposed to happen? This message posted from opensolaris.org ___ zfs-discuss mailing list [EMAIL PROTECTED] http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Bitrot and panics
eschrock wrote: > Unfortunately, there is one exception to this rule. ZFS currently does > not handle write failure in an unreplicated pool. As part of writing > out data, it is sometimes necessary to read in space map data. If this > fails, then we can panic due to write failure. This is a known bug and > is being worked on. Do you know if there's a bug ID for this? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs write cache enable on boot disks ?
What is the reasoning behind ZFS not enabling the write cache for the root pool? Is there a way of forcing ZFS to enable the write cache? Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Q: ZFS Boot Install / LU support ETA?
What is the current estimated ETA on the integration of install support for ZFS boot/root support to Nevada? Also, do you have an idea when we can expect the improved ZFS write throttling to integrate? Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Q: ZFS Boot Install / LU support ETA?
By my calculations that makes the possible release date for ZFS boot installer support around the 9th June 2008. Mark that date in your diary! Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] openSolaris ZFS root, swap, dump
To do what you want requires at least Nevada build 88, or probably build 90 since the Nevada installer, unlike the one in OpenSolaris 2008.05, cannot currently install into a ZFS root pool. Support was added to the text-mode installer and JumpStart in build 90 for installing Solaris to a ZFS root pool. This is about 3 weeks from being released as a DVD and CD image. Also, you might be pleased to learn that in build 87 Solaris moved the root user's home directory from the root of the filesystem (i.e. the / directory) to its own directory, namely /root . Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Notification of Important Improvements to ZFS
Would it be possible for future significant ZFS-related improvements to Nevada be flagged up on the "heads up" page at http://www.opensolaris.org/os/community/on/flag-days/all/ and also on the announce forum/list, along with a note of which build of Nevada they were integrated to? I'm thinking of the ZFS write-throttling announcement which was made on a blog with no mention of which build it appeared in, and also of the install and jumpstart support for ZFS root pools which was integrated without a public mention that I saw. I think it would be useful to announce these two improvements on the announce list/forum now as well, since they are probably of interest to many users. Keep up the great work ZFS team! Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Notification of Important Improvements to ZFS
To me, it seems obvious that there are likely to be many people waiting for ZFS root support in the installer. I've added the following text to a page on genunix about the current state of play with ZFS root. Please feel free to use all, any or indeed none of it on http://opensolaris.org/os/community/zfs/boot/ : Support for installing Solaris into a ZFS root was integrated into build 90 of Nevada. To do this, you need to use either the text-mode installer or JumpStart. Since both the Solaris installer and JumpStart are closed source, you can only get your hands on them when the binaries are released by Sun. For most people this means the fortnightly DVD and CD images. Based on past release dates, build 90 of Nevada should be available on DVD and CD around 9th June 2008. OpenSolaris version 2008.05, which is a different operating system from Solaris Nevada based on the same common code base, does have support for installing to a root ZFS filesystem. However, because it is based on build 86 of Nevada, it does not contain key bug fixes which prevent the system from locking up under certain circumstances. If you need a solid, reliable system with a ZFS root, it would be best to wait for the Nevada build 90 DVD/CD images to be released. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Notification of Important Improvements to ZFS
Apologies for the misinformation. OpenSolaris 2008.05 does *not* put swap on ZFS, so is *not* susceptible to the bugs that cause lock-ups under certain situations where the swap is on ZFS. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] openSolaris ZFS root, swap, dump
Yes - EFI booting clearly does require support from the BIOS, since in this case the traditional PC BIOS is replaced by an EFI BIOS. Only Intel Macs use EFI rather than a traditional PC BIOS. (OK, so there are probably a few others out there, but not in any great numbers). You should still be able to boot ZFS on an Intel Mac using Boot Camp. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Notification of Important Improvements to ZFS
Your question has already been answered on another thread: 5008936 ZFS and/or zvol should support dumps 5070124 dumpadm -d /dev/... does not enforce block device requirement for savecore 6633197 zvol should not permit newfs or createpool while it's in use by swap or dump You can look these bugs up at bugs.opensolaris.org for more info. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Version Correct
Your Solaris 10 system should also have the Sun Update Manager which will allow you to install patches in a more automated fashion. Look for it on the Gnome / CDE menus. Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Get your SXCE on ZFS here!
With the release of the Nevada build 90 binaries, it is now possible to install SXCE directly onto a ZFS root filesystem, and also to put swap on ZFS without worrying about having it deadlock. ZFS now also supports crash dumps! To install SXCE to a ZFS root, simply use the text-based installer, after choosing "Solaris Express" from the boot menu on the DVD. DVD download link: http://www.opensolaris.org/os/downloads/sol_ex_dvd_1/ This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Root Install with Nevada build 90
I've got Nevada build 90 on 6 CDs and I'm trying to install it to a ZFS root - functionality that was added to the text-mode installer in build 90. Unfortunately I'm not offered the choice of using the text-mode installer! How can I install build 90 on SPARC to a ZFS root? I've done this successfully on x86/x64. Thanks Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] install opensolaris on raidz
He means that you can have two types of pool as your root pool: 1. A single physical disk. 2. A ZFS mirror. Usually this means 2 disks. RAIDZ arrays are not supported as root pools (at the moment). Cheers Andrew. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
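For completeness, turning a single-disk root pool into a two-way mirror is just an attach plus a manual grub install on the new disk (device names here are hypothetical):

  zpool attach rpool c0t0d0s0 c0t1d0s0
  zpool status rpool    # wait for the resilver to finish
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0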
[zfs-discuss] Feature proposal: differential pools
Since ZFS is COW, can I have a read-only pool (on a central file server, or on a DVD, etc) with a separate block-differential pool on my local hard disk to store writes? This way, the pool in use can be read-write, even if the main pool itself is read-only, without having to make a full local copy of that read-only pool in order to be able to write to it, and without having to use messy filesystem-level union filesystem features. This would also be useful for live-system bootable DVDs, for which the writeable block-differential pool could be stored just in system memory in order to allow a fully functional non-persistent read-write pool without having to use the system's hard disk, or stored on a small flash thumbdrive which the user carries along with the DVD to allow a persistent read-write pool without having to use the system's hard disk. For yet another feature, this ability to copy newly written blocks to a separate differential pool could be used even if those new blocks are still written back to the main pool as usual; in this case, the differential pool would serve as a real-time differential backup. For example, I could make a full backup of my laptop's hard disk onto DVDs, and then while in use have a thumbdrive plugged into the laptop. All updates to the hard disk would be copied to the thumbdrive, and when the thumbdrive fills up, it can be copied to a DVD and then erased. If the laptop's hard disk dies, I can reconstruct the system's disk state right up to the moment that it died by restoring all the DVDs to a new hard disk and then restoring the current contents of the thumbdrive. This would effectively provide the redundancy benefits of a full mirror of the laptop's hard disk, but without having to lug along an entire full-size second hard disk, since I only have to carry a thumbdrive big enough to hold the amount of differential data I expect to generate. Finally, using the upcoming hard disk/flash disk combo drives in laptops, using the flash disk as the differential pool for the main hard disk pool (instead of writing the differential data immediately back to the main pool) would allow persistent writes without having to spin up the sleeping hard disk, and the differential pool could be flushed to the main pool sometime later when the hard disk is forced to spin up anyway to service a read. (This feature is independent of the use of an external thumbdrive to mirror differential data, and both features could be used at the same time.) All of these features would be enabled by allowing pool writes to be redirected to another destination (the differential pool) separate from the pool itself, and keeping track of the txg number at which the redirection began so that pool read requests will be sent to the right place. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Feature proposal: trashcan via auto-snapshot with every txg commit
Do an automatic pool snapshot (using the recursive atomic snapshot feature that Matt Ahrens implemented recently, taking time proportional to the number of filesystems in the pool) upon every txg commit. Management of the trashcan snapshots could be done by some user-configurable policy such as preserving only a certain number of trashcan snapshots, or only the ones younger than a specified age, or destroying old ones at a sufficient rate to maintain the trashcan snapshots' total disk space usage within some specified quota (or to maintain pool free space above some specified minimum), etc. But this would provide an effective cure for the all-too-common mistakes of running "rm *" in the wrong directory or overwriting the wrong file and realizing the mistake just a moment after you've pressed "enter", among other examples. Even if this pool-wide feature would be undesirable on a particular pool due to performance concerns, it could still be applied on a filesystem basis. For example, /home might be a good candidate. A desire has been mentioned elsewhere in this forum for a snapshot-on-write feature, to which a response was made that auto-snapshotting for every byte written to every file would be really slow, to which a response was made that auto-snapshotting upon file closure might be an adequate substitute. But the latter isn't an adequate substitute in some important cases. Providing auto-snapshot on every txg commit would be an efficient compromise. Also, combining the trashcan snapshot feature (with the management policy set to never delete old snapshots) with the differential pool feature I mentioned today in another message (with the differential pool located on physically secure media) would provide an excellent auditing tool. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
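Nothing like a per-txg trashcan exists today, but a crude approximation with current tools is a frequent recursive snapshot driven from cron; the script, schedule and pool name below are all assumptions, and old snapshots still have to be pruned by hand or by a second script:

  #!/bin/sh
  # /usr/local/bin/autosnap.sh - recursive, timestamp-named snapshot of the pool
  zfs snapshot -r tank@auto-`date +%Y%m%d-%H%M`

  # root's crontab entry: run it every 15 minutes
  0,15,30,45 * * * * /usr/local/bin/autosnap.sh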
[zfs-discuss] Flushing synchronous writes to mirrors
For a synchronous write to a pool with mirrored disks, does the write unblock after just one of the disks' write caches is flushed, or only after all of the disks' caches are flushed? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs discussion forum bug
I started three new threads recently:

"Feature proposal: differential pools"
"Feature proposal: trashcan via auto-snapshot with every txg commit"
"Flushing synchronous writes to mirrors"

Matthew Ahrens and Henk Langeveld both replied to my first thread by sending their messages to both me and to zfs-discuss@opensolaris.org, and Matt did likewise for my second thread. All of those reply messages had "references" headers. All of those messages made it to me personally, and to the zfs-discuss mailing list. But none of the replies ever made it to the forum at http://www.opensolaris.org/jive/forum.jspa?forumID=80&start=0 Jeff Bonwick replied to my third thread by sending to me and to zfs-discuss@opensolaris.org, but the message did not have a "references" header. The message made it to me, to the mailing list, and to the forum as well. I post via the forum, not via the mailing list. The only pattern I can see is that maybe the forum software doesn't like replies from the mailing list to threads originated via the forum. I recall having seen this message omission bug before. To whom should I report it? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Flushing synchronous writes to mirrors
Jeff Bonwick wrote: >> For a synchronous write to a pool with mirrored disks, does the write >> unblock after just one of the disks' write caches is flushed, >> or only after all of the disks' caches are flushed? > The latter. We don't consider a write to be committed until > the data is on stable storage at full replication. [snip] That makes sense, but there's a point at which ZFS must abandon this strategy; otherwise, the malfunction of one disk in a 3-way mirror could halt the entire system, when what's probably desired is for the system to keep running in degraded mode with only 2 remaining functional disks in the mirror. But then of course there would be the problem of divergent disks in a mirror; suppose there's a system with one pool on a pair of mirrored disks, and system root is on that pool. The disks are external, with interface cables running across the room. The system is running fine until my dog trips over the cable for disk #2. Down goes disk #2, and the system continues running fine, with a degraded pool, and during operation continues modifying various files. Later, the dog chews through the cable for disk #1. Down goes the system. I don't have a spare cable, so I just plug in disk #2, and restart the system. The system continues running fine, with a degraded pool, and during operation continues modifying various files. I go to the store to buy a new cable for disk #1, and when I come back, I trip over the cable for disk #2. Down goes the system. I plug #2 back in, replace the cable for #1, and restart the system. At this point, the system comes up with its root on a pool with divergent mirrors, and... ? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Snapshot recycle freezes system activity
Gary Mills wrote: We have an IMAP e-mail server running on a Solaris 10 10/09 system. It uses six ZFS filesystems built on a single zpool with 14 daily snapshots. Every day at 11:56, a cron command destroys the oldest snapshots and creates new ones, both recursively. For about four minutes thereafter, the load average drops and I/O to the disk devices drops to almost zero. Then, the load average shoots up to about ten times normal and then declines to normal over about four minutes, as disk activity resumes. The statistics return to their normal state about ten minutes after the cron command runs. Is it destroying old snapshots or creating new ones that causes this dead time? What does each of these procedures do that could affect the system? What can I do to make this less visible to users?

Creating a snapshot shouldn't do anything much more than a regular transaction group commit, which should be happening at least every 30 seconds anyway. Deleting a snapshot potentially results in freeing up the space occupied by files/blocks which aren't in any other snapshots. One way to think of this is that when you're using regular snapshots, the freeing up of space which happens when you delete files is in effect all deferred until you destroy the snapshot(s) which also refer to that space, which has the effect of bunching all your space freeing together. If this is the cause (a big _if_, as I'm just speculating), then it might be a good idea to: a) spread out the deleting of the snapshots, and b) create snapshots more often (and conversely delete snapshots more often), so each one has less accumulated space to be freed. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
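If the destroy is indeed the culprit, a sketch of (a) and (b) with made-up pool and snapshot names: snapshot every few hours instead of daily, and destroy expired snapshots one at a time with a pause in between rather than all at once:

  # run several times a day, e.g. from a small cron-driven script
  zfs snapshot -r pool@`date +%Y%m%d-%H`
  # expire old snapshots one by one, spaced out
  zfs destroy -r pool@20100301-00
  sleep 600
  zfs destroy -r pool@20100301-06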
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
On Tue, 2010-03-09 at 20:47 -0800, mingli wrote: > And I update the sharenfs option with "rw,ro...@100.198.100.0/24", it works > fine, and the NFS client can do the write without error. > > Thanks. I've found that when using hostnames in the sharenfs line, I had to use the FQDN; the short hostname did not work, even though both client and server were in the same DNS domain and that domain is in the search path, and nsswitch uses DNS for hosts (read: 'ping client1' works fine, as does 'mount server:/export/fs /mnt' from client1). Perhaps it's because I left the NFSv4 domain setting at the default. (I'm just using NFSv3, but trying to come up with an explanation. In any case, using the FQDN works.) -Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
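A worked example of both sharenfs forms discussed here (filesystem name, hostname and network are made up, and the network form is reconstructed from the address-mangled line quoted above):

  # network-based access list - the form reported to work
  zfs set sharenfs='rw,root=@100.198.100.0/24' tank/export
  # host-based root access - note the fully-qualified client name
  zfs set sharenfs='rw,root=client1.example.com' tank/export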
Re: [zfs-discuss] Intel SASUC8I - worth every penny
Dedhi Sujatmiko wrote: As a user of an el-cheapo US$18 SIL3114, I managed to make the system freeze continuously when one of the SATA cables got disconnected. I am using an 8-disk RAIDZ2 driven by 2 x SIL3114. The system is still able to answer pings, but SSH and the console are no longer responsive, and obviously neither are the NFS and CIFS shares. The console keeps looping "waiting for disk". The only way to recover is to reset the system, and as expected, one of the disks went offline, but the service came back online in a degraded ZFS pool.

The SIL3112/3114 were very early SATA controllers, indeed barely SATA controllers at all by today's standards, as I think they always pretend to be PATA to the host system. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When to Scrub..... ZFS That Is
Thomas Burgess wrote: I scrub once a week. I think the general rule is:

once a week for consumer grade drives
once a month for enterprise grade drives

and before any planned operation which will reduce your redundancy/resilience, such as swapping out a disk for a new larger one when growing a pool. The resulting resilver will read all the data in the datasets in order to reconstruct the new disk, some of which might not have been read for ages (or since the last scrub), and that's not the ideal time to discover your existing copy of some blocks went bad some time back. Better to discover this before you reduce the pool redundancy/resilience, whilst it's still fixable. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
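For example, before swapping a disk out for a larger one, something along these lines (pool and device names are hypothetical):

  zpool scrub tank
  zpool status -v tank   # wait for the scrub to finish and check for errors
  # only once the scrub comes back clean, swap in the new disk
  zpool replace tank c1t2d0 c1t3d0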
Re: [zfs-discuss] Proposition of a new zpool property.
Robert Milkowski wrote: To add my 0.2 cents... I think starting/stopping scrub belongs to cron, smf, etc. and not to zfs itself. However what would be nice to have is an ability to freeze/resume a scrub and also limit its rate of scrubbing. One of the reason is that when working in SAN environments one have to take into account more that just a server where a scrub will be running as while it might not impact the server it might cause an issue for others, etc. There's an RFE for this (pause/resume a scrub), or rather there was - unfortunately, it's got subsumed into another RFE/BUG and the pause/resume requirement got lost. I'll see about reinstating it. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
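The cron side of this is already easy - a weekly scrub might be scheduled as below (pool name and time are assumptions); it is the pause/resume and rate-limiting that would need new support in ZFS itself:

  # root's crontab: scrub 'tank' every Sunday at 02:00
  0 2 * * 0 /usr/sbin/zpool scrub tank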
[zfs-discuss] Can't import pool due to missing log device
Hullo All: I'm having a problem importing a ZFS pool. When I first built my fileserver I created two VDEVs and a log device as follows:

  raidz1-0   ONLINE
    c12t0d0  ONLINE
    c12t1d0  ONLINE
    c12t2d0  ONLINE
    c12t3d0  ONLINE
  raidz1-2   ONLINE
    c12t4d0  ONLINE
    c12t5d0  ONLINE
    c13t0d0  ONLINE
    c13t1d0  ONLINE
  logs
    /ZIL-Log.img

And put them into a pool. The log was a file I created on my OS drive for the ZIL (/ZIL-Log.img). I wanted to rebuild my server using Nexenta <http://www.nexenta.org/> so I exported the pool and tried to import it under Nexenta. I made sure to copy the ZIL-Log.img file to the new root partition of the Nexenta install so it would be there for the import. However, upon booting into the Nexenta install the pool shows as UNAVAIL and when trying to import I get this:

status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X

I went back to my OpenSolaris install to see if I could import the pool in its original environment but no such luck. It still shows as UNAVAIL and I can't get it to import. At this point I am about to try the instructions shown here (http://opensolaris.org/jive/thread.jspa?threadID=62831) but before I went down that road I thought I'd check with the mailing list to see if anyone has encountered this or something similar before. Thanks in advance for any suggestions. Andrew Kener ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can't import pool due to missing log device
3 - community edition Andrew On Apr 18, 2010, at 11:15 PM, Richard Elling wrote: > Nexenta version 2 or 3? > -- richard > > On Apr 18, 2010, at 7:13 PM, Andrew Kener wrote: > >> Hullo All: >> >> I'm having a problem importing a ZFS pool. When I first built my fileserver >> I created two VDEVs and a log device as follows: >> >> raidz1-0 ONLINE >> c12t0d0 ONLINE >> c12t1d0 ONLINE >> c12t2d0 ONLINE >> c12t3d0 ONLINE >> raidz1-2 ONLINE >> c12t4d0 ONLINE >> c12t5d0 ONLINE >> c13t0d0 ONLINE >> c13t1d0 ONLINE >> logs >> /ZIL-Log.img >> >> And put them into a pool. The log that was a file I created on my OS drive >> for the ZIL (/ZIL-Log.img). >> >> I wanted to rebuild my server using Nexenta so I exported the pool and tried >> to import it under Nexenta. I made sure to copy the ZIL-Log.img file to the >> new root partition of the Nexenta install so it would be there for the >> import. However, upon booting into the Nexenta install the pool shows as >> UNAVAIL and when trying to import I get this: >> >> status: One or more devices are missing from the system. >> action: The pool cannot be imported. Attach the missing >>devices and try again. >> see: http://www.sun.com/msg/ZFS-8000-6X >> >> I went back to my OpenSolaris install to see if I could import the pool in >> it's original environment but no such luck. It still shows as UNAVAIL and I >> can't get it to import. >> >> At this point I am about to try the instructions shown here >> (http://opensolaris.org/jive/thread.jspa?threadID=62831) but before I went >> down that road I thought I'd check with the mailing list to see if anyone >> has encountered this or something similar before. Thanks in advance for any >> suggestions. >> >> Andrew Kener >> ___ >> zfs-discuss mailing list >> zfs-discuss@opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ZFS storage and performance consulting at http://www.RichardElling.com > ZFS training on deduplication, NexentaStor, and NAS performance > Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com > > > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mac OS X clients with ZFS server
The correct URL is: http://code.google.com/p/maczfs/ -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Rich Teer Sent: Sunday, April 25, 2010 7:11 PM To: Alex Blewitt Cc: ZFS discuss Subject: Re: [zfs-discuss] Mac OS X clients with ZFS server On Fri, 23 Apr 2010, Alex Blewitt wrote: > > > For your information, the ZFS project lives (well, limps really) on > > > at http://code.google.com/p/mac-zfs. You can get ZFS for Snow Leopard > > > from there and we're working on moving forwards from the ancient pool > > > support to something more recent. I've relatively recently merged in > > > the onnv-gate repository (at build 72) which should make things easier > > > to track in the future. > > > > That's good to hear! I thought Apple yanking ZFS support from Mac OS was > > a really dumb idea. Do you work for Apple? > > No, the entire effort is community based. Please feel free to join up to the > mailing list from the project page if you're interested in ZFS on Mac OSX. I tried going to that URL, but got a 404 error... :-( What's the correct one, please? -- Rich Teer, Publisher Vinylphile Magazine www.vinylphilemag.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to clear invisible, partially received snapshots?
I currently use zfs send/recv for onsite backups [1], and am configuring it for replication to an offsite server as well. I did an initial full send, and then a series of incrementals to bring the offsite pool up to date. During one of these transfers, the offsite server hung, and I had to power-cycle it. It came back up just fine, except that the snapshot it was receiving when it hung appeared to be both present and nonexistent, depending on which command was run. 'zfs recv' complained that the target snapshot already existed, but it did not show up in the output of 'zfs list', and 'zfs destroy' said it did not exist. I ran a scrub, which did not find any errors; nor did it solve the problem. I discovered some useful commands with zdb [2], and found more info: zdb -d showed the snapshot, with an unusual name: Dataset backup/ims/%zfs-auto-snap_daily-2010-04-22-1900 [ZPL], ID 6325, cr_txg 28137403, 2.62T, 123234 objects As opposed to a normal snapshot: Dataset backup/i...@zfs-auto-snap_daily-2010-04-21-1900 [ZPL], ID 5132, cr_txg 27472350, 2.61T, 123200 objects I then attempted 'zfs destroy backup/ims/%zfs-auto-snap_daily-2010-04-22-1900', but it still said the dataset did not exist. Finally I exported the pool, and after importing it, the snapshot was gone, and I could receive the snapshot normally. Is there a way to clear a "partial" snapshot without an export/import cycle? Thanks, Andrew [1] http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034554.html [2] http://www.cuddletech.com/blog/pivot/entry.php?id=980 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
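For anyone who hits the same state, a minimal sketch of the inspection and the export/import workaround described above (pool name 'backup' as in this report; the '%' prefix is what marks a partially received dataset in the zdb output):

  # look for partially received datasets left behind by an interrupted zfs recv
  zdb -d backup | grep '%'
  # the only workaround found here short of a reboot: cycle the pool
  zpool export backup
  zpool import backup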
[zfs-discuss] Does Opensolaris support thin reclamation?
Support for thin reclamation depends on the SCSI "WRITE SAME" command; see this draft of a document from T10: http://www.t10.org/ftp/t10/document.05/05-270r0.pdf. I spent some time searching the source code for support for "WRITE SAME", but I wasn't able to find much. I assume that if it was supported, it would be listed in this header file: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/scsi/generic/commands.h Does anyone know for certain whether Opensolaris supports thin reclamation on thinly-provisioned LUNs? If not, is anyone interested in or actively working on this? I'm especially interested in ZFS' support for thin reclamation, but I would be interested in hearing about support (or lack of) for UFS and SVM as well. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs question
Mihai wrote: Hello all, I have the following scenario of using zfs.
- I have an HDD image that has an NTFS partition, stored in a zfs dataset in a file called images.img
- I have X physical machines that boot from my server via iSCSI from such an image
- Every time a machine sends a boot request to my server, a clone of the zfs dataset is created and the machine is given the clone to boot from
I want to make an optimization to my framework that involves using a ramdisk pool to store the initial hdd images, with the clones of the image being stored on a disk-based pool. I tried to do this using zfs, but it wouldn't let me do cross-pool clones. If someone has any idea on how to proceed in doing this, please let me know. It is not necessary to do this exactly as I proposed, but it has to be something in this direction: a ramdisk-backed initial image and more disk-backed clones. You haven't said what your requirement is - i.e. what are you hoping to improve by making this change? I can only guess. If you are reading blocks from your initial hdd images (golden images) frequently enough, and you have enough memory on your system, these blocks will end up in the ARC (memory) anyway. If you don't have enough RAM for this to help, then you could add more memory, and/or an SSD as an L2ARC device ("cache" device in zpool command line terms). -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
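A minimal sketch of that last suggestion, assuming the disk-backed pool is called 'images' and the SSD shows up as c3t0d0 (both names are hypothetical):

  # add the SSD as an L2ARC ("cache") device to the existing pool
  zpool add images cache c3t0d0
  # cache devices can be removed again at any time, so this is a cheap experiment
  zpool remove images c3t0d0

Frequently read blocks from the golden images will then be served from RAM (ARC) or the SSD (L2ARC) without any cross-pool cloning.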
Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?
Erik Trimble wrote: Frankly, I'm really surprised that there's no solution, given that the *amount* of NVRAM needed for ZIL (or similar usage) is really quite small. A dozen GB is more than sufficient, and really, most systems do fine with just a couple of GB (3-4 or so). Producing a small, DRAM-based device in a 3.5" HD form-factor with built-in battery shouldn't be hard, and I'm kinda flabbergasted nobody is doing it. Well, at least in the sub-$1000 category. I mean, it's 2 SODIMMs, an AAA-NiCad battery, a PCI-E->DDR2 memory controller, a PCI-E to SATA6Gbps controller, and that's it. It's a bit of a wonky design. The DRAM could do something of the order of 1,000,000 IOPS, and is then throttled back to a tiny fraction of that by the SATA bottleneck. Disk interfaces like SATA/SAS really weren't designed for this type of use. What you probably want is a motherboard which has a small area of main memory protected by battery, and a ramdisk driver which knows how to use it. Then you'd get the 1,000,000 IOPS. No idea if anyone makes such a thing. You are correct that ZFS gets an enormous benefit from even tiny amounts of NV ZIL. Trouble is that no other operating systems or filesystems work this well with such relatively tiny amounts of NV storage, so such a hardware solution is very ZFS-specific. -- Andrew Gabriel | Solaris Systems Architect Email: andrew.gabr...@oracle.com Mobile: +44 7720 598213 Oracle Pre-Sales Guillemont Park | Minley Road | Camberley | GU17 9QG | United Kingdom ORACLE Corporation UK Ltd is a company incorporated in England & Wales | Company Reg. No. 1782505 | Reg. office: Oracle Parkway, Thames Valley Park, Reading RG6 1RA Oracle is committed to developing practices and products that help protect the environment ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS no longer working with FC devices.
I had a similar problem with a RAID shelf (switched to JBOD mode, with each physical disk presented as a LUN) connected via FC (qlc driver, but no MPIO). Running a scrub would eventually generate I/O errors and many messages like this: Sep 6 15:12:53 imsfs scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,5...@e/pci1077,1...@0/f...@0,0/d...@w2100 0004d960cdec,e (sd4): Sep 6 15:12:53 imsfs Request Sense couldn't get sense data and eventually one or more disks would get marked as faulted by ZFS. This was under s10u6 (10/08, I think) but I imagine it still holds for u8. I did not have these problems with just one or two LUNs presented from the array, but I prefer to run ZFS in the recommended configuration where it manages the disks. My storage vendor (3rd-party, not Sun) recommended that in /etc/system I add 'set ssd:ssd_max_throttle = 23' or less and 'set ssd:ssd_io_time = 0x60' or 0x78. The default 0x20 (in what version of Solaris?) is apparently not enough in many cases. In my case (x64) I discovered I needed sd:sd_max_throttle, etc. (not ssd, which is apparently only for sparc), and that the default sd_io_time on recent Solaris 10 already is 0x60. Apparently the general rule for max_throttle is 256/# of LUNs, but my vendor found that 23 was the maximum reliable setting for 16 LUNs. This may or may not help you but it's something to try. Without the max_throttle setting, I would get errors somewhere between 30 minutes and 4 hours into a scrub, and with it scrubs run successfully. -Andrew >>> Demian Phillips 5/23/2010 8:01 AM >>> On Sat, May 22, 2010 at 11:33 AM, Bob Friesenhahn wrote: > On Fri, 21 May 2010, Demian Phillips wrote: > >> For years I have been running a zpool using a Fibre Channel array with >> no problems. I would scrub every so often and dump huge amounts of >> data (tens or hundreds of GB) around and it never had a problem >> outside of one confirmed (by the array) disk failure. >> >> I upgraded to sol10x86 05/09 last year and since then I have >> discovered any sufficiently high I/O from ZFS starts causing timeouts >> and off-lining disks. This leads to failure (once rebooted and cleaned >> all is well) long term because you can no longer scrub reliably. > > The problem could be with the device driver, your FC card, or the array > itself. In my case, issues I thought were to blame on my motherboard or > Solaris were due to a defective FC card and replacing the card resolved the > problem. > > If the problem is that your storage array is becoming overloaded with > requests, then try adding this to your /etc/system file: > > * Set device I/O maximum concurrency > * > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29 > set zfs:zfs_vdev_max_pending = 5 > > Bob > -- > Bob Friesenhahn > bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ > I've gone back to Solaris 10 11/06. It's working fine, but I notice some differences in performance that are I think key to the problem. With the latest Solaris 10 (u8) throughput according to zpool iostat was hitting about 115MB/sec sometimes a little higher. With 11/06 it maxes out at 40MB/sec. Both setups are using mpio devices as far as I can tell. Next is to go back to u8 and see if the tuning you suggested will help. It really looks to me that the OS is asking too much of the FC chain I have. 
The really puzzling thing is I just got told about a brand new Dell Solaris x86 production box using current and supported FC devices and a supported SAN get the same kind of problems when a scrub is run. I'm going to investigate that and see if we can get a fix from Oracle as that does have a support contract. It may shed some light on the issue I am seeing on the older hardware. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
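Pulling the tunables mentioned in this thread together, a sketch of the /etc/system fragment (the values are the ones reported to work for the 16-LUN setup above plus Bob's suggestion; treat them as starting points rather than universal answers, and note that sd: applies to x86 while ssd: is the SPARC FC driver):

  * throttle the number of outstanding commands per LUN
  set sd:sd_max_throttle = 23
  * allow more time before a command is declared timed out
  set sd:sd_io_time = 0x60
  * limit ZFS's per-vdev I/O queue depth
  set zfs:zfs_vdev_max_pending = 5

A reboot is needed for /etc/system changes to take effect.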
Re: [zfs-discuss] unsetting the bootfs property possible? imported a FreeBSD pool
Reshekel Shedwitz wrote: r...@nexenta:~# zpool set bootfs= tank cannot set property for 'tank': property 'bootfs' not supported on EFI labeled devices r...@nexenta:~# zpool get bootfs tank NAME PROPERTY VALUE SOURCE tank bootfstanklocal Could this be related to the way FreeBSD's zfs partitioned my disk? I thought ZFS used EFI by default though (except for boot pools). Looks like this bit of code to me: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs/common/libzfs_pool.c#473 473 /* 474 * bootfs property cannot be set on a disk which has 475 * been EFI labeled. 476 */ 477 if (pool_uses_efi(nvroot)) { 478 zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, 479 "property '%s' not supported on " 480 "EFI labeled devices"), propname); 481 (void) zfs_error(hdl, EZFS_POOL_NOTSUP, errbuf); 482 zpool_close(zhp); 483 goto error; 484 } 485 zpool_close(zhp); 486 break; It's not checking if you're clearing the property before bailing out with the error about setting it. A few lines above, another test (for a valid bootfs name) does get bypassed in the case of clearing the property. Don't know if that alone would fix it. -- Andrew Gabriel | Solaris Systems Architect Email: andrew.gabr...@oracle.com Mobile: +44 7720 598213 Oracle Pre-Sales Guillemont Park | Minley Road | Camberley | GU17 9QG | United Kingdom ORACLE Corporation UK Ltd is a company incorporated in England & Wales | Company Reg. No. 1782505 | Reg. office: Oracle Parkway, Thames Valley Park, Reading RG6 1RA Oracle is committed to developing practices and products that help protect the environment ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is it possible to disable MPxIO during OpenSolaris installation?
James C. McPherson wrote: On 2/06/10 03:11 PM, Fred Liu wrote: Fix some typos. # In fact, there is no problem for MPxIO name in technology. It only matters for storage admins to remember the name. You are correct. I think there is no way to give short aliases to these long tedious MPxIO name. You are correct that we don't have aliases. However, I do not agree that the naming is tedious. It gives you certainty about the actual device that you are dealing with, without having to worry about whether you've cabled it right. Might want to add a call record to CR 6901193 Need a command to list current usage of disks, partitions, and slices which includes a request for vanity naming for disks. (Actually, vanity naming for disks should probably be brought out into a separate RFE.) -- Andrew Gabriel | Solaris Systems Architect Email: andrew.gabr...@oracle.com Mobile: +44 7720 598213 Oracle Pre-Sales Guillemont Park | Minley Road | Camberley | GU17 9QG | United Kingdom ORACLE Corporation UK Ltd is a company incorporated in England & Wales | Company Reg. No. 1782505 | Reg. office: Oracle Parkway, Thames Valley Park, Reading RG6 1RA Oracle is committed to developing practices and products that help protect the environment ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ls says: /tank/ws/fubar: Operation not applicable
Gordon Ross wrote: Anyone know why my ZFS filesystem might suddenly start giving me an error when I try to "ls -d" the top of it? i.e.: ls -d /tank/ws/fubar /tank/ws/fubar: Operation not applicable zpool status says all is well. I've tried snv_139 and snv_137 (my latest and previous installs). It's an amd64 box. Both OS versions show the same problem. Do I need to run a scrub? (will take days...) Other ideas? It might be interesting to run it under truss, to see which syscall is returning that error. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Now at 36 hours since zdb process start and:

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
827 root 4936M 4931M sleep 590 0:50:47 0.2% zdb/209

Idling at 0.2% processor for nearly the past 24 hours... feels very stuck. Thoughts on how to determine where and why? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Update: have given up on the zdb write mode repair effort, at least for now. Hoping for any guidance / direction anyone's willing to offer... Re-running 'zpool import -F -f tank' with some stack trace debug, as suggested in similar threads elsewhere. Note that this appears hung at near idle.

ff03e278c520 ff03e9c60038 ff03ef109490 1 60 ff0530db4680
  PC: _resume_from_idle+0xf1    CMD: zpool import -F -f tank
  stack pointer for thread ff03e278c520: ff00182bbff0
  [ ff00182bbff0 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    dbuf_read+0x1e8()
    dnode_next_offset_level+0x129()
    dnode_next_offset+0xa2()
    get_next_chunk+0xa5()
    dmu_free_long_range_impl+0x9e()
    dmu_free_object+0xe6()
    dsl_dataset_destroy+0x122()
    dsl_destroy_inconsistent+0x5f()
    findfunc+0x23()
    dmu_objset_find_spa+0x38c()
    dmu_objset_find_spa+0x153()
    dmu_objset_find+0x40()
    spa_load_impl+0xb23()
    spa_load+0x117()
    spa_load_best+0x78()
    spa_import+0xee()
    zfs_ioc_pool_import+0xc0()
    zfsdev_ioctl+0x177()
    cdev_ioctl+0x45()
    spec_ioctl+0x5a()
    fop_ioctl+0x7b()
    ioctl+0x18e()
    dtrace_systrace_syscall32+0x11a()
    _sys_sysenter_post_swapgs+0x149()

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Dedup had been turned on in the past for some of the volumes, but I had turned it off altogether before entering production due to performance issues. GZIP compression was turned on for the volume I was trying to delete. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Malachi, Thanks for the reply. There were no snapshots for the CSV1 volume that I recall... very few snapshots on the any volume in the tank. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Just re-ran 'zdb -e tank' to confirm the CSV1 volume is still exhibiting error 16:

Could not open tank/CSV1, error 16

Considering my attempt to delete the CSV1 volume led to the failure in the first place, I have to think that if I can either 1) complete the deletion of this volume, or 2) roll back to a transaction prior to this based on logging, or 3) repair whatever corruption has been caused by this partial deletion, then I will be able to import the pool. What does 'error 16' mean in the ZDB output - any suggestions? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Thanks Victor. I will give it another 24 hrs or so and will let you know how it goes... You are right, a large 2TB volume (CSV1) was not in the process of being deleted, as described above. It is showing error 16 on 'zdb -e' -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Victor, The 'zpool import -f -F tank' failed at some point last night. The box was completely hung this morning; no core dump, no ability to SSH into the box to diagnose the problem. I had no choice but to reset, as I had no diagnostic ability. I don't know if there would be anything in the logs? Earlier I ran 'zdb -e -bcsvL tank' in write mode for 36 hours and gave up to try something different. Now the zpool import has hung the box. Should I try zdb again? Any suggestions? Thanks, Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
> > On Jun 29, 2010, at 8:30 PM, Andrew Jones wrote: > > > Victor, > > > > The 'zpool import -f -F tank' failed at some point > last night. The box was completely hung this morning; > no core dump, no ability to SSH into the box to > diagnose the problem. I had no choice but to reset, > as I had no diagnostic ability. I don't know if there > would be anything in the logs? > > It sounds like it might have run out of memory. Is it an > option for you to add more memory to the box > temporarily? I'll place the order for more memory or transfer some from another machine. Seems quite likely that we did run out of memory. > > Even if it is an option, it is good to prepare for > such an outcome and have kmdb loaded either at boot time > by adding -k to the 'kernel$' line in the GRUB menu, or by > loading it from the console with 'mdb -K' before > attempting the import (type ':c' at the mdb prompt to > continue). In case it hangs again, you can press > 'F1-A' on the keyboard, drop into kmdb and then use > '$ > If your hardware has a physical or virtual NMI button, > you can use that too to drop into kmdb, but you'll > need to set a kernel variable for that to work: > > http://blogs.sun.com/darren/entry/sending_a_break_to_opensolaris > > > Earlier I ran 'zdb -e -bcsvL tank' in write mode > for 36 hours and gave up to try something different. > Now the zpool import has hung the box. > > What do you mean by running zdb in write mode? zdb > normally is a read-only tool. Did you change it in some > way? I had read elsewhere that set zfs:zfs_recover=1 and set aok=1 placed zdb into some kind of a write/recovery mode. I have set these in /etc/system. Is this a bad idea in this case? > > > Should I try zdb again? Any suggestions? > > It sounds like zdb is not going to be helpful, as > inconsistent dataset processing happens only in > read-write mode. So you need to try the above suggestions > with more memory and kmdb/nmi. Will do, thanks! > > victor > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discu > ss > -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
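For reference, a sketch of the pieces discussed above. The /etc/system settings are undocumented recovery tunables (as I understand them, aok=1 turns fatal ASSERT failures into warnings and zfs:zfs_recover=1 relaxes some ZFS sanity checks), so they are only appropriate for a one-off recovery attempt and should be removed afterwards:

  * /etc/system - temporary recovery settings, remove after the pool is back
  set aok = 1
  set zfs:zfs_recover = 1

  # load the kernel debugger from the console before attempting the import,
  # so a hang or panic can be inspected rather than just reset
  mdb -K
  :c                          (typed at the kmdb prompt to continue)
  zpool import -F -f tank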
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Victor, I've reproduced the crash and have vmdump.0 and dump device files. How do I query the stack on crash for your analysis? What other analysis should I provide? Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Victor, A little more info on the crash, from the messages file is attached here. I have also decompressed the dump with savecore to generate unix.0, vmcore.0, and vmdump.0. Jun 30 19:39:10 HL-SAN unix: [ID 836849 kern.notice] Jun 30 19:39:10 HL-SAN ^Mpanic[cpu3]/thread=ff0017909c60: Jun 30 19:39:10 HL-SAN genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ff0017909790 addr=0 occurred in module "" due to a NULL pointer dereference Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] Jun 30 19:39:10 HL-SAN unix: [ID 839527 kern.notice] sched: Jun 30 19:39:10 HL-SAN unix: [ID 753105 kern.notice] #pf Page fault Jun 30 19:39:10 HL-SAN unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0 Jun 30 19:39:10 HL-SAN unix: [ID 243837 kern.notice] pid=0, pc=0x0, sp=0xff0017909880, eflags=0x10002 Jun 30 19:39:10 HL-SAN unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8 Jun 30 19:39:10 HL-SAN unix: [ID 624947 kern.notice] cr2: 0 Jun 30 19:39:10 HL-SAN unix: [ID 625075 kern.notice] cr3: 336a71000 Jun 30 19:39:10 HL-SAN unix: [ID 625715 kern.notice] cr8: c Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rdi: 282 rsi:15809 rdx: ff03edb1e538 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rcx:5 r8:0 r9: ff03eb2d6a00 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rax: 202 rbx:0 rbp: ff0017909880 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]r10: f80d16d0 r11:4 r12:0 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]r13: ff03e21bca40 r14: ff03e1a0d7e8 r15: ff03e21bcb58 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]fsb:0 gsb: ff03e25fa580 ds: 4b Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice] es: 4b fs:0 gs: 1c3 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]trp:e err: 10 rip:0 Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice] cs: 30 rfl:10002 rsp: ff0017909880 Jun 30 19:39:10 HL-SAN unix: [ID 266532 kern.notice] ss: 38 Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909670 unix:die+dd () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909780 unix:trap+177b () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909790 unix:cmntrap+e6 () Jun 30 19:39:10 HL-SAN genunix: [ID 802836 kern.notice] ff0017909880 0 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179098a0 unix:debug_enter+38 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179098c0 unix:abort_sequence_enter+35 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909910 kbtrans:kbtrans_streams_key+102 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909940 conskbd:conskbdlrput+e7 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179099b0 unix:putnext+21e () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179099f0 kbtrans:kbtrans_queueevent+7c () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a20 kbtrans:kbtrans_queuepress+7c () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a60 kbtrans:kbtrans_untrans_keypressed_raw+46 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a90 kbtrans:kbtrans_processkey+32 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909ae0 kbtrans:kbtrans_streams_key+175 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909b10 kb8042:kb8042_process_key+40 () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909b50 kb8042:kb8042_received_byte+109 () Jun 30 19:39:10 
HL-SAN genunix: [ID 655072 kern.notice] ff0017909b80 kb8042:kb8042_intr+6a () Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909bb0 i8042:i8042_intr+c5 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0017909c00 unix:av_dispatch_autovect+7c () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0017909c40 unix:dispatch_hardint+33 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff00183552f0 unix:switch_sp_and_call+13 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355340 unix:do_interrupt+b8 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355350 unix:_interrupt+b8 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff00183554a0 unix:htable_steal+198 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355510 unix:htable_alloc+248 () Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff00183555c0
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
> > Andrew, > > Looks like the zpool is telling you the devices are > still doing work of > some kind, or that there are locks still held. > Agreed; it appears the CSV1 volume is in a fundamentally inconsistent state following the aborted zfs destroy attempt. See later in this thread where Victor has identified this to be the case. I am awaiting his analysis of the latest crash. > From the section 2 intro man page the errors are > listed. Number 16 > looks to be an EBUSY. > > 16 EBUSY  Device busy > An attempt was made to mount a device that was already mounted or an attempt was made to unmount a device on which there is an active file (open file, current directory, mounted-on file, active text segment). It will also occur if an attempt is made to enable accounting when it is already enabled. The device or resource is currently unavailable. EBUSY is also used by mutexes, semaphores, condition variables, and r/w locks, to indicate that a lock is held, and by the processor control function P_ONLINE. > Andrew Jones wrote: > > Just re-ran 'zdb -e tank' to confirm the CSV1 > volume is still exhibiting error 16: > > > > > > Could not open tank/CSV1, error 16 > > > > > > Considering my attempt to delete the CSV1 volume > led to the failure in the first place, I have to > think that if I can either 1) complete the deletion > of this volume or 2) roll back to a transaction prior > to this based on logging or 3) repair whatever > corruption has been caused by this partial deletion, > that I will then be able to import the pool. > > > > What does 'error 16' mean in the ZDB output, any > suggestions? > > > > -- > Geoff Shipman | Senior Technical Support Engineer > Phone: +13034644710 > Oracle Global Customer Services > 500 Eldorado Blvd. UBRM-04 | Broomfield, CO 80021 > Email: geoff.ship...@sun.com | Hours: 9am-5pm > MT, Monday-Friday > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discu > ss > -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
Victor, The zpool import succeeded on the next attempt following the crash that I reported to you by private e-mail! For completeness, this is the final status of the pool:

  pool: tank
 state: ONLINE
 scan: resilvered 1.50K in 165h28m with 0 errors on Sat Jul 3 08:02:30 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
        cache
          c2t0d0    ONLINE       0     0     0

errors: No known data errors

Thank you very much for your help. We did not need to add additional RAM to solve this, in the end. Instead, we needed to persist with the import through several panics to finally work our way through the large inconsistent dataset; it is unclear whether the resilvering caused additional processing delay. Unfortunately, the delay made much of the data quite stale, now that it's been recovered. It does seem that zfs would benefit tremendously from a better (quicker and more intuitive?) set of recovery tools, that are available to a wider range of users. It's really a shame, because the features and functionality in zfs are otherwise absolutely second to none. /Andrew -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
> > - Original Message - > > Victor, > > > > The zpool import succeeded on the next attempt > following the crash > > that I reported to you by private e-mail! > > > > For completeness, this is the final status of the > pool: > > > > > > pool: tank > > state: ONLINE > > scan: resilvered 1.50K in 165h28m with 0 errors on > Sat Jul 3 08:02:30 > > Out of curiosity, what sort of drives are you using > here? Resilvering in 165h28m is close to a week, > which is rather bad imho. I think the resilvering statistic is quite misleading, in this case. We're using very average 1TB retail Hitachi disks, which perform just fine when the pool is healthy. What happened here is that the zpool-tank process was performing a resilvering task in parallel with the processing of a very large inconsistent dataset, which took the overwhelming majority of the time to complete. Why it actually took over a week to process the 2TB volume in an inconsistent state is my primary concern with the performance of ZFS, in this case. > > Vennlige hilsener / Best regards > > roy > -- > Roy Sigurd Karlsbakk > (+47) 97542685 > r...@karlsbakk.net > http://blogg.karlsbakk.net/ > -- > I all pedagogikk er det essensielt at pensum > presenteres intelligibelt. Det er et elementært > imperativ for alle pedagoger å unngå eksessiv > anvendelse av idiomer med fremmed opprinnelse. I de > fleste tilfeller eksisterer adekvate og relevante > synonymer på norsk. > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discu > ss -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)
> > Good. Run 'zpool scrub' to make sure there are no > other errors. > > regards > victor > Yes, scrubbed successfully with no errors. Thanks again for all of your generous assistance. /AJ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Lost ZIL Device
Hello All, I've recently run into an issue I can't seem to resolve. I have been running a zpool populated with two RAID-Z1 VDEVs and a file on the (separate) OS drive for the ZIL: raidz1-0 ONLINE c12t0d0 ONLINE c12t1d0 ONLINE c12t2d0 ONLINE c12t3d0 ONLINE raidz1-2 ONLINE c12t4d0 ONLINE c12t5d0 ONLINE c13t0d0 ONLINE c13t1d0 ONLINE logs /ZIL-Log.img This was running on Nexenta Community Edition v3. Everything was going smoothly until today when the OS hard drive crashed and I was not able to boot from it any longer. I had migrated this setup from an OpenSolaris install some months back and I still had the old drive intact. I put it in the system, booted it up and tried to import the zpool. Unfortunately, I have not been successful. Previously when migrating from OSOL to Nexenta I was able to get the new system to recognize and import the ZIL device file. Since it has been lost in the drive crash I have not been able to duplicate that success. Here is the output from a 'zpool import' command: pool: tank id: 9013303135438223804 state: UNAVAIL status: The pool was last accessed by another system. action: The pool cannot be imported due to damaged devices or data. see: http://www.sun.com/msg/ZFS-8000-EY config: tank UNAVAIL missing device raidz1-0 ONLINE c12t0d0 ONLINE c12t1d0 ONLINE c12t5d0 ONLINE c12t3d0 ONLINE raidz1-2 ONLINE c12t4d0 ONLINE c12t2d0 ONLINE c13t0d0 ONLINE c13t1d0 ONLINE I created a new file for the ZIL (using mkfile) and tried to specify it for inclusion with -d but it doesn't get recognized. Probably because it was never part of the original zpool. I also symlinked the new ZIL file into /dev/dsk but that didn't make any difference either. Any suggestions? Andrew Kener ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lost ZIL Device
According to 'zpool upgrade' my pool versions are 22. All pools were upgraded several months ago, including the one in question. Here is what I get when I try to import:

fileserver ~ # zpool import 9013303135438223804
cannot import 'tank': pool may be in use from other system, it was last accessed by fileserver (hostid: 0x406155) on Tue Jul 6 10:46:13 2010
use '-f' to import anyway
fileserver ~ # zpool import -f 9013303135438223804
cannot import 'tank': one or more devices is currently unavailable
Destroy and re-create the pool from a backup source.

On Jul 6, 2010, at 11:48 PM, Edward Ned Harvey wrote: >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> boun...@opensolaris.org] On Behalf Of Andrew Kener >> >> the OS hard drive crashed [and log device] > > Here's what I know: In zpool >= 19, if you import this, it will prompt you > to confirm the loss of the log device, and then it will import. > > Here's what I have heard: The ability to import with a failed log device as > described above, was created right around zpool 14 or 15, not quite sure > which. > > Here's what I don't know: If the failed zpool was some version which was > too low ... and you try to import on an OS which is capable of a much higher > version of zpool ... Can the newer OS handle it just because the newer OS is > able to handle a newer version of zpool? Or maybe the version of the failed > pool is the one that matters, regardless of what the new OS is capable of > doing now? > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lost ZIL Device - FIXED
Greetings All, I can't believe I didn't figure this out sooner. First of all, a big thank you to everyone who gave me advice and suggestions, especially Richard. The problem was with the -d switch. When importing a pool, if you specify -d and a path, it ONLY looks there. So if I run:

# zpool import -d /var/zfs-log/ tank

it won't look for devices in /dev/dsk. Consequently, running without -d /var/zfs-log/ it won't find the log device. Here is the command that worked:

# zpool import -d /var/zfs-log -d /dev/dsk tank

And to make sure that this doesn't happen again (I have learned my lesson this time) I have ordered two small SSD drives to put in a mirrored config for the log device. Thanks again to everyone and now I will get some worry-free sleep :) Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
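When the SSDs arrive, attaching them as a mirrored log is a one-liner; a sketch assuming they show up as c14t0d0 and c14t1d0 (hypothetical device names):

  # add the two new SSDs as a mirrored log device
  zpool add tank log mirror c14t0d0 c14t1d0

Once that is in place, the old file-backed log can be removed with 'zpool remove tank <path-to-ZIL-Log.img>' (log device removal needs pool version 19 or later; this pool is already at 22).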
Re: [zfs-discuss] Legality and the future of zfs...
Linder, Doug wrote: Out of sheer curiosity - and I'm not disagreeing with you, just wondering - how does ZFS make money for Oracle when they don't charge for it? Do you think it's such an important feature that it's a big factor in customers picking Solaris over other platforms? Yes, it is one of many significant factors in customers choosing Solaris over other OS's. Having chosen Solaris, customers then tend to buy Sun/Oracle systems to run it on. Of course, there are the 7000 series products too, which are heavily based on the capabilities of ZFS, amongst other Solaris features. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Recommended RAM for ZFS on various platforms
Garrett D'Amore wrote: Btw, instead of RAIDZ2, I'd recommend simply using stripe of mirrors. You'll have better performance, and good resilience against errors. And you can grow later as you need to by just adding additional drive pairs. -- Garrett Or in my case, I find my home data growth is slightly less than the rate of disk capacity increase, so every 18 months or so, I simply swap out the disks for higher capacity ones. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 1tb SATA drives
Arne Jansen wrote: Jordan McQuown wrote: I’m curious to know what other people are running for HD’s in white box systems? I’m currently looking at Seagate Barracuda’s and Hitachi Deskstars. I’m looking at the 1tb models. These will be attached to an LSI expander in a sc847e2 chassis driven by an LSI 9211-8i HBA. This system will be used as a large storage array for backups and archiving. I wouldn't recommend using desktop drives in a server RAID. They can't handle the vibrations well that are present in a server. I'd recommend at least the Seagate Constellation or the Hitachi Ultrastar, though I haven't tested the Deskstar myself. I've been using a couple of 1TB Hitachi Ultrastars for about a year with no problem. I don't think mine are still available, but I expect they have something equivalent. The pool is scrubbed 3 times a week which takes nearly 19 hours now, and hammers the heads quite hard. I keep meaning to reduce the scrub frequency now it's getting to take so long, but haven't got around to it. What I really want is pause/resume scrub, and the ability to trigger the pause/resume from the screensaver (or something similar). -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
Richard Jahnel wrote: I've tried ssh blowfish and scp arcfour. Both are CPU-limited long before the 10g link is. I've also tried mbuffer, but I get broken pipe errors part way through the transfer. Any idea why? Does the zfs send or zfs receive bomb out part way through? Might be worth trying it over rsh if security isn't an issue, and then you lose the encryption overhead. Trouble is that then you've got almost no buffering, which can do bad things to the performance, which is why mbuffer would be ideal if it worked for you. I'm open to ideas for faster ways to either zfs send directly or send through a compressed file of the zfs send output. For the moment I: zfs send > pigz; scp (arcfour) the .gz file to the remote host; gunzip < the file into zfs receive. This takes a very long time for 3 TB of data, and barely makes use of the 10g connection between the machines due to the CPU limiting on the scp and gunzip processes. Also, if you have multiple datasets to send, might be worth seeing if sending them in parallel helps. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
Richard Jahnel wrote: Any idea why? Does the zfs send or zfs receive bomb out part way through? I have no idea why mbuffer fails. Changing the -s from 128 to 1536 made it take longer to occur and slowed it down by about 20%, but didn't resolve the issue. It just meant I might get as far as 2.5gb before mbuffer bombed with broken pipe. Trying -r and -R with various values had no effect. I found that where the network bandwidth and the disks' throughput are similar (which requires a pool with many top level vdevs in the case of a 10Gb link), you ideally want a buffer on the receive side which will hold about 5 seconds worth of data. A large buffer on the transmit side didn't help. The aim is to be able to continue streaming data across the network whilst a transaction commit happens at the receive end and zfs receive isn't reading, but to have the data ready locally for zfs receive when it starts reading again. Then the network will stream, in spite of the bursty read nature of zfs receive. I recorded this in bugid http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6729347 However, I haven't verified the extent to which this still happens on more recent builds. Might be worth trying it over rsh if security isn't an issue, and then you lose the encryption overhead. Trouble is that then you've got almost no buffering, which can do bad things to the performance, which is why mbuffer would be ideal if it worked for you. I seem to remember reading that rsh was remapped to ssh in Solaris. No. On the system you're rsh'ing to, you will have to "svcadm enable svc:/network/shell:default", and set up appropriate authorisation in ~/.rhosts. I heard of some folks using netcat. I haven't figured out where to get netcat nor the syntax for using it yet. I used a buffering program of my own, but I presume mbuffer would work too. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
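For completeness, a sketch of the two unencrypted transports mentioned here (the hostname 'receiver', port 9090 and the buffer sizes are only examples; start the receiving side first, and note that exact netcat flags vary between builds):

  # receiver: listen on TCP port 9090, buffer ~5 seconds worth of data, feed zfs receive
  mbuffer -I 9090 -s 128k -m 1G | zfs receive tank/copy
  # sender: stream the snapshot into mbuffer, which pushes it to the receiver
  zfs send tank/fs@snap | mbuffer -O receiver:9090 -s 128k -m 1G

  # netcat alternative (no buffering of its own)
  nc -l 9090 | zfs receive tank/copy
  zfs send tank/fs@snap | nc receiver 9090

Sizing the receive-side buffer to a few seconds of link bandwidth is what keeps the network streaming while zfs receive pauses for transaction group commits, per the bugid above.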
Re: [zfs-discuss] NFS performance?
Thomas Burgess wrote: On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie <sigbj...@nixtra.com> wrote: Hi, I've been searching around on the Internet to find some help with this, but have been unsuccessful so far. I have some performance issues with my file server. I have an OpenSolaris server with a Pentium D 3GHz CPU, 4GB of memory, and a RAIDZ1 over 4 x Seagate (ST31500341AS) 1,5TB SATA drives. If I compile or even just unpack a tar.gz archive with source code (or any archive with lots of small files) on my Linux client onto an NFS-mounted disk from the OpenSolaris server, it's extremely slow compared to unpacking this archive locally on the server. A 22MB .tar.gz file containing 7360 files takes 9 minutes and 12 seconds to unpack over NFS. Unpacking the same file locally on the server is just under 2 seconds. Between the server and client I have a gigabit network, which at the time of testing had no other significant load. My NFS mount options are: "rw,hard,intr,nfsvers=3,tcp,sec=sys". Any suggestions as to why this is? Regards, Sigbjorn As someone else said, adding an SSD log device can help hugely. I saw about a 500% NFS write increase by doing this. I've heard of people getting even more. Another option, if you don't care quite so much about data security in the event of an unexpected system outage, would be to use Robert Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes with sync=disabled, when the changes work their way into an available build. The risk is that if the file server goes down unexpectedly, it might come back up having lost some seconds worth of changes which it told the client (lied) that it had committed to disk, when it hadn't, and this violates the NFS protocol. That might be OK if you are using it to hold source that's being built, where you can kick off a build again if the server did go down in the middle of it. Wouldn't be a good idea for some other applications though (although Linux ran this way for many years, seemingly without many complaints). Note that there's no increased risk of the zpool going bad - it's just that after the reboot, filesystems with sync=disabled will look like they were rewound by some seconds (possibly up to 30 seconds). -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
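Once the sync property from those changes is in the build being run, the trade-off is a per-dataset switch; a sketch (the dataset name is an example):

  # accept losing up to ~30s of acknowledged writes on a server crash,
  # in exchange for async-speed NFS writes
  zfs set sync=disabled tank/export/src
  # revert to safe, protocol-correct behaviour
  zfs set sync=standard tank/export/src

For clients that genuinely depend on committed writes, the SSD log device route is the safer answer.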
Re: [zfs-discuss] NFS performance?
Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Phil Harman Milkowski and Neil Perrin's zil synchronicity [PSARC/2010/108] changes with sync=disabled, when the changes work their way into an available The fact that people run unsafe systems seemingly without complaint for years assumes that they know silent data corruption when they see^H^H^Hhear it ... which, of course, they didn't ... because it is silent ... or having encountered corrupted data, that they have the faintest idea where it came from. In my day to day work I still find many people that have been (apparently) very lucky. Running with sync disabled, or ZIL disabled, you could call "unsafe" if you want to use a generalization and a stereotype. Just like people say "writeback" is unsafe. If you apply a little more intelligence, you'll know, it's safe in some conditions, and not in other conditions. Like ... If you have a BBU, you can use your writeback safely. And if you're not sharing stuff across the network, you're guaranteed the disabled ZIL is safe. But even when you are sharing stuff across the network, the disabled ZIL can still be safe under the following conditions: If you are only doing file sharing (NFS, CIFS) and you are willing to reboot/remount from all your clients after an ungraceful shutdown of your server, then it's safe to run with ZIL disabled. No, that's not safe. The client can still lose up to 30 seconds of data, which could be, for example, an email message which is received and foldered on the server, and is then lost. It's probably /*safe enough*/ for most home users, but you should be fully aware of the potential implications before embarking on this route. (As I said before, the zpool itself is not at any additional risk of corruption, it's just that you might find the zfs filesystems with sync=disabled appear to have been rewound by up to 30 seconds.) If you're unsure, then adding SSD nonvolatile log device, as people have said, is the way to go. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zvol recordsize for backing a zpool over iSCSI
Just wondering if anyone has experimented with working out the best zvol recordsize for a zvol which is backing a zpool over iSCSI? -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
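Not an answer, but a note for anyone experimenting: on a zvol the knob is volblocksize rather than recordsize, and it can only be set at creation time, so each trial needs a freshly created zvol; a sketch (names and sizes are examples):

  # create a 100G zvol with an 8K block size to back the iSCSI LUN
  zfs create -V 100g -o volblocksize=8k tank/lun0

A plausible starting point is to match the volblocksize to the recordsize (or volblocksize) of the data the remote zpool will mostly carry, and then measure.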
Re: [zfs-discuss] Maximum zfs send/receive throughput
Jim Barker wrote: Just an update, I had a ticket open with Sun regarding this and it looks like they have a CR for what I was seeing (6975124). That would seem to describe a zfs receive which has stopped for 12 hours. You described yours as slow, which is not the term I personally would use for one which is stopped. However, you haven't given anything like enough detail here of your situation and what's happening for me to make any worthwhile guesses. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS SCRUB
Mohammed Sadiq wrote: Hi, is it recommended to do a scrub while the filesystem is mounted? How frequently do we have to scrub, and under what circumstances? You can scrub while the filesystems are mounted - most people do, there's no reason to unmount for a scrub. (Scrub is pool level, not filesystem level.) Scrub does noticeably slow the filesystem, so pick a time of low application load or a time when performance isn't critical. If it overruns into a busy period, you can cancel the scrub. Unfortunately, you can't pause and resume - there's an RFE for this, so if you cancel one you can't restart it from where it got to - it has to restart from the beginning. You should scrub occasionally anyway. That's your check that data you haven't accessed in your application isn't rotting on the disks. You should also do a scrub before you do a planned reduction of the pool redundancy (e.g. if you're going to detach a mirror side in order to attach a larger disk), most particularly if you are reducing the redundancy to nothing. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
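The mechanics, for reference (the pool name is an example):

  zpool scrub tank        # start a scrub; it runs in the background
  zpool status tank       # shows scrub progress and any errors found so far
  zpool scrub -s tank     # stop (cancel) a running scrub - it cannot be resumed, only restarted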
Re: [zfs-discuss] RAID Z stripes
Phil Harman wrote: On 10 Aug 2010, at 08:49, Ian Collins wrote: On 08/10/10 06:21 PM, Terry Hull wrote: I am wanting to build a server with 16 x 1TB drives, with two 8-drive RAIDZ2 arrays striped together. However, I would like the capability of adding additional stripes of 2TB drives in the future. Will this be a problem? I thought I read it is best to keep the stripes the same width and was planning to do that, but I was wondering about using drives of different sizes. These drives would all be in a single pool. It would work, but you run the risk of the smaller drives becoming full and all new writes going to the bigger vdev. So while usable, performance would suffer. Almost by definition, the 1TB drives are likely to be getting full when the new drives are added (presumably because of running out of space). Performance can only be said to suffer relative to a new pool built entirely with drives of the same size. Even if he added 8x 2TB drives in a RAIDZ3 config it is hard to predict what the performance gap will be (on the one hand: RAIDZ3 vs RAIDZ2, on the other: an empty group vs an almost full, presumably fragmented, group). One option would be to add 2TB drives as 5-drive raidz3 vdevs. That way your vdevs would be approximately the same size and you would have the optimum redundancy for the 2TB drives. I think you meant 6, but I don't see a good reason for matching the group sizes. I'm for RAIDZ3, but I don't see much logic in mixing groups of 6+2 x 1TB and 3+3 x 2TB in the same pool (in one group I appear to care most about maximising space, in the other I'm maximising availability). Another option - use the new 2TB drives to swap out the existing 1TB drives. If you can find another use for the swapped-out drives, this works well, and avoids ending up with sprawling lower capacity drives as your pool grows in size. This is what I do at home. The freed-up drives get used in other systems and for off-site backups. Over the last 4 years, I've upgraded from 1/4TB, to 1/2TB, and am now on 1TB drives. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
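A sketch of the swap-out approach (device names are hypothetical; do one disk at a time and wait for each resilver to complete before touching the next):

  # on builds that have the autoexpand property, let the pool grow
  # automatically once every disk in the group has been replaced
  zpool set autoexpand=on tank
  # replace a 1TB disk with a new 2TB disk
  zpool replace tank c0t0d0 c0t8d0
  zpool status tank       # watch the resilver; move on only when it has finished

A scrub beforehand is a good idea, especially if the old disk has to be physically pulled before the new one has resilvered.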
Re: [zfs-discuss] Global Spare for 2 pools
Tony MacDoodle wrote: I have 2 ZFS pools all using the same drive type and size. The question is can I have 1 global hot spare for both of those pools? Yes. A hot spare disk can be added to more than one pool at the same time. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
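A sketch, assuming the spare is c5t9d0 (hypothetical) and the pools are pool1 and pool2:

  zpool add pool1 spare c5t9d0
  zpool add pool2 spare c5t9d0
  zpool status pool1 pool2    # the same disk appears under 'spares' in both pools

The usual caveat: if both pools fault a disk at around the same time, only one of them gets the spare.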
Re: [zfs-discuss] ZFS automatic rollback and data rescue.
Constantine wrote: Hi. I've got a ZFS filesystem (OpenSolaris 2009.06) which, as far as I can see, was automatically rolled back by the OS to the latest snapshot after a power failure. ZFS doesn't do this. Can you give some more details of what you're seeing? Would also be useful to see output of: zfs list -t all -r zpool/filesystem The trouble is that the snapshot is too old, and consequently there is a question - can I browse the pre-rollback, corrupted branch of the FS? And, if I can, how? -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS automatic rollback and data rescue.
Constantine wrote: ZFS doesn't do this. I thought so too. ;) Situation brief: I've got OpenSolaris 2009.06 installed on a RAID-5 array on a controller with a 512 MB cache (as far as I can remember) without a cache-saving battery. I hope the controller disabled the cache then. Probably a good idea to run "zpool scrub rpool" to find out if it's broken. It will probably take some time. zpool status will show the progress. On Friday a lightning bolt hit the power supply station of the colocation company, and it turned out that their UPSs were not much more than decoration. After the reboot the filesystem and logs are at their last snapshot version. Would also be useful to see output of: zfs list -t all -r zpool/filesystem

wi...@zeus:~/.zfs/snapshot# zfs list -t all -r rpool
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
rpool                                  427G  1.37T  82.5K  /rpool
rpool/ROOT                             366G  1.37T    19K  legacy
rpool/ROOT/opensolaris                20.6M  1.37T  3.21G  /
rpool/ROOT/xvm                        8.10M  1.37T  8.24G  /
rpool/ROOT/xvm-1                       690K  1.37T  8.24G  /
rpool/ROOT/xvm-2                      35.1G  1.37T   232G  /
rpool/ROOT/xvm-3                       851K  1.37T   221G  /
rpool/ROOT/xvm-4                       331G  1.37T   221G  /
rpool/ROOT/xv...@install               144M      -  2.82G  -
rpool/ROOT/xv...@xvm                  38.3M      -  3.21G  -
rpool/ROOT/xv...@2009-07-27-01:09:14    56K      -  8.24G  -
rpool/ROOT/xv...@2009-07-27-01:09:57    56K      -  8.24G  -
rpool/ROOT/xv...@2009-09-13-23:34:54  2.30M      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:35:17  1.14M      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:42:12  5.72M      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:42:45  5.69M      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:46:25   573K      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:46:34   525K      -   206G  -
rpool/ROOT/xv...@2009-09-13-23:48:11  6.51M      -   206G  -
rpool/ROOT/xv...@2010-04-22-03:50:25  24.6M      -   221G  -
rpool/ROOT/xv...@2010-04-22-03:51:28  24.6M      -   221G  -

Actually, there's 24.6 Mbytes worth of changes to the filesystem since the last snapshot, which is coincidentally about the same as there was over the preceding minute between the last two snapshots. I can't tell if (or how much of) that happened before, versus after, the reboot though.

rpool/dump                            16.0G  1.37T  16.0G  -
rpool/export                          28.6G  1.37T    21K  /export
rpool/export/home                     28.6G  1.37T    21K  /export/home
rpool/export/home/wiron               28.6G  1.37T  28.6G  /export/home/wiron
rpool/swap                            16.0G  1.38T   101M  -

Normally in a power-out scenario, you will only lose asynchronous writes since the last transaction group commit, which will be up to 30 seconds worth (although normally much less), and you lose no synchronous writes. However, I've no idea what your potentially flaky RAID array will have done. If it was using its cache and thinking it was non-volatile, then it could easily have corrupted the zfs filesystem due to having got writes out of sequence with transaction commits, and this can render the filesystem no longer mountable because the back-end storage has lied to zfs about committing writes. Even though you were lucky and it still mounts, it might still be corrupted, hence the suggestion to run zpool scrub (and even more important, get the RAID array fixed). Since I presume ZFS doesn't have redundant storage for this zpool, any corrupted data can't be repaired by ZFS, although it will tell you about it. Running ZFS without redundancy on flaky storage is not a good place to be. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Unusual latency issues
Markus Kovero wrote: Hi, this may not be the correct mailing list for this, but I'd like to share it with you. I noticed weird network behaviour with osol snv_123: ICMP to the host lags randomly between 500ms and 5000ms and ssh sessions seem to tangle; I guess this could affect iSCSI/NFS as well. What was most interesting is that I found a workaround: running snoop with promiscuous mode disabled on the interfaces suffering lag made the interruptions go away. Is this some kind of CPU/IRQ scheduling issue? The behaviour was noticed on two different platforms and with two different NICs (bge and e1000). Unless you have some specific reason for thinking this is a zfs issue, you probably want to ask on the crossbow-discuss mailing list. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss