Re: [zfs-discuss] Help with setting up ZFS
> The two plugs that I indicated are multi-lane SAS ports, which /require/
> using a breakout cable; don't worry - that's the design for them.
> "multi-lane" means exactly that - several actual SAS connections in a
> single plug. The other 6 ports next to them (in black) are SATA ports
> connected to the ICH9R.

Just a quick question before I address everyone else. I bought this connector:
http://www.newegg.com/Product/Product.aspx?Item=N82E16812198020

However, it's pretty clear to me now (after I've ordered it) that it won't fit
the SAS connector on the board at all. What kind of cable do I need for this?
[zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Hi! I'm a Mac user, but I think I will get more responses about this question
here than on a Mac forum. And first, sorry for my approximate English.

I have a ZFS pool named "MyPool" with two devices (two external USB drives),
configured as a mirror:

        NAME              STATE     READ WRITE CKSUM
        MyPool            ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            /dev/disk2s2  ONLINE       0     0     0
            /dev/disk3s2  ONLINE       0     0     0

The drive disk3 was unexpectedly unplugged for some reason (power failure,
etc.), but the drive is perfectly functional.

I plugged in another drive before replugging the unplugged one. This new drive
took the device name "disk3"... So when I plug the unplugged drive back in, it
takes the device name "disk4" and my ZFS mirror partition the name "disk4s2".

Here is my problem. If I scrub my pool, it doesn't recognize the new device
name. It seems that the only solution is to export and import the pool again,
because:
- If I do a detach of disk3s2 and an attach of disk4s2, it tells me that the
  drive is not the right size (that's true, but I had no problem with this
  when I created my mirrored pool).
- If I do an attach of disk4s2 on disk2s2, it tells me that disk3s2 is busy
  (which is suspicious: the drive is not in use).
- If I do a replace, ZFS seems to resilver the whole drive (and this can take
  ages to finish!), but the drive doesn't need to be fully resilvered, does it
  (when the drive was unplugged, the drives were in the same state)?

So what is the official method to tell ZFS that the device name has changed,
without exporting and re-importing the pool?
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
I don't have an answer to your exact question because I'm a noob and I'm not
using a Mac, but I can say that on FreeBSD, which I'm using at the moment,
there is a method to name devices ahead of time, so if the drive letters
change you avoid this exact problem. I'm sure OpenSolaris and Mac OS X have
something similar.

As far as your issue goes, I THINK you can just export/import the pool. I'm
sure someone on this list who knows more than me will speak up on the official
way to do it =)

2009/7/28 Avérous Julien-Pierre:
> I have a ZFS pool named "MyPool" with two devices (two external USB drives),
> configured as a mirror: [...]
> So what is the official method to tell ZFS that the device name has changed,
> without exporting and re-importing the pool?
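For concreteness, a minimal sketch of the export/import cycle being suggested
here, using the pool name from this thread; it assumes nothing is using the
pool's filesystems while it is exported:

  # stop anything using the pool, then let ZFS re-scan the devices
  zpool export MyPool
  zpool import MyPool
  zpool status MyPool   # the mirror should now show the current device names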
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
I think people can understand the concept of missing flushes. The big
conceptual problem is how this manages to hose an entire filesystem, which is
assumed to have rather a lot of data which ZFS has already verified to be ok.

Hardware ignoring flushes and losing recent data is understandable; I don't
think anybody would argue with that. Losing access to your entire pool and
multiple gigabytes of data because a few writes failed is a whole different
story, and while I understand how it happens, ZFS appears to be unique among
modern filesystems in suffering such a catastrophic failure so often.

To give a quick personal example: I can plug a FAT32 USB disk into a Windows
system, drag some files to it, and pull that drive at any point. I might lose
a few files, but I've never lost the entire filesystem. Even if the absolute
worst happened, I know I can run scandisk, chkdsk, or any number of file
recovery tools and get my data back. I would never, ever attempt this with
ZFS.

For a filesystem like ZFS, whose integrity and stability are sold as being way
better than existing filesystems, losing your entire pool is a bit of a shock.
I know that work is going on to be able to recover pools, and I'll sleep a lot
sounder at night once it is available.
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Thank you for your response, wonslung.

I can export/import, yes, but for this I would have to unmount all filesystems
that depend on the pool, and that's not always possible (and it's sad to be
forced to do it).

As for giving the device the same name ahead of time, I don't know how to do
that. I will look into it.
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
There is a small mistake:

"If I do an attach of disk4s2 on disk2s2, it tells me that disk3s2 is busy
(which is suspicious: the drive is not in use)"

The correct version is:

"If I do an attach of disk4s2 on disk2s2, it tells me that disk4s2 is busy
(which is suspicious: the drive is not in use)"

(disk3s2 is no longer involved with ZFS in this case.)
Re: [zfs-discuss] ZFS Mirror : drive unexpectedly unplugged
Sometimes the disk will be busy just from being in the directory, or if
something is trying to connect to it. Again, I'm no expert, so I'm going to
refrain from commenting on your issue further.

2009/7/28 Avérous Julien-Pierre:
> There is a small mistake:
> "If I do an attach of disk4s2 on disk2s2, it tells me that disk4s2 is busy
> (which is suspicious: the drive is not in use)" [...]
[zfs-discuss] How to "mirror" an entire zfs pool to another pool
We are upgrading to new storage hardware. We currently have a zfs pool with
the old storage volumes. I would like to create a new zfs pool, completely
separate, with the new storage volumes. I do not want to just replace the old
volumes with new volumes in the pool we are currently using.

I don't see a way to create a mirror of a pool. Note, I'm not talking about a
mirrored pool, meaning mirrored drives inside the pool. I want to mirror pool1
to pool2. Snapshots and clones do not seem to be what I want, as they only
work inside a given pool. I have looked at Sun Network Data Replicator (SNDR),
but that doesn't seem to be what I want either, as the physical volumes in the
new pool may be a different size than in the old pool.

Does anyone know how to do this? My only idea at the moment is to create the
new pool, create new filesystems and then use rsync from the old filesystems
to the new filesystems, but it seems like there should be a way to mirror or
replicate the pool itself rather than doing it at the filesystem level.

Thomas Walker
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
Thomas Walker wrote:
> We are upgrading to new storage hardware. We currently have a zfs pool with
> the old storage volumes. I would like to create a new zfs pool, completely
> separate, with the new storage volumes. [...] it seems like there should be
> a way to mirror or replicate the pool itself rather than doing it at the
> filesystem level.

have you looked at what 'zfs send' can do?

Michael
--
Michael Schuster  http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
Thomas Walker wrote:
> I don't see a way to create a mirror of a pool. Note, I'm not talking about
> a mirrored pool, meaning mirrored drives inside the pool. I want to mirror
> pool1 to pool2. [...]

You can do this by attaching the new disks one by one to the old ones. This is
only going to work if your new storage pool has exactly the same number of
disks (at the same size or larger). For example, if you have 12 500G drives
and your new storage is 12 1TB drives, that will work. For each drive in the
old pool do:

  zpool attach <pool> <existing-disk> <new-disk>

When you have done that and the resilver has completed, you can 'zpool detach'
all the old drives. If your existing storage is already mirrored this still
works; you just do the detach twice to get off the old pool.

On the other hand, if you have 12 500G drives and your new storage is 6 1TB
drives, then you can't do that via mirroring; you need to use zfs send and
recv, e.g.:

  zpool create newpool <new disks>
  zfs snapshot -r oldpool@sendit
  zfs send -R oldpool@sendit | zfs recv -vFd newpool

That will work providing the data will fit, and unlike rsync it will preserve
all your snapshots and you don't have to recreate the new filesystems.

--
Darren J Moffat
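To make the attach/detach route concrete, here is a minimal sketch with
made-up device names (old disks c1t0d0..c1t11d0, new disks c7t0d0..c7t11d0);
note that attach only applies to pools built from single disks or mirrors,
not to raidz vdevs:

  # attach each new disk to its old counterpart, turning each vdev into a mirror
  for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
      zpool attach oldpool c1t${i}d0 c7t${i}d0
  done

  # wait until the resilver has completed
  zpool status oldpool

  # then drop the old side of each mirror
  for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
      zpool detach oldpool c1t${i}d0
  done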
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
> zpool create newpool <new disks>
> zfs snapshot -r oldpool@sendit
> zfs send -R oldpool@sendit | zfs recv -vFd newpool

I think this is probably something like what I want; the problem is I'm not
really "getting it" yet. Could you explain just what is happening here with an
example? Let's say I have this setup:

oldpool = 10 x 500GB volumes, with two mounted filesystems: fs1 and fs2

I create newpool = 12 x 1TB volumes using new storage hardware. newpool thus
has a lot more capacity than oldpool, but not the same number of physical
volumes or the same size volumes.

I want to replicate oldpool, and thus oldpool/fs1 and oldpool/fs2, on
newpool/fs1 and newpool/fs2. And I want to do this in a way that allows me to
"switch over" from oldpool to newpool on a day that is scheduled with the
customers, and then take oldpool away.

So on Monday I take a snapshot of oldpool, like you say:

  zfs snapshot -r oldpool@sendit

And I send/recv it to newpool:

  zfs send -R oldpool@sendit | zfs recv -vFd newpool

At this point does all of that data, say 3TB or so, start copying over to the
newpool? How do I monitor the progress of the transfer? Once that initial copy
is done, on say Wednesday, how do I then do a final "sync" from oldpool to
newpool to pick up any changes that occurred since the first snapshot on
Monday? I assume that for this final snapshot I would unmount the filesystems
to prevent any changes by the customer.

Sorry I'm being dense here. I think I sort of get it, but I don't have the
whole picture.

Thomas Walker
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
> I think this is probably something like what I want; the problem is I'm not
> really "getting it" yet. Could you explain just what is happening here with
> an example? Let's say I have this setup:
>
> oldpool = 10 x 500GB volumes, with two mounted filesystems: fs1 and fs2
>
> I create newpool = 12 x 1TB volumes using new storage hardware. newpool thus
> has a lot more capacity than oldpool, but not the same number of physical
> volumes or the same size volumes.

That is fine, because the zfs send | zfs recv copies the data across.

> I want to replicate oldpool, and thus oldpool/fs1 and oldpool/fs2, on
> newpool/fs1 and newpool/fs2. And I want to do this in a way that allows me
> to "switch over" from oldpool to newpool on a day that is scheduled with the
> customers, and then take oldpool away.

So depending on the volume of data change you might need to do the snapshot
and send several times.

> So on Monday I take a snapshot of oldpool, like you say:
>
>   zfs snapshot -r oldpool@sendit
>
> And I send/recv it to newpool:
>
>   zfs send -R oldpool@sendit | zfs recv -vFd newpool
>
> At this point does all of that data, say 3TB or so, start copying over to
> the newpool?

Everything in all the oldpool datasets that was written up to the time the
@sendit snapshot was created will be.

> How do I monitor the progress of the transfer?

Unfortunately there is no easy way to do that just now. When the 'zfs recv'
finishes, it is done.

> Once that initial copy is done, on say Wednesday, how do I then do a final
> "sync" from oldpool to newpool to pick up any changes that occurred since
> the first snapshot on Monday?

Do almost the same again, e.g.:

  zfs snapshot -r oldpool@wednesday
  zfs send -R -i oldpool@sendit oldpool@wednesday | zfs recv -vFd newpool

> I assume that for this final snapshot I would unmount the filesystems to
> prevent any changes by the customer.

That is a very good idea; the filesystem does *not* need to be mounted for the
zfs send to work. Once the last send is finished do:

  zpool export oldpool

If you want to actually rename newpool back to the oldpool name, do this:

  zpool export newpool
  zpool import newpool oldpool

> Sorry I'm being dense here. I think I sort of get it, but I don't have the
> whole picture.

You are very close; there is some more info in the zfs(1M) man page.

--
Darren J Moffat
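Pulling the steps above together, a sketch of the final cut-over (pool names
as in this example), with a quick check that the received datasets and
snapshots look right before retiring the old storage:

  # after the last incremental send has completed
  zfs list -r newpool                  # the copied filesystems
  zfs list -r -t snapshot newpool      # snapshots preserved by send -R

  zpool export oldpool                 # retire the old storage
  zpool export newpool
  zpool import newpool oldpool         # bring the copy back under the old name
  zfs mount -a                         # remount everything for the customers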
[zfs-discuss] USB drive on S10u7
What is the best way to attach a USB hard disk to Solaris 10u7? I know some
program runs to auto-detect such a device (I've forgotten the name because I
do almost all my work on OSOL, where it's hal). Do I use "that program", or do
I disable it and manually attach the drive to the system?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 05/09 | OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] [indiana-discuss] zfs issues?
Thanks for that Brian. I've logged a bug:

  CR 6865661 *HOT* Created, P1 opensolaris/triage-queue
  zfs scrub rpool causes zpool hang

Just discovered, after trying to create a further crash dump, that it's
failing and rebooting with the following error (just caught it prior to the
reboot):

  panic dump timeout

so I'm not sure how else to assist with debugging this issue.

cheers,
James

On 28/07/2009, at 9:08 PM, Brian Ruthven - Solaris Network Sustaining - Sun UK
wrote:
> Yes: make sure your dumpadm is set up beforehand to enable savecore, and
> that you have a dump device. [...] Then you should get a dump saved in
> /var/crash/ on next reboot.
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
I think you've given me enough information to get started on a test of the
procedure. Thanks very much.

Thomas Walker
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
Ok Bob, but I think that is the problem with picket fencing... and there we
are talking about committing the sync operations to disk. What I'm seeing is
no read activity from the disks while the slog is being written. The disks are
at "zero" (no reads, no writes).

Thanks a lot for your reply.

Leal
[ http://www.eall.com.br/blog ]
Re: [zfs-discuss] USB drive on S10u7
Hi Dick,

The Solaris 10 volume management service is volfs. If you attach the USB hard
disk and run volcheck, the disk should be mounted under the /rmdisk directory.
If the auto-mounting doesn't occur, you can disable volfs and mount it
manually.

You can read more about this feature here:

http://docs.sun.com/app/docs/doc/817-5093/medaccess-29267?a=view

Cindy

On 07/28/09 07:56, dick hoogendijk wrote:
> What is the best way to attach a USB hard disk to Solaris 10u7? [...]
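As a rough sketch of the two routes Cindy describes (the device name below is
hypothetical; substitute whatever rmformat reports on your system):

  # automatic path: let volfs handle it
  svcs volfs          # service should be online
  volcheck            # poll for newly attached media
  ls /rmdisk          # the disk appears here once mounted

  # manual path: take volfs out of the picture and give the disk to ZFS (as root)
  svcadm disable volfs
  rmformat            # note the device name, e.g. c5t0d0
  zpool create archive c5t0d0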
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On 27/07/2009, at 10:14 PM, Tobias Exner wrote:
> Hi list,
>
> I did some tests and ran into a very strange situation...
>
> I created a zvol using "zfs create -V" and initialized a sam-filesystem on
> this zvol. After that I restored some test data using a dump from another
> system.
>
> So far so good.
>
> After some big troubles I found out that releasing files in the
> sam-filesystem doesn't create space on the underlying zvol. So staging and
> releasing files just work until "zfs list" shows me a zvol with 100% usage,
> although the sam-filesystem was only filled up to 20%. I didn't create
> snapshots, and a scrub didn't show any errors.
>
> When the zvol was filled up, even a sammkfs couldn't solve the problem. I
> had to destroy the zvol (not the zpool). After that I was able to recreate
> a new zvol with sam-fs on top.

this is a feature of block devices. once you (or samfs) use a block on the
zvol, it has no mechanism to tell the zvol when it is no longer using it.
samfs simply unreferences the blocks it frees; it doesn't actively go through
them and tell the block layer underneath it that they can be reclaimed. from
the zvol's point of view they're still being used because they were used at
some point in the past.

you might be able to get the space back in the zvol by writing a massive file
full of zeros in the samfs, but you'd have to test that.

> Is that a known behaviour? .. or did I run into a bug?

it's known.

dlg

> System: SAM-FS 4.6.85, Solaris 10 U7 X86

David Gwynne
Infrastructure Architect
Engineering, Architecture, and IT
University of Queensland
+61 7 3365 3636
Re: [zfs-discuss] [indiana-discuss] zfs issues?
Yes: make sure your dumpadm is set up beforehand to enable savecore, and that
you have a dump device. In my case the output looks like this:

$ pfexec dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash/opensolaris
  Savecore enabled: yes

Then you should get a dump saved in /var/crash/ on next reboot.

Brian

James Lever wrote:
> On 28/07/2009, at 9:22 AM, Robert Thurlow wrote:
>> I can't help with your ZFS issue, but to get a reasonable crash dump in
>> circumstances like these, you should be able to do "savecore -L" on
>> OpenSolaris.
>
> That would be well and good if I could get a login - due to the rpool being
> unresponsive, that was not possible. So the only recourse we had was via
> kmdb :/
>
> Is there a way to explicitly invoke savecore via kmdb?
>
> James

--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG
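For anyone setting this up from scratch, a sketch of one way to do it; the
device and directory are the defaults shown above, so adjust to taste:

  pfexec dumpadm -d /dev/zvol/dsk/rpool/dump   # use a dedicated dump device
  pfexec dumpadm -s /var/crash/opensolaris     # where savecore writes the dump
  pfexec dumpadm -y                            # enable savecore on reboot
  pfexec savecore -L                           # live dump of a still-running system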
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On 28 July, 2009 - David Gwynne sent me these 1,9K bytes:

> On 27/07/2009, at 10:14 PM, Tobias Exner wrote:
>> After some big troubles I found out that releasing files in the
>> sam-filesystem doesn't create space on the underlying zvol. [...]
>
> this is a feature of block devices. once you (or samfs) use a block on the
> zvol, it has no mechanism to tell the zvol when it is no longer using it.
> samfs simply unreferences the blocks it frees; it doesn't actively go
> through them and tell the block layer underneath it that they can be
> reclaimed. from the zvol's point of view they're still being used because
> they were used at some point in the past.

http://en.wikipedia.org/wiki/TRIM_(SSD_command) should make it possible I
guess.. (assuming it's implemented all the way in the chain).. Should/could
help in virtualization too..

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
On Tue, 28 Jul 2009, Marcelo Leal wrote:
> Ok Bob, but I think that is the problem with picket fencing... and there we
> are talking about committing the sync operations to disk. What I'm seeing
> is no read activity from the disks while the slog is being written. The
> disks are at "zero" (no reads, no writes).

This is an interesting issue. While synchronous writes are requested, what do
you expect a read to return? If there is a synchronous write in progress,
should readers wait for the write to be persisted in case the write influences
the data read?

Note that I am not saying that huge synchronous writes should necessarily
block reading (particularly if the reads are for unrelated blocks/files), but
it is understandable if zfs focuses more on the writes.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] How to "mirror" an entire zfs pool to another pool
On 28 July 2009 at 15:54, Darren J Moffat wrote:
>> How do I monitor the progress of the transfer?
>
> Unfortunately there is no easy way to do that just now. When the 'zfs recv'
> finishes, it is done.

I've just found pv (pipe viewer) today
(http://www.ivarch.com/programs/pv.shtml), which is packaged in /contrib
(http://pkg.opensolaris.org/contrib/p5i/0/pv.p5i).

You can do:

  zfs send -R oldpool@sendit | pv -s 3T | zfs recv -vFd newpool

and you'll see a message like this:

  8GO 0:00:05 [5,71GO/s] [=>          ] 7% ETA 0:00:58

A nice and simple way to get a progress report!

Gaëtan

--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66  fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
Re: [zfs-discuss] zpool is lain to burnination (bwahahahah!)
Hi Again,

A bit more futzing around and I notice that output from a plain 'zdb' returns
this:

store
    version=14
    name='store'
    state=0
    txg=0
    pool_guid=13934602390719084200
    hostid=8462299
    hostname='store'
    vdev_tree
        type='root'
        id=0
        guid=13934602390719084200
        bad config type 16 for stats
        children[0]
            type='disk'
            id=0
            guid=14931103169794670927
            path='/dev/dsk/c0t22310001557D05D5d0s0'
            devid='id1,sd@x22310001557d05d5/a'
            phys_path='/scsi_vhci/disk@g22310001557d05d5:a'
            whole_disk=1
            metaslab_array=23
            metaslab_shift=35
            ashift=9
            asize=6486985015296
            is_log=0
            DTL=44
            bad config type 16 for stats

So the last line there - 'bad config type 16 for stats' - is interesting. The
only reference I can find to this error is in an IRC log for some Nexenta
folks. Doesn't look like there's much help there.

So, uh. Blow away and try again? It seems like that's the way to go here. If
anyone has any suggestions let me know! I think I'll start over at 3 PM EST on
July 28th. Yes - I did just give you a deadline to recover my data, help
forum, or I'm blowing it away!

Thanks,
Graeme
Re: [zfs-discuss] Help with setting up ZFS
On Tue, Jul 28, 2009 at 03:04, Brian wrote:
> Just a quick question before I address everyone else. I bought this
> connector:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16812198020
>
> However, it's pretty clear to me now (after I've ordered it) that it won't
> fit the SAS connector on the board at all. What kind of cable do I need for
> this?

Search "8087 forward" on provantage.com. They're about $15, unless you want
attached power connectors (which would be necessary for SAS drives, unless
some kind of backplane were in play), in which case they're $30.

Will
Re: [zfs-discuss] [sam-qfs-discuss] sam-fs on zfs-pool
On Jul 28, 2009, at 8:53 AM, Tomas Ögren wrote:
> On 28 July, 2009 - David Gwynne sent me these 1,9K bytes:
>> this is a feature of block devices. once you (or samfs) use a block on the
>> zvol, it has no mechanism to tell the zvol when it is no longer using it.
>> [...]
>
> http://en.wikipedia.org/wiki/TRIM_(SSD_command) should make it possible I
> guess.. (assuming it's implemented all the way in the chain).. Should/could
> help in virtualization too..

Or just enable compression and zero fill.
 -- richard
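A sketch of what "enable compression and zero fill" could look like in
practice; the dataset name and mount point are made up, and it's worth proving
this out on scratch data before relying on it:

  # hypothetical names: the zvol is tank/samvol, sam-fs is mounted at /sam
  zfs set compression=on tank/samvol          # runs of zeros now compress to almost nothing
  dd if=/dev/zero of=/sam/zerofill bs=1024k   # let it run until the filesystem fills
  rm /sam/zerofill                            # hand the space back to samfs
  zfs list tank/samvol                        # space used by the zvol should have dropped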
Re: [zfs-discuss] USB drive on S10u7
On Tue, 28 Jul 2009 09:03:14 -0600 cindy.swearin...@sun.com wrote:
> The Solaris 10 volume management service is volfs.

  # svcs -a | grep vol

has told me that ;-)

> If the auto-mounting doesn't occur, you can disable volfs and mount it
> manually.

I don't want the automounting to occur, so I disabled volfs. I then did a
"rmformat" to learn the device name, followed by a

  zpool create archive /dev/rdsk/devicename

All running nicely. Thanks for the advice.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS 10u7 05/09 | OpenSolaris 2010.02 B118
+ All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
My understanding is that there's never any need for a reader to wait for a
write in progress. ZFS keeps all writes in memory until they're committed to
disk - if you ever try to read something that's either waiting to be, or is
being, written to disk, ZFS will serve it straight from RAM.

One question I do have after reading this again though is: Leal, do you have
the slog on the same controller as the disks? Have you tested whether reads
are also blocked if you're running on a separate controller?
Re: [zfs-discuss] zfs destroy slow?
On Mon, Jul 27, 2009 at 3:58 AM, Markus Kovero wrote:
> Oh well, the whole system seems to be deadlocked. Nice. A little too keen on
> keeping data safe :-P
>
> From: Markus Kovero
> Sent: 27 July 2009 13:39
> Subject: [zfs-discuss] zfs destroy slow?
>
> Hi, how come zfs destroy is so slow, e.g. destroying a 6TB dataset renders
> zfs admin commands useless for the time being, in this case for hours?
> (running osol 111b with latest patches.)
>
> Yours
> Markus Kovero

I submitted a bug, but I don't think it's been assigned a case number yet.

I see this exact same behavior on my X4540's. I create a lot of snapshots, and
when I tidy up, zfs destroy can 'stall' any and all ZFS-related commands for
hours, or even days (in the case of nested snapshots). The only resolution is
not to ever use zfs destroy, or just simply wait it out. It will eventually
finish, just not in any reasonable timeframe.

--
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] zfs destroy slow?
> I submitted a bug, but I don't think it's been assigned a case number yet.
> [...]

Correction: looks like my bug is 6855208.

--
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] USB drive on S10u7
On Tue, 28 Jul 2009, dick hoogendijk wrote:
> I don't want the automounting to occur, so I disabled volfs. I then did a
> "rmformat" to learn the device name, followed by a "zpool create archive
> /dev/rdsk/devicename"

It is better to edit /etc/vold.conf since vold is used for other purposes as
well, such as auto-mounting CDs, DVDs, and floppies. I commented out this
line:

  #use rmdisk drive /dev/rdsk/c*s2 dev_rmdisk.so rmdisk%d

and then all was good. With a bit more care, some removable devices could be
handled differently than others so you could just exclude the ones used for
zfs.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs destroy slow?
On 07/27/09 03:39, Markus Kovero wrote:
> Hi, how come zfs destroy is so slow, e.g. destroying a 6TB dataset renders
> zfs admin commands useless for the time being, in this case for hours?
> (running osol 111b with latest patches.)

I'm not sure what "latest patches" means w.r.t. ON build, but this is almost
certainly:

  6809683 zfs destroy fails to free object in open context, stops up txg train

Fixed in snv_114. With the above fix in place, destroys can still take a while
(fundamentally you have to do I/O proportional to the amount of metadata), but
it will be doing all the work in open context, which won't stop the txg train,
and admin commands should continue to work.

- Eric

--
Eric Schrock, Fishworks    http://blogs.sun.com/eschrock
Re: [zfs-discuss] zpool is lain to burnination (bwahahahah!)
On 28.07.09 20:31, Graeme Clark wrote:
> Hi Again,
>
> A bit more futzing around and I notice that output from a plain 'zdb'
> returns this:
>
> store
>     version=14
>     name='store'
>     state=0
>     txg=0
>     [...]
>     bad config type 16 for stats

This is a dump of /etc/zfs/zpool.cache. While 'stats' should not be there, it
does not matter much.

> So the last line there - 'bad config type 16 for stats' - is interesting.
> The only reference I can find to this error is in an IRC log for some
> Nexenta folks. Doesn't look like there's much help there.
>
> So, uh. Blow away and try again? It seems like that's the way to go here.
> If anyone has any suggestions let me know! I think I'll start over at 3 PM
> EST on July 28th. Yes - I did just give you a deadline to recover my data,
> help forum, or I'm blowing it away!

It would be helpful if you provide a little bit more information here: what
OpenSolaris release/build are you running (I suspect something like build
114-118, though I may be wrong), what other commands you tried (zpool
import/export etc.) and what the result was.

You can also explain what you mean here:

> I can force export and import the pool, but I can't seem to get it active
> again.

as the pool status provided before suggests that the pool cannot be imported.

> I can do a zdb to the device and I get some info (well, actually to s0 on
> the disk, which is weird because I think I built the array without
> specifying a slice. Maybe relevant - don't know...)

When you specify a disk without s0 at the end during pool creation, you tell
ZFS to use the whole disk, so it labels it with an EFI label, creates a single
slice 0 covering the whole disk and uses that slice for the pool, as recorded
in the configuration (see the 'path' and 'whole_disk' name-value pairs).

victor
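A few commands whose output would help answer Victor's questions, assuming the
pool is currently exported (all of these are read-only except the forced
import attempt):

  cat /etc/release          # which build is actually running
  zpool import              # is the pool seen as importable, and in what state?
  zpool import -f store     # what exact error does a forced import produce?
  zdb -e store              # read the on-disk config of the exported pool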
[zfs-discuss] Set New File/Folder ZFS ACLs Automatically through Samba?
Do any of you know how to set the default ZFS ACLs for newly created files and
folders when those files and folders are created through Samba?

I want all new files and folders to inherit only the extended (non-trivial)
ACLs that are set on the parent folders. But when a file is created through
Samba on the ZFS file system, it gets mode 744 (trivial) added to it. For
directories, mode 755 gets added.

I've tried everything I could find and think of:

1.) Setting a umask.
2.) Editing /etc/sfw/smb.conf 'force create mode' and 'force directory mode',
    then `svcadm restart samba`.
3.) Adding trivial inheritable ACLs to the parent folder.

Changes 1 and 2 had no effect. With number 3 I got folders to effectively do
what I want, but not files.

I set the ACLs of the parent to:

> drwx--+ 24 AD+administrator AD+records 2132 Jul 28 12:01 records/
> user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:fdi---:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> group:AD+release:r-x---a-R-c---:--:allow
> owner@:rwxp---A-W-Co-:fd:allow
> group@:rwxp--:fd:deny
> everyone@:rwxp---A-W-Co-:fd:deny

Then new directories and files get created like this from a Windows
workstation connected to the server:

> drwx--+ 2 AD+testuser AD+domain users 2 Jul 28 12:01 test
> user:AD+administrator:rwxpdDaARWcCos:fdi---:allow
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:fdi---:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> owner@:rwxp---A-W-Co-:fdi---:allow
> owner@:---A-W-Co-:--:allow
> group@:rwxp--:fdi---:deny
> group@:--:--:deny
> everyone@:rwxp---A-W-Co-:fdi---:deny
> everyone@:---A-W-Co-:--:deny
> owner@:--:--:deny
> owner@:rwxp---A-W-Co-:--:allow
> group@:-w-p--:--:deny
> group@:r-x---:--:allow
> everyone@:-w-p---A-W-Co-:--:deny
> everyone@:r-x---a-R-c--s:--:allow
>
> -rwxr--r--+ 1 AD+testuser AD+domain users 0 Jul 28 12:01 test.txt
> user:AD+administrator:rwxpdDaARWcCos:--:allow
> group:AD+records:rwxpd-aARWc--s:--:allow
> owner@:---A-W-Co-:--:allow
> group@:--:--:deny
> everyone@:---A-W-Co-:--:deny
> owner@:--:--:deny
> owner@:rwxp---A-W-Co-:--:allow
> group@:-wxp--:--:deny
> group@:r-:--:allow
> everyone@:-wxp---A-W-Co-:--:deny
> everyone@:r-a-R-c--s:--:allow

I need group "AD+release" to have read-only access to only specific files
within records. I could set that up, but any new files or folders that are
created will be viewable by AD+release. That would not be acceptable.

Do any of you know how to set the Samba file/folder creation ACLs on ZFS file
systems? Or do you have something I could try?

Thank you for your time.

--
Jeff Hulen
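Not a definitive answer, but two ZFS-side properties are worth experimenting
with before fighting smb.conf further; the dataset name and paths below are
made up, and the Samba side may also need 'inherit acls = yes' (and, where
available, the zfsacl VFS module - the bundled Solaris 10 Samba may not ship
it):

  # let inherited ACEs pass through untouched instead of being modified
  # by the requested create mode (dataset name is hypothetical)
  zfs set aclinherit=passthrough tank/records
  zfs set aclmode=passthrough tank/records

  # then create a test file/dir from the Windows client and compare
  # (paths assume the share is mounted at /records; adjust)
  ls -Vd /records/test
  ls -V /records/test.txt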
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:
> Sun has opened internal CR 6859997. It is now in Dispatched state at High
> priority.

CR 6859997 has been accepted and is actively being worked on. The following
info has been added to that CR:

This is a problem with the ZFS file prefetch code (zfetch) in dmu_zfetch.c.
The test script provided by the submitter (thanks Bob!) does no file
prefetching the second time through each file. This problem exists in ZFS in
Solaris 10, Nevada, and OpenSolaris.

This test script creates 3000 files, each 8MB long, so the amount of data
(24GB) is greater than the amount of memory (16GB on a Thumper). With the
default blocksize of 128k, each of the 3000 files has 63 blocks. The first
time through, zfetch ramps up a single prefetch stream normally. But the
second time through, dmu_zfetch() calls dmu_zfetch_find(), which thinks that
the data has already been prefetched, so no additional prefetching is started.

This problem is not seen with 500 files each 48MB in length (still 24GB of
data). In that case there's still only one prefetch stream, but it is
reclaimed when one of the requested offsets is not found. The reason it is not
found is that the stream "strided" the first time through after reaching the
zfetch cap, which is 256 blocks. Files with no more than 256 blocks don't
require a stride. So this problem will only be seen when the data from a file
with no more than 256 blocks is accessed after being tossed from the ARC.

The fix for this problem may be more feedback between the ARC and the zfetch
code. Or it may make sense to restart the prefetch stream after some time has
passed, or perhaps whenever there's a miss on a block that was expected to
have already been prefetched?

On a Thumper running Nevada build 118, the first pass of this test takes 2
minutes 50 seconds and the second pass takes 5 minutes 22 seconds. If
dmu_zfetch_find() is modified to restart the prefetch stream when the
requested offset is 0 and more than 2 seconds have passed since the stream was
last accessed, then the time needed for the second pass is reduced to 2
minutes 24 seconds. Additional investigation is currently taking place to
determine if another solution makes more sense. And more testing will be
needed to see what effect this change has on other prefetch patterns.

6412053 is a related CR which mentions that the zfetch code may not be issuing
I/O at a sufficient pace. This behavior is also seen on a Thumper running the
test script in CR 6859997 since, even when prefetch is ramping up as expected,
less than half of the available I/O bandwidth is being used. Although more
aggressive file prefetching could increase memory pressure, as described in
CRs 6258102 and 6469558.

-- Rich
[zfs-discuss] zfs send/recv syntax
Is it possible to send an entire pool (including all its zfs filesystems) to a
zfs filesystem in a different pool on another host? Or must I send each zfs
filesystem one at a time?

Thanks!
jlc
[zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris is
5/09. After the create, zfs list shows 40.8T, but after creating 4
filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
happened to the 8.8TB? Is this much overhead normal?

zpool create -f zpool1 raidz c1t0d0 c2t0d0 c3t0d0 c5t0d0 c6t0d0 \
  raidz c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
  raidz c6t1d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 \
  raidz c5t2d0 c6t2d0 c1t3d0 c2t3d0 c3t3d0 \
  raidz c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 \
  raidz c3t4d0 c5t4d0 c6t4d0 c1t5d0 c2t5d0 \
  raidz c3t5d0 c4t5d0 c5t5d0 c6t5d0 c1t6d0 \
  raidz c2t6d0 c3t6d0 c4t6d0 c5t6d0 c6t6d0 \
  raidz c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 \
  spare c6t7d0 c4t0d0 c4t4d0

zpool list
NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
zpool1  40.8T   176K  40.8T    0%  ONLINE  -

## create multiple file systems in the pool
zfs create -o mountpoint=/backup1fs zpool1/backup1fs
zfs create -o mountpoint=/backup2fs zpool1/backup2fs
zfs create -o mountpoint=/backup3fs zpool1/backup3fs
zfs create -o mountpoint=/backup4fs zpool1/backup4fs

zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
zpool1             364K  32.1T  28.8K  /zpool1
zpool1/backup1fs  28.8K  32.1T  28.8K  /backup1fs
zpool1/backup2fs  28.8K  32.1T  28.8K  /backup2fs
zpool1/backup3fs  28.8K  32.1T  28.8K  /backup3fs
zpool1/backup4fs  28.8K  32.1T  28.8K  /backup4fs

Thanks,
Glen

(PS. As I said, this is my first time working with ZFS; if this is a dumb
question - just say so.)
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
> This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris
> is 5/09. After the create, zfs list shows 40.8T, but after creating 4
> filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
> happened to the 8.8TB? Is this much overhead normal?

IIRC zpool list includes the parity drives in the disk space calculation and
zfs list doesn't. "Terabyte" drives are really 900-something GB drives thanks
to that base-2 vs. base-10 confusion HD manufacturers introduced. Using that
900GB figure I get to both 40TB and 32TB with and without the parity drives.
Spares aren't counted.

Regards,
-mg
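A rough back-of-the-envelope check of those numbers (treating a "1TB" drive as
10^12 bytes, i.e. about 0.909 TiB); the layout above has 9 raidz vdevs of 5
disks plus 3 spares:

  # 45 disks sit in raidz vdevs; 'zpool list' counts parity, spares excluded
  echo '45 * 10^12 / 2^40' | bc -l   # ~40.9, matches the 40.8T from zpool list
  # 36 of those are data disks (9 vdevs x 4); 'zfs list' reports usable space
  echo '36 * 10^12 / 2^40' | bc -l   # ~32.7, close to the 32.1T from zfs list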
Re: [zfs-discuss] zfs send/recv syntax
On Wed 29/07/09 10:09 , "Joseph L. Casale" jcas...@activenetwerx.com sent:
> Is it possible to send an entire pool (including all its zfs filesystems)
> to a zfs filesystem in a different pool on another host? Or must I send
> each zfs filesystem one at a time?

Yes, use -R on the sending side and -d on the receiving side.

--
Ian.
Re: [zfs-discuss] avail drops to 32.1T from 40.8T after create -o mountpoint
Glen Gunselman wrote:
> This is my first ZFS pool. I'm using an X4500 with 48 1TB drives. Solaris
> is 5/09. After the create, zfs list shows 40.8T, but after creating 4
> filesystems/mountpoints the available space drops 8.8TB to 32.1TB. What
> happened to the 8.8TB? Is this much overhead normal? [...]

Here is the output from my J4500 with 48 x 1 TB disks. It is almost the exact
same configuration as yours. This is used for NetBackup. As Mario just pointed
out, "zpool list" includes the parity drive in the space calculation whereas
"zfs list" doesn't.

[r...@xxx /]#> zpool status
errors: No known data errors

  pool: nbupool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        nbupool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
            c2t45d0  ONLINE       0     0     0
            c2t46d0  ONLINE       0     0     0
        spares
          c2t47d0    AVAIL
          c2t48d0    AVAIL
          c2t49d0    AVAIL

errors: No known data errors

[r...@xxx /]#> zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
NBU                     113G  20.6G   113G  /NBU
nbupool                27.5T  4.58T  30.4K  /nbupool
nbupool/backup1        6.90T  4.58T  6.90T  /backup1
nbupool/backup2        6.79T  4.58T  6.79T  /backup2
nbupool/backup3        7.28T  4.58T  7.28T  /backup3
nbupool/backup4        6.43T  4.58T  6.43T  /backup4
nbupool/nbushareddisk  20.1G  4.58T  20.1G  /nbushareddisk
nbupool/zfscachetest   69.2G  4.58T  69.2G  /nbupool/zfscachetest

[r...@xxx /]#> zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
NBU       136G   113G  22.8G  83%  ONLINE  -
nbupool  40.8T  34.4T  6.37T  84%  ONLINE  -
[r...@solnbu1 /]#>

--
___
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006
Manukau City
Auckland
New Zealand

Phone  : +64 09 968 7611
Fax    : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
Re: [zfs-discuss] zfs send/recv syntax
> Yes, use -R on the sending side and -d on the receiving side.

I tried that first, going from Solaris 10 to osol 0906:

# zfs send -vR mypool@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"

It didn't create any of the zfs filesystems under mypool2?

Thanks!
jlc
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Tue, 28 Jul 2009, Rich Morris wrote:
> 6412053 is a related CR which mentions that the zfetch code may not be
> issuing I/O at a sufficient pace. This behavior is also seen on a Thumper
> running the test script in CR 6859997 since, even when prefetch is ramping
> up as expected, less than half of the available I/O bandwidth is being
> used. Although more aggressive file prefetching could increase memory
> pressure, as described in CRs 6258102 and 6469558.

It is good to see this analysis. Certainly the optimum prefetching required
for an Internet video streaming server (with maybe 300 kilobits/second per
stream) is radically different than what is required for uncompressed 2K
preview (8MB/frame) of motion picture frames (320 megabytes/second per
stream), but zfs should be able to support both.

Besides real-time analysis based on current stream behavior and memory, it
would be useful to maintain some recent history for the whole pool so that a
pool which is usually used for 1000 slow-speed video streams behaves
differently by default than one used for one or two high-speed video streams.
With this bit of hint information, files belonging to a pool recently
producing high-speed streams can be ramped up quickly, while files belonging
to a pool which has recently fed low-speed streams can be ramped up more
conservatively (until proven otherwise) in order to not flood memory and
starve the I/O needed by other streams.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] zfs send/recv syntax
Try send/receive to the same host (ssh localhost). I used this when trying
send/receive, as it removes ssh-between-hosts "problems".

The on-disk format of ZFS has changed (there is something about it in the man
pages, from memory), so I don't think you can go S10 -> OpenSolaris without
doing an upgrade, but I could be wrong!

Joseph L. Casale wrote:
> > Yes, use -R on the sending side and -d on the receiving side.
>
> I tried that first, going from Solaris 10 to osol 0906:
>
> # zfs send -vR mypool@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
>
> It didn't create any of the zfs filesystems under mypool2?
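A small sketch of that local test; the snapshot name comes from the message
above, and the receiving pool name (mypool2 here) is whatever your target pool
is actually called. The -n (dry-run) form is a cheap way to see what the
receive would create before doing it for real:

  zfs snapshot -r mypool@snap
  zfs send -R mypool@snap | zfs recv -dnv mypool2     # local dry run
  zfs send -R mypool@snap | ssh localhost "pfexec /usr/sbin/zfs recv -dF mypool2"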
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Tue, 28 Jul 2009, Rich Morris wrote:
> The fix for this problem may be more feedback between the ARC and the
> zfetch code. Or it may make sense to restart the prefetch stream after some
> time has passed, or perhaps whenever there's a miss on a block that was
> expected to have already been prefetched?

Regarding this approach of waiting for a prefetch miss: this seems like it
would produce an uneven flow of data to the application and not ensure that
data is always available when the application goes to read it. A stutter is
likely to produce at least a 10ms gap (and possibly far greater) while the
application is blocked in read() waiting for data.

Since zfs blocks are large, stuttering becomes expensive, and if the
application itself needs to read ahead 128K in order to avoid the stutter,
then it consumes memory in an expensive non-sharable way. In the ideal case,
zfs will always stay one 128K block ahead of the application's requirement and
the unconsumed data will be cached in the ARC where it can be shared with
other processes. For an application with real-time data requirements, it is
definitely desirable not to stutter at all if possible.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
> > Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds "ok" when the flush hasn't occurred?

I think it would be a shorter list if one were to name the drives/controllers that actually implement a flush properly.

> > Everyone on this list seems to blame lying hardware for ignoring commands, but disks are relatively mature and I can't believe that major OEMs would qualify disks or other hardware that willingly ignore commands.

It seems you have too much faith in major OEMs of storage, considering that 99.9% of the market is personal use, for which a 2% throughput advantage over a competitor can make or break the profit margin on a device. Ignoring cache requests is guaranteed to get the best drive performance benchmarks regardless of what software is driving the device.

For example, it is virtually impossible to find a USB drive that honors cache sync (to do so would require the device to stop completely until a fully synchronous USB transaction had made it to the device and the data had been written). Can you imagine how long a USB drive would sit on store shelves if it actually did do a proper cache sync?

While USB is the extreme case, and it does get better the more expensive the drive, it is still far from a given that any particular device properly handles cache flushes.
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv syntax
On Wed 29/07/09 10:49 , "Joseph L. Casale" jcas...@activenetwerx.com sent:
>> Yes, use -R on the sending side and -d on the receiving side.
>
> I tried that first, going from Solaris 10 to osol 0906:
>
> # zfs send -vR mypo...@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
>
> didn't create any of the zfs filesystems under mypool2?

What happens if you try it on the local host, where you can just pipe from the send to the receive (no need for ssh)?

zfs send -R mypo...@snap | zfs recv -d -n -v newpool/somename

Another thing to try is to use "-n -v" on the receive end to see what would be created if -n were omitted. I find -v more useful on the receiving side than on the send side.

-- 
Ian
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
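To make that concrete, a minimal local dry run could look like the lines below (pool and snapshot names are placeholders, not the poster's actual setup; the snapshot must have been taken recursively so that -R has descendant snapshots to send):

# zfs send -R tank@backup | zfs recv -d -n -v backuppool

With -n the receive changes nothing, and -v lists every dataset and snapshot that would be created; drop the -n once the list matches what you expect.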
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
> This is also (theoretically) why a drive purchased from Sun is more expensive than a drive purchased from your neighbourhood computer shop:

It's more significant than that. Drives aimed at the consumer market are at a competitive disadvantage if they do handle cache flush correctly (since the popular hardware blog of the day will show that the device is far slower than the competitors that throw away the sync requests).

> Sun (and presumably other manufacturers) takes the time and effort to test things to make sure that when a drive says "I've synced the data", it actually has synced the data. This testing is what you're presumably paying for.

It wouldn't cost any more for commercial vendors to implement cache flush properly; it is just that they are penalized by the market for doing so.
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv syntax
I apologize for replying in the middle of this thread, but I never saw the initial snapshot syntax for mypool2, which needs to be recursive (zfs snapshot -r mypo...@snap) to snapshot all the datasets in mypool2. Then, use zfs send -R to pick up and restore all the dataset properties.

What was the original snapshot syntax?

Cindy

- Original Message -
From: Ian Collins
Date: Tuesday, July 28, 2009 5:53 pm
Subject: Re: [zfs-discuss] zfs send/recv syntax
To: "zfs-discuss@opensolaris.org" , "Joseph L. Casale"

> > # zfs send -vR mypo...@snap | ssh j...@catania "pfexec /usr/sbin/zfs recv -dF mypool/somename"
> > didn't create any of the zfs filesystems under mypool2?
>
> What happens if you try it on the local host where you can just pipe from the send to the receive (no need for ssh)?
>
> zfs send -R mypo...@snap | zfs recv -d -n -v newpool/somename
>
> Another thing to try is use "-n -v" on the receive end to see what would be created if -n were omitted.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
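Putting Cindy's and Ian's suggestions together, the whole round trip would look roughly like this (host, pool, and snapshot names are placeholders rather than the poster's real setup):

# zfs snapshot -r tank@backup
# zfs send -R tank@backup | ssh user@remotehost "pfexec /usr/sbin/zfs recv -dF backuppool"

The -r on the snapshot is what gives -R a snapshot on every descendant dataset to send, and -d on the receive recreates the source dataset names underneath backuppool.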
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
This thread started over in nfs-discuss, as it appeared to be an nfs problem initially, or at the very least an interaction between nfs and the zil.

Just summarising the speeds we have found when untarring something, always into a new/empty directory. We are only looking at write speed; read is always very fast. The reason we started to look at this was that the 7-year-old netapp being phased out could untar the test file in 11 seconds, while the x4500/x4540 Suns took 5 minutes.

For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test: http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz

The command executed, generally, is:

# mkdir .test34 && time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz

Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3  0m11.114s
Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4  5m11.654s
Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3  8m55.911s
Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4  10m32.629s

Just untarring the tarball on the x4500 itself:

: x4500 OpenSolaris svn117 server  0m0.478s
: x4500 Solaris 10 10/08 server  0m1.361s

So ZFS itself is very fast.

Replacing NFS with different protocols: identical setup, just swapping tar for rsync and nfsd for sshd. The baseline test, using "rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX":

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4  3m44.857s
Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh  0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Let's share it with smb, and mount it:

OsX 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar  0m24.480s

Neat, even SMB can beat nfs with default settings. This would then indicate to me that nfsd is broken somehow, but then we try again after only disabling the ZIL:

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE ZIL: nfsv4  0m8.453s 0m8.284s 0m8.264s

Nice, so this is theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, which would probably be safe without the ZIL. The other type of load is FTP/WWW/CGI, which has more active writes/updates, so probably not as good a fit. Comments?

Enable the ZIL, but disable zfscacheflush (just as a test; I have been told disabling the cache flush is far more dangerous):

Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE zfscacheflush: nfsv4  0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a whole lot about the slog. First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog boot pool: nfsv4  1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling the ZIL, but it should give me the theoretical maximum speed with the ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog /tmp/junk: nfsv4  0m8.916s

Nice! Same speed as ZIL disabled.

Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.

Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4  0m26.566s

Not too bad really. But you have to reboot to see a CF card, fiddle with the BIOS for the boot order, etc., so it is just not an easy addition on a live system. A SATA emulated SSD disk can be hot-swapped. Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk.

I am hoping to pick up an SSD SATA device today and see what speeds we get out of that.

The rsync (1s) vs nfs (8s) difference I can accept as overhead of a much more complicated protocol, but why would it take 3 minutes to write the same data to the same pool with rsync (1s) vs nfs (3m)? The ZIL was on, the slog was the default, and both were writing the same way. Does nfsd add FD_SYNC to every close regardless of whether the application did or not? This I have not yet wrapped my head around. For example, I know rsync and tar do not use fdsync (but dovecot does) on close(), but does NFS make it fdsync anyway?

Sorry for the giant email.

-- 
Jorgen Lundman | Unix Administrator | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3-3375-1767 (home)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
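For anyone who wants to repeat the slog experiments, the file-backed variant described above only takes a couple of commands (pool name and size are placeholders; a slog backed by /tmp is volatile and only suitable for throwaway benchmarking, and on builds of that era a log device could not be removed from the pool again, so use a scratch pool):

# mkfile 2g /tmp/junk
# zpool add tank log /tmp/junk
# zpool status tank

zpool status shows the file under a separate "logs" section once it has been added. The "DISABLE ZIL" runs were presumably done with the old zil_disable tunable (set zfs:zil_disable = 1 in /etc/system, followed by a reboot or a remount of the filesystems), which affects every pool on the machine and is likewise only meant for testing.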
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
On Wed, 29 Jul 2009, Jorgen Lundman wrote:
> For example, I know rsync and tar does not use fdsync (but dovecot does) on its close(), but does NFS make it fdsync anyway?

NFS is required to do synchronous writes. This is what allows NFS clients to recover seamlessly if the server spontaneously reboots. If the NFS client supports it, it can send substantial data (multiple writes) to the server and then commit it all via an NFS commit. Note that this requires more work by the client, since the NFS client is required to replay the uncommitted writes if the server goes away.

> Sorry for the giant email.

No, thank you very much for the interesting measurements and data.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
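To see where the sync requests are actually coming from during one of these untar runs, DTrace can count them on the server (assuming DTrace is available there; these are generic one-liners, not measurements from the thread):

# dtrace -n 'syscall::fdsync:entry { @[execname] = count(); }'
# dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'

The first shows which local applications call fsync()/fdsync; the second counts ZIL commits in the kernel, which is where the NFS-induced synchronous semantics show up even though the client-side application (tar, rsync) never calls fsync itself.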
Re: [zfs-discuss] Another user loses his pool (10TB) in this case and 40 days work
On Mon, Jul 27 at 13:50, Richard Elling wrote:
> On Jul 27, 2009, at 10:27 AM, Eric D. Mudama wrote:
>> Can *someone* please name a single drive+firmware or RAID controller+firmware that ignores FLUSH CACHE / FLUSH CACHE EXT commands? Or worse, responds "ok" when the flush hasn't occurred?
>
> two seconds with google shows
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=183771&NewLang=en&Hilite=cache+flush
>
> Give it up. These things happen. Not much you can do about it, other than design around it.
> -- richard

That example is Windows-specific, and it is a software driver where the data integrity feature must be manually disabled by the end user. The default behavior was always maximum data protection.

While perhaps analogous at some level, the perpetual "your hardware must be crappy/cheap/not-as-expensive-as-mine" response doesn't seem to be a sufficient explanation when things go wrong, like the complete loss of a pool.

-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Failing device in a replicated configuration... on a non-replicated pool??
I was greeted by this today. The Sun Message ID page says this should happen when there are errors in a replicated configuration, but clearly there's only one drive here. If there are unrecoverable errors, how can my applications not be affected when there's no mirror or parity to recover from?

# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h3m with 0 errors on Wed Jul 29 18:52:20 2009
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c9d0s0  ONLINE       6     0     2

errors: No known data errors
-- 
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Failing device in a replicated configuration... on a non-replicated pool??
On Tue, 28 Jul 2009, fyleow wrote:
> I was greeted by this today. The Sun Message ID page says this should happen when there were errors in a replicated configuration. Clearly there's only one drive here. If there are unrecoverable errors how can my applications not be affected since there's no mirror or parity to recover from?

Metadata is stored redundantly, so some metadata could become corrupted and then be recovered via the redundant copy. In recent zfs you can set copies=2 to store user data redundantly on one drive.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
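To act on that advice, something along these lines would give newly written user data a second copy on the single disk and then reset the error counters once the drive is believed healthy (the dataset name is a placeholder, and copies=2 only applies to blocks written after the property is set):

# zfs set copies=2 rpool/export/home
# zpool clear rpool
# zpool scrub rpool

The scrub afterwards re-reads everything and shows whether the earlier read/checksum errors were a one-off or are still accumulating.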
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could find in the local Bic Camera, which turned out to be a CSSD-SM32NI, with a supposed write speed of 95MB/s. I put it in place and moved the slog over to it:

0m49.173s
0m48.809s

So it is slower than the CF test, which is disappointing. Everyone else seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd generation), so perhaps that is why it works better for them. It is curious that it is slower than the CF card; perhaps because it shares a bus with so many other SATA devices?

Oh, and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund

Jorgen Lundman wrote:
> Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card.
> Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4  0m26.566s
> I am hoping to pick up an SSD SATA device today and see what speeds we get out of that.
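When comparing slog devices like this, it also helps to watch the log vdev directly while the untar is running; assuming the pool still has the dedicated log device attached (the pool name below is a placeholder), something like:

# zpool iostat -v tank 1

prints per-vdev activity every second, and the separate "logs" line shows how much of the write traffic is landing on the CF card or SSD and whether that device is the bottleneck.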