Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool
On Jan 2, 2008 11:46 AM, Darren Reed <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > ...
> > That's a sad situation for backup utilities, by the way - a backup
> > tool would have no way of finding out that file X on fs A already
> > existed as file Z on fs B. So what? If the file got copied, byte by
> > byte, the same situation exists, the contents are identical. I don't
> > think just because this makes backups slower than they could be if the
> > backup utility were omniscient, that makes a reason to slow file
> > copy/rename operations down.
>
> I don't see this as being a problem at all.
>
> This idea is aimed at being a filesystem performance optimisation,
> not a backup optimisation.

Anyway, the same "problem" already exists with cloned filesystems.

--
Just me, Wire ...
[zfs-discuss] Setting a dataset create time only property at pool creation time.
Our test engineer for the ZFS Crypto project discovered that it isn't
possible to enable encryption on the "top" filesystem in a pool - the
one that gets created by default.

The intent here is that the default top-level filesystem gets the
encryption property, not the pool itself (because the latter has no
meaning). Looks like we need to do some work here.

But note:

# zpool create -o compression=on tpool /tmp/tpool
property 'compression' is not a valid pool property

For compression that doesn't matter too much, since it can be set later.

The problem we have is similar to this one:

# zpool create -o normalization=formC tpool /tmp/tpool
property 'normalization' is not a valid pool property
# zpool create tpool /tmp/tpool
# zfs set normalization=formC tpool
cannot set property for 'tpool': 'normalization' is readonly

Like normalization, encryption can only be set at create time.

This looks like a generic problem with create-time-only properties. We
need a way to pass any dataset property on via the zpool command to the
top dataset so that create-time-only properties can be set.

--
Darren J Moffat
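For illustration, a minimal sketch of what such an interface might look
like: a hypothetical flag on zpool create that gets forwarded to the
top-level dataset rather than parsed as a pool property. The -O option
shown here is an assumption made for the example, not an option that
exists in the builds discussed above; compression is included only to
contrast with a property that can simply be set after creation.

    # Hypothetical: forward a dataset property to the pool's root dataset.
    zpool create -O encryption=on tpool /tmp/tpool
    zpool create -O normalization=formC tpool /tmp/tpool

    # Works today only because compression is not a create-time-only property:
    zpool create tpool /tmp/tpool
    zfs set compression=on tpool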
Re: [zfs-discuss] Setting a dataset create time only property at pool creation time.
Hi,

I faced a similar problem when I was adding a property for per-dataset
dnode sizes. I got around it by adding a ZPOOL_PROP_DNODE_SIZE and adding
the dataset property in dsl_dataset_stats(). That way the root dataset
gets the property too. I am not very sure whether this is the cleanest
solution, since this is just prototype code.

[EMAIL PROTECTED] src]# zpool create -d 4096 tank /mnt/store/zfs-fuse/large_dnode/src/img
[EMAIL PROTECTED] src]# zfs get dnode_size
NAME  PROPERTY    VALUE  SOURCE
tank  dnode_size  4K     -

Thanks,
Kalpak.

On Wed, 2008-01-02 at 11:01, Darren J Moffat wrote:
> Our test engineer for the ZFS Crypto project discovered that it isn't
> possible to enable encryption on the "top" filesystem in a pool - the
> one that gets created by default.
>
> The intent here is that the default top-level filesystem gets the
> encryption property, not the pool itself (because the latter has no
> meaning). Looks like we need to do some work here.
>
> But note:
>
> # zpool create -o compression=on tpool /tmp/tpool
> property 'compression' is not a valid pool property
>
> For compression that doesn't matter too much, since it can be set later.
>
> The problem we have is similar to this one:
>
> # zpool create -o normalization=formC tpool /tmp/tpool
> property 'normalization' is not a valid pool property
> # zpool create tpool /tmp/tpool
> # zfs set normalization=formC tpool
> cannot set property for 'tpool': 'normalization' is readonly
>
> Like normalization, encryption can only be set at create time.
>
> This looks like a generic problem with create-time-only properties. We
> need a way to pass any dataset property on via the zpool command to the
> top dataset so that create-time-only properties can be set.
Re: [zfs-discuss] zpool panic need help
Hi again,

In the meantime I upgraded to s10u4 including recommended patches. Then I
tried again to import the zpool, with the same behaviour. The stack dump
is exactly the same as in the previous message.

To complete the label print:

# zdb -lv /dev/rdsk/c2t0d0s0
LABEL 0
    version=2
    name='mypool'
    state=0
    txg=2080095
    pool_guid=9190031050017369302
    top_guid=1501452411577769624
    guid=1501452411577769624
    vdev_tree
        type='disk'
        id=0
        guid=1501452411577769624
        path='/dev/dsk/c2t0d0s0'
        devid='id1,[EMAIL PROTECTED]/a'
        whole_disk=1
        metaslab_array=13
        metaslab_shift=34
        ashift=9
        asize=2799984967680
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
    version=2
    name='mypool'
    state=0
    txg=2080095
    pool_guid=9190031050017369302
    top_guid=1501452411577769624
    guid=1501452411577769624
    vdev_tree
        type='disk'
        id=0
        guid=1501452411577769624
        path='/dev/dsk/c2t0d0s0'
        devid='id1,[EMAIL PROTECTED]/a'
        whole_disk=1
        metaslab_array=13
        metaslab_shift=34
        ashift=9
        asize=2799984967680

I have learned that it should have 4 identical labels; in this case 2 of
them are corrupt. Is there a way to repair the labels?

Any help greatly appreciated!

Regards,
-Felix
Re: [zfs-discuss] Setting a dataset create time only property at pool creation time.
Kalpak Shah wrote:
> Hi
>
> I faced a similar problem when I was adding a property for per-dataset
> dnode sizes. I got around it by adding a ZPOOL_PROP_DNODE_SIZE and
> adding the dataset property in dsl_dataset_stats(). That way the root
> dataset gets the property too. I am not very sure if this is the
> cleanest solution or not since this is just prototype code.

I think a generic solution is needed, particularly given there is a need
for this with all create-time-only properties.

--
Darren J Moffat
Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones
James C. McPherson wrote:
> You can definitely loopback mount the same fs into multiple
> zones, and as far as I can see you don't have the multiple-writer
> issues that otherwise require Qfs to solve - since you're operating
> within just one kernel instance.

Is there any significant performance impact with loopback mounts?

- Bob
[zfs-discuss] Adding to zpool: would failure of one device destroy all data?
I didn't find any clear answer in the documentation, so here it goes:

I've got a 4-device RAIDZ array in a pool. I then add another RAIDZ array
to the pool. If one of the arrays fails, would all the data in the pool
be lost, or would it be like disc spanning, with only the data on the
failed array lost?

Thanks in advance.
Re: [zfs-discuss] Adding to zpool: would failure of one device destroy all data?
Your data will be striped across both vdevs after you add the 2nd vdev.
In any case, failure of one stripe device (a whole top-level vdev) will
result in the loss of the entire pool. I'm not sure, however, if there is
any way to recover any data from the surviving vdevs.

On 1/2/08, Austin <[EMAIL PROTECTED]> wrote:
> I didn't find any clear answer in the documentation, so here it goes:
>
> I've got a 4-device RAIDZ array in a pool. I then add another RAIDZ
> array to the pool. If one of the arrays fails, would all the data in
> the pool be lost, or would it be like disc spanning, with only the data
> on the failed array lost?
>
> Thanks in advance.

--
Just me, Wire ...
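For illustration, a minimal sketch of the layout being described; the
device and pool names are examples, not from this thread. Each raidz1
vdev survives a single failed disk, but the pool stripes data across both
vdevs, so losing an entire vdev takes the whole pool with it.

    # Example devices only; the pool stripes across the two raidz1 vdevs.
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
    zpool add    tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
    # Losing one disk in either set: the pool stays up, degraded.
    # Losing two disks in the same set: that vdev fails and the pool is lost.
    zpool status tank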
[zfs-discuss] Inode (dnode) numbers (Re: rename(2) (mv(1)) between ZFS filesystems in the same zpool)
On Mon, Dec 31, 2007 at 07:20:30PM +1100, Darren Reed wrote:
> Frank Hofmann wrote:
> > http://www.opengroup.org/onlinepubs/009695399/functions/rename.html
> >
> > ERRORS
> >     The rename() function shall fail if:
> >     [ ... ]
> >     [EXDEV]
> >         [CX] The links named by new and old are on different file
> >         systems and the implementation does not support links between
> >         file systems.
> >
> > Hence, it's implementation-dependent, as per IEEE1003.1.
>
> This implies that we'd also have to look at allowing
> link(2) to also function between filesystems where
> rename(2) was going to work without doing a copy,
> correct?  Which I suppose makes sense.

If so then a cross-dataset rename(2) won't necessarily work.

link(2) preserves inode numbers. mv(1) does not [when crossing devices].
A cross-dataset rename(2) may not be able to preserve inode numbers
either (e.g., if the one at the source is already in use on the target).

Nico
--
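A quick way to see the inode-number point from a shell; the pool and
dataset paths (/tank/a, /tank/b) are examples, not from this thread.

    # Within one dataset, mv is a rename(2); the inode number is preserved.
    ls -i /tank/a/file
    mv /tank/a/file /tank/a/file.renamed
    ls -i /tank/a/file.renamed        # same inode number

    # Across datasets, rename(2) currently fails with EXDEV, so mv falls
    # back to copy-and-unlink and the inode number changes.
    mv /tank/a/file.renamed /tank/b/file
    ls -i /tank/b/file                # different inode number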
Re: [zfs-discuss] Inode (dnode) numbers (Re: rename(2) (mv(1)) between ZFS filesystems in the same zpool)
Oof, I see this has been discussed since (and, actually, IIRC it was
discussed a long time ago too).

Anyways, IMO, this requires a new syscall or syscalls:

    xdevrename(2)
    xdevcopy(2)

and then mv(1) can do:

    if (rename(old, new) != 0) {
            if (xdevrename(old, new) != 0) {
                    /* do a cp(1) instead */
                    return (do_cp(old, new));
            }
            return (0);
    }
    return (0);

cp(1), and maybe ln(1), could do something similar using xdevcopy(2).

Nico
--
Re: [zfs-discuss] Help! ZFS pool is UNAVAILABLE
I AM NOT A ZFS DEVELOPER. These suggestions "should" work, but there may
be other people who have better ideas.

Aaron Berland wrote:
> Basically, I have a 3 drive raidz array on internal Seagate
> drives. running build 64nv. I purchased 3 add'l USB drives
> with the intention of mirroring and then migrating the data
> to the new USB drives.
(snip)
> Below is my current zpool status. Note the USB drives are
> showing up as the same device. They are plugged into 3
> different port and they used to show up as different controllers??
>
> This whole thing was supposed to duplicate my data and have
> more redundancy, but now it looks like I could be loosing it
> all?! I have some data backed up on other devices, but not all.
>
>   NAME        STATE     READ WRITE CKSUM
>   zbk         UNAVAIL      0     0     0  insufficient replicas
>     raidz1    ONLINE       0     0     0
>       c2d0p2  ONLINE       0     0     0
>       c1d0    ONLINE       0     0     0
>       c1d1    ONLINE       0     0     0
>     raidz1    UNAVAIL      0     0     0  insufficient replicas
>       c5t0d0  ONLINE       0     0     0
>       c5t0d0  FAULTED      0     0     0  corrupted data
>       c5t0d0  FAULTED      0     0     0  corrupted data

Ok, from here, we can see that you have a single pool with two striped
components: a raidz set from c1 and c2 disks, and the (presumably new)
raidz set from c5 -- I'm guessing this is where the USB disks show up.

Unfortunately, it is not possible to remove a component from a ZFS pool.

On the bright side, it might be possible to trick it, at least for long
enough to get the data back.

First, we'll want to get the system booted. You'll connect the USB
devices, but DON'T try to do anything with your pool (especially don't
put more data on it).

You should then be able to get a consistent pool up and running -- the
devices will be scanned and detected and automatically re-enabled. You
might have to do a "zpool import" to search all of the /dev/dsk/ devices.

From there, pull out one of the USB drives and do a "zpool scrub" to
resilver the failed RAID group. Now wipe off the removed USB disk
(format it with UFS or something... it just needs to lose the ZFS
identifiers. And while we're at it, UFS is probably a good choice anyway,
given the next step(s)). One of the disks will show FAULTED at this
point; I'll call it c5t2d0.

Now, mount up that extra disk, and run "mkfile -n 500g
/mnt/theUSBdisk/disk1.img" (this will create a sparse file).

Then do a "zpool replace zbk c5t2d0 /mnt/theUSBdisk/disk1.img".

Then you can also replace the other 2 USB disks with other img files
too... as long as the total data written to these stripes doesn't exceed
the actual size of the disk, you'll be OK. At this point, back up your
data (zfs send | bzip2 -9 > /mnt/theUSBdisk/backup.dat).

--Joe
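A condensed sketch of the procedure above, in order. The pool name (zbk)
and the faulted device name (c5t2d0) come from this thread; the UFS mount
point, image file name, and snapshot name are examples only.

    zpool import                                 # rescan /dev/dsk and import the pool
    zpool scrub zbk                              # resilver after pulling one USB drive
    newfs /dev/rdsk/c5t2d0s0                     # wipe the pulled disk so it loses its ZFS labels
    mount /dev/dsk/c5t2d0s0 /mnt/theUSBdisk
    mkfile -n 500g /mnt/theUSBdisk/disk1.img     # -n keeps the file sparse
    zpool replace zbk c5t2d0 /mnt/theUSBdisk/disk1.img
    zfs snapshot -r zbk@backup                   # snapshot name is an example
    zfs send zbk@backup | bzip2 -9 > /mnt/theUSBdisk/backup.dat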
Re: [zfs-discuss] Help! ZFS pool is UNAVAILABLE
Moore, Joe wrote:
> I AM NOT A ZFS DEVELOPER. These suggestions "should" work, but there
> may be other people who have better ideas.
>
> Aaron Berland wrote:
>> Basically, I have a 3 drive raidz array on internal Seagate
>> drives. running build 64nv. I purchased 3 add'l USB drives
>> with the intention of mirroring and then migrating the data
>> to the new USB drives.
> (snip)
>> Below is my current zpool status. Note the USB drives are
>> showing up as the same device. They are plugged into 3
>> different port and they used to show up as different controllers??
>>
>> This whole thing was supposed to duplicate my data and have
>> more redundancy, but now it looks like I could be loosing it
>> all?! I have some data backed up on other devices, but not all.
>>
>>   NAME        STATE     READ WRITE CKSUM
>>   zbk         UNAVAIL      0     0     0  insufficient replicas
>>     raidz1    ONLINE       0     0     0
>>       c2d0p2  ONLINE       0     0     0
>>       c1d0    ONLINE       0     0     0
>>       c1d1    ONLINE       0     0     0
>>     raidz1    UNAVAIL      0     0     0  insufficient replicas
>>       c5t0d0  ONLINE       0     0     0
>>       c5t0d0  FAULTED      0     0     0  corrupted data
>>       c5t0d0  FAULTED      0     0     0  corrupted data
>
> Ok, from here, we can see that you have a single pool with two striped
> components: a raidz set from c1 and c2 disks, and the (presumably new)
> raidz set from c5 -- I'm guessing this is where the USB disks show up.
>
> Unfortunately, it is not possible to remove a component from a ZFS pool.
>
> On the bright side, it might be possible to trick it, at least for long
> enough to get the data back.
>
> First, we'll want to get the system booted. You'll connect the USB
> devices, but DON'T try to do anything with your pool (especially don't
> put more data on it).
>
> You should then be able to get a consistent pool up and running -- the
> devices will be scanned and detected and automatically re-enabled. You
> might have to do a "zpool import" to search all of the /dev/dsk/
> devices.
>
> From there, pull out one of the USB drives and do a "zpool scrub" to
> resilver the failed RAID group. Now wipe off the removed USB disk
> (format it with UFS or something... it just needs to lose the ZFS
> identifiers. And while we're at it, UFS is probably a good choice
> anyway, given the next step(s)). One of the disks will show FAULTED at
> this point; I'll call it c5t2d0.
>
> Now, mount up that extra disk, and run "mkfile -n 500g
> /mnt/theUSBdisk/disk1.img" (this will create a sparse file).

Be careful here: if your USB disk is smaller than 500g (likely), then you
won't be able to later replace this disk1.img file with a smaller USB
disk. You will need to make sure the disk1.img file is the same size as
the USB disk. Since USB disks are often different sizes (!), this might
get tricky. [yes, this would be fixed by the notorious shrink RFE]
 -- richard

> Then do a "zpool replace zbk c5t2d0 /mnt/theUSBdisk/disk1.img".
>
> Then you can also replace the other 2 USB disks with other img files
> too... as long as the total data written to these stripes doesn't
> exceed the actual size of the disk, you'll be OK. At this point, back
> up your data (zfs send | bzip2 -9 > /mnt/theUSBdisk/backup.dat).
>
> --Joe
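One way to handle the sizing concern is to size the backing file from the
disk's VTOC rather than guessing. This is a sketch under assumptions: the
device name is an example, 512-byte sectors are assumed, and slice 0 is
assumed to be the slice that was given to ZFS.

    # Sector count of slice 0, converted to bytes (512-byte sectors assumed).
    sectors=`prtvtoc /dev/rdsk/c5t2d0s0 | awk '$1 == "0" { print $5 }'`
    mkfile -n `expr $sectors \* 512` /mnt/theUSBdisk/disk1.img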
Re: [zfs-discuss] Help! ZFS pool is UNAVAILABLE
Hi Joe,

Thanks for trying. I can't even get the pool online because there are 2
corrupt drives, according to zpool status.

Yours and the other gentlemen's insights have been very helpful, however!
I lucked out and realized that I did have copies of 90% of my data, so I
am just going to destroy the pool and start over.

I will have more questions in the future on how to best safeguard my
pools.

Thanks again for your help!

Aaron
Re: [zfs-discuss] hot spare and resilvering problem
On Dec 25, 2007 3:19 AM, Maciej Olchowik <[EMAIL PROTECTED]> wrote:
> Hi Folks,
>
> I have 3510 disk array connected to T2000 server running:
> SunOS 5.10 Generic_118833-33 sun4v sparc SUNW,Sun-Fire-T200
> 12 disks (300G each) is exported from array and ZFS is used
> to manage them (raidz with one hot spare).
>
> Few days ago we had a disk failure. First problem was that hot
> spare hasn't automatically kicked in, so I have run zpool replace
> manually - resilvering started.
>
> It's now 4th day and it's still not finished. Also when I run zpool
> status yesterday it was 56% complete, today it's only 13%.
>
> This is our production machine and we really can't afford this
> service to be slow any longer. Please could someone shed some
> light on why resilvering takes so long (and restarts)?
>
> Is there a patch we can apply to fix it?
>
> many thanks for any info,
>
> Maciej

Do you have snapshots taking place (like in a cron job) during the
resilver process? If so, you may be hitting a bug where the resilver
restarts from the beginning whenever a new snapshot occurs. If you
disable the snapshots during the resilver then it should complete to
100%.

-Eric
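A quick way to check for this situation; the pool name and the cron
pattern below are examples, not taken from the thread.

    # Watch the resilver percentage; if it keeps dropping back, look for
    # anything that takes snapshots on a schedule.
    zpool status mypool | grep -i resilver
    # Comment out any such entries until the resilver reports 100%
    # complete, then re-enable them.
    crontab -l | grep "zfs snapshot"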
Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones
Bob Scheifler wrote:
> James C. McPherson wrote:
>> You can definitely loopback mount the same fs into multiple
>> zones, and as far as I can see you don't have the multiple-writer
>> issues that otherwise require Qfs to solve - since you're operating
>> within just one kernel instance.
>
> Is there any significant performance impact with loopback mounts?

Not that I have come across. I've got three zones (global + 2 others)
running permanently. My home directory is constantly mounted on two of
them, and periodically on the third. Both the non-global zones have
several loopback-mounted filesystems in very heavy use, and at least from
my point of view the performance has been quite good.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
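For reference, a minimal sketch of how such a loopback (lofs) mount can
be configured for a zone. The zone name (myzone) and both paths are
examples, not taken from this thread; dir is the mount point inside the
zone and special is the path in the global zone.

    zonecfg -z myzone
    zonecfg:myzone> add fs
    zonecfg:myzone:fs> set dir=/home
    zonecfg:myzone:fs> set special=/export/home
    zonecfg:myzone:fs> set type=lofs
    zonecfg:myzone:fs> end
    zonecfg:myzone> commit
    zonecfg:myzone> exit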
Re: [zfs-discuss] What are the dates ls shows on a snapshot?
On Dec 23, 2007, at 7:53 PM, David Dyer-Bennet wrote:
> Just out of curiosity, what are the dates ls -l shows on a snapshot?
> Looks like they might be the pool creation date.

The ctime and mtime are from the file system creation date. The atime is
the current time.

See:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zfs_ctldir.c#204
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zfs_ctldir.c#302

eric

> bash-3.2$ ls -l /home/.zfs/snapshot/
> total 15
> drwxr-xr-x   9 root     sys            9 Sep 29  2006 20071126-2328-first-post-move
> drwxr-xr-x   9 root     sys            9 Sep 29  2006 20071127-2255-tp-moved
> drwxr-xr-x   9 root     sys            9 Sep 29  2006 20071130-2230
> drwxr-xr-x   9 root     sys            9 Sep 29  2006 20071206-2148
> drwxr-xr-x   9 root     sys            9 Sep 29  2006 20071214-2147
>
> (Those snapshots were created on the dates in their names, somewhere
> near the times.)
>
> --
> David Dyer-Bennet, [EMAIL PROTECTED]; http://dd-b.net/
> Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
> Photos: http://dd-b.net/photography/gallery/
> Dragaera: http://dragaera.info
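To see this directly, compare the times ls reports for one of the
snapshot directories from the quoted listing above (with -l, the -c
option shows ctime and -u shows atime; -d keeps ls from listing the
directory's contents).

    ls -dlc /home/.zfs/snapshot/20071206-2148    # ctime: filesystem creation time
    ls -dlu /home/.zfs/snapshot/20071206-2148    # atime: roughly the current time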
Re: [zfs-discuss] Inode (dnode) numbers (Re: rename(2) (mv(1)) between ZFS filesystems in the same zpool)
On Jan 3, 2008 12:32 AM, Nicolas Williams <[EMAIL PROTECTED]> wrote:
> Oof, I see this has been discussed since (and, actually, IIRC it was
> discussed a long time ago too).
>
> Anyways, IMO, this requires a new syscall or syscalls:
>
>     xdevrename(2)
>     xdevcopy(2)
>
> and then mv(1) can do:
>
>     if (rename(old, new) != 0) {
>             if (xdevrename(old, new) != 0) {
>                     /* do a cp(1) instead */
>                     return (do_cp(old, new));
>             }
>             return (0);
>     }
>     return (0);
>
> cp(1), and maybe ln(1), could do something similar using xdevcopy(2).

Could it be cleaner to do that within vn_renameat() instead? This will
save creating a new syscall and updating quite a number of utilities.

--
Just me, Wire ...
Re: [zfs-discuss] Inode (dnode) numbers (Re: rename(2) (mv(1)) between ZFS filesystems in the same zpool)
Nicolas Williams wrote:
> On Mon, Dec 31, 2007 at 07:20:30PM +1100, Darren Reed wrote:
>> Frank Hofmann wrote:
>>> http://www.opengroup.org/onlinepubs/009695399/functions/rename.html
>>>
>>> ERRORS
>>>     The rename() function shall fail if:
>>>     [ ... ]
>>>     [EXDEV]
>>>         [CX] The links named by new and old are on different file
>>>         systems and the implementation does not support links between
>>>         file systems.
>>>
>>> Hence, it's implementation-dependent, as per IEEE1003.1.
>>
>> This implies that we'd also have to look at allowing
>> link(2) to also function between filesystems where
>> rename(2) was going to work without doing a copy,
>> correct?  Which I suppose makes sense.
>
> If so then a cross-dataset rename(2) won't necessarily work.
>
> link(2) preserves inode numbers. mv(1) does not [when crossing
> devices]. A cross-dataset rename(2) may not be able to preserve inode
> numbers either (e.g., if the one at the source is already in use on the
> target).

Unless POSIX or similar says the preservation of inode numbers is
required, I can't see why that is important.

Darren