[zfs-discuss] ZFS Import Problem
Hi everyone,

I have a serious problem and need some assistance. I was doing a rolling upgrade of a raidz1, replacing 320GB drives with 1.5TB drives (i.e. zpool replace). I had replaced three of the drives and they had resilvered without errors, and then I started on the fourth one. It went downhill from there: while the fourth disk was resilvering, one of the others started throwing errors, lots and lots of them, to the point that the system is unusable.

So what I have are the four original 320GB drives that I would like to bring back online (on another server). The pool was unmounted while I was replacing the old units with the new ones, so the data should have been static, and I would expect to be able to bring the pool online on another server.

The problem I am running into is that the pool seems to be remembering the 1.5TB drive names/locations/etc. Instead of using the new controller locations in the new system, the pool insists on looking for the old devices. For example: in the new system, the drives are c5d0 c5d1 c6d0 c6d1 -- in the old system, the drives were c2d0s0 c3d0s0 c4d0s0 c5d0s0.

I have tried everything I can come up with to get the pool to import, but it just won't do it. I need to know how to get zpool to import the pool using the new devices I specify. When I try, it complains that the pool doesn't exist... that's because it won't let me import it! I'm stuck and I need help. I have tried import -f, import -d, import -f -d... nothing works.

I have a lot of data on those drives that I couldn't back up because I don't have a backup system big enough to handle it all. I'd really like to salvage that data if at all possible. I was trusting ZFS to handle it, and until now it did an excellent job.

It seems pretty simple: shouldn't each drive be labeled as part of a pool? When I do a zdb -l, I see the labels and pool identifiers, but I also see the old device names hard-coded into the label. How can I get zpool to simply see that all four drives are members of the same pool and build new device locations?

Any help would be greatly appreciated. Thanks to you all in advance,

-Michael
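One workaround sometimes suggested for this situation, offered here only as a hedged sketch: point zpool import at a directory containing nothing but links to the four old drives, so nothing else gets scanned and the stale paths in the labels are ignored. The directory name /var/tmp/oldpool and the pool name mypool are placeholders, and the slice names assume the drives show up as c5d0s0, c5d1s0, c6d0s0, and c6d1s0 on the new box:

#mkdir /var/tmp/oldpool
#ln -s /dev/dsk/c5d0s0 /var/tmp/oldpool/c5d0s0
#ln -s /dev/dsk/c5d1s0 /var/tmp/oldpool/c5d1s0
#ln -s /dev/dsk/c6d0s0 /var/tmp/oldpool/c6d0s0
#ln -s /dev/dsk/c6d1s0 /var/tmp/oldpool/c6d1s0
#zpool import -d /var/tmp/oldpool
#zpool import -d /var/tmp/oldpool -f mypool

The first import with no pool name only lists what it finds in that directory; the second actually imports it, by name or by the numeric identifier.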
Re: [zfs-discuss] ZFS Import Problem
> I'm not an expert but for what it's worth:
>
> 1. Try the original system. It might be a fluke/bad cable or anything else
>    intermittent. I've seen it happen here. If so, your pool may be alright.
>
> 2. For the (defunct) originals, I'd say we'd need to take a look into the
>    sources to find if something needs to be done. AFAIK, device paths aren't
>    hard-coded. ZFS doesn't care where the disks are as long as it finds them
>    and they contain the right label.

I tried the original system and it had much the same reaction. The cables, etc. are all fine. The new system sees the drives and they check out in drive-testing utilities. I don't think we're dealing with a hardware issue.

I agree that the problem is most likely the labels. When I look at zdb -l output for each of the drives, I can see that they all show the correct pool name and numeric identifier. I think the problem is that they have "children" defined that no longer exist, at least not at the locations indicated in the label. The question is: how do I update the labels so the pool members are all reflected with their new identifications?

Thanks,
Michael
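For reference, a hedged sketch of how the labels and the numeric identifier can be checked and used. The device name and the id below are placeholders, and the slice should be adjusted to match how the drives are actually labeled:

#zdb -l /dev/dsk/c5d0s0
(each of the four labels should show the same name and pool_guid; the vdev_tree children carry the stale path entries)
#zpool import
(with no arguments, lists any pools found on the scanned devices along with their numeric ids)
#zpool import -f 1234567890123456789
(importing by the numeric id sidesteps any confusion over the pool name; the id shown here is made up)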
Re: [zfs-discuss] ZFS Import Problem
--- On Tue, 12/30/08, Andrew Gabriel wrote:

> If you were doing a rolling upgrade, I suspect the old disks are all
> horribly out of sync with each other?
>
> If that is the problem, then if the filesystem(s) have a snapshot that
> existed when all the old disks were still online, I wonder if it might be
> possible to roll them back to it by hand, so it looks like the current
> live filesystem? I don't know if this is possible with zdb.

What I meant is that I rolled the new drives in one at a time: replace a drive, let it resilver, replace another, let it resilver, and so on. The filesystems were not mounted at the time, so the data should have been static. The only thing that changed (from what I can tell) is the zpool labels.

Thanks,
Michael
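For anyone following the thread, the rolling replacement described above boils down to something like the following; the pool name mypool and the device pairings are illustrative only:

#zpool replace mypool c2d0s0 c6d0
#zpool status mypool
(wait for the resilver to finish cleanly before touching the next drive)
#zpool replace mypool c3d0s0 c6d1
#zpool status mypool
(and so on for the remaining drives)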
Re: [zfs-discuss] ZFS Import Problem
Yes, but I can't export a pool that has never been imported. These drives are no longer connected to their original system, and at this point, when I connect them to their original system, the results are the same.

Thanks,
Michael

--- On Tue, 12/30/08, Weldon S Godfrey 3 wrote:

> Did you try zpool export 1st?
[zfs-discuss] ZFS snapshot splitting & joining
Hello everyone,

I am trying to take ZFS snapshots (i.e. zfs send) and burn them to DVDs for offsite storage. In many cases, the snapshots greatly exceed the 8GB I can stuff onto a single DVD-DL. To make this work, I have used the "split" utility to break the images into smaller, fixed-size chunks that will fit onto a DVD. For example:

#split -b8100m ./mypictures.zfssnap mypictures.zfssnap.split.

This gives me a set of files like this:

7.9G mypictures.zfssnap.split.aa
7.9G mypictures.zfssnap.split.ab
7.9G mypictures.zfssnap.split.ac
7.9G mypictures.zfssnap.split.ad
7.9G mypictures.zfssnap.split.ae
7.9G mypictures.zfssnap.split.af
6.1G mypictures.zfssnap.split.ag

I use the following command to convert them back into a single file:

#cat mypictures.zfssnap.split.a[a-g] > testjoin

But when I compare the checksum of the original snapshot to that of the rejoined snapshot, I get a different result:

#cksum mypictures.zfssnap
308335278 57499302592 mypictures.zfssnap
#cksum testjoin
278036498 57499302592 testjoin

And when I try to restore the filesystem, I get the following failure:

#zfs recv pool_01/test < ./testjoin
cannot receive new filesystem stream: invalid stream (checksum mismatch)

That makes sense given the different checksums reported by cksum above. The question is: what can I do? My guess is that there is some ASCII/binary conversion issue that the "split" and "cat" commands are introducing into the restored file, but I'm at a loss as to exactly what is happening and how to get around it.

If anyone out there has a solution to my problem, or a better suggestion on how to accomplish the original goal, please let me know. Thanks to all in advance for any help you may be able to offer.

-Michael
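One hedged suggestion for at least pinpointing where the corruption creeps in: checksum every piece at split time, keep the list with the pieces, and verify them again after copying them back from the discs and again after the rejoin. The manifest file name is made up; everything else follows the example above:

#split -b8100m ./mypictures.zfssnap mypictures.zfssnap.split.
#cksum mypictures.zfssnap.split.* > mypictures.zfssnap.cksums
(burn the pieces and the manifest together; later, after copying everything back)
#cksum mypictures.zfssnap.split.* | diff - mypictures.zfssnap.cksums
(any line that differs identifies the damaged piece)
#cat mypictures.zfssnap.split.a[a-g] > mypictures.zfssnap.rejoined
#cksum mypictures.zfssnap.rejoined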
Re: [zfs-discuss] ZFS snapshot splitting & joining
Hi everyone,

I appreciate the discussion on the practicality of archiving ZFS sends, but right now I don't know of any other options. I'm a home user, so enterprise-level solutions aren't available, and as far as I know, tar, cpio, etc. don't capture ACLs and other low-level filesystem attributes. Plus, they are all susceptible to corruption while in storage, making recovery no more likely than with a zfs send.

The checksumming capability is a key factor for me. I would rather be unable to restore the data than unknowingly restore bad data. This is the biggest reason I started using ZFS in the first place: too many cases of "invisible" file corruption. Admittedly, it would be nicer if "zfs recv" would flag individual files with checksum problems rather than completely failing the restore.

What I need is a complete snapshot of the filesystem (i.e. like ufsdump) and, correct me if I'm wrong, zfs send/recv is the closest (only) thing we have. And I need to be able to break up this complete snapshot into pieces small enough to fit onto a DVD-DL. So far, using ZFS send/recv works great as long as the files aren't split.

I have seen suggestions to use something like 7z instead of "split". Does anyone else have any other ideas on how to successfully break up a send file and join it back together?

Thanks again,
Michael
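As a sketch of one possible variation, assuming the intermediate full-size file is not needed at all, the send stream can be split on the fly and fed straight back into zfs recv at restore time. The dataset and snapshot names here are placeholders:

#zfs send pool_01/pictures@offsite1 | split -b8100m - mypictures.zfssnap.split.
#cat mypictures.zfssnap.split.a* | zfs recv pool_01/pictures_restored

Piping the pieces straight into zfs recv also sidesteps the redirection step that appears to be where the bad byte gets introduced.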
Re: [zfs-discuss] ZFS snapshot splitting & joining
Hi again everyone,

OK... I'm even more confused about what is happening here when I try to rejoin the split zfs send file. When I cat the split files and pipe them through cksum, I get the same cksum as the original (unsplit) zfs send snapshot:

#cat mypictures.zfssnap.split.a[a-d] | cksum
2375397256 27601696744
#cksum mypictures.zfssnap
2375397256 27601696744 mypictures.zfssnap

But when I cat them into a file and then run cksum on the file, I get a different cksum:

#cat mypictures.zfssnap.split.a[a-d] > testjoin3
#cksum testjoin3
3408767053 27601696744 testjoin3

I am at a loss as to what on Earth is happening here! The resulting file size is the same as the original, so why does cat produce a different cksum when piped vs. redirected to a file? In each case where I have run 'cmp -l' on the resulting file, there is a single byte with the wrong value. What could cause this?

Any ideas would be greatly appreciated. Thanks (again) to all in advance,

-Michael
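A hedged way to narrow this down is to locate the offending byte with cmp and to compare the piped join against the original directly; the file names follow the example above:

#cmp -l mypictures.zfssnap testjoin3
(prints one line per differing byte: the 1-based offset followed by the two values in octal, which should show exactly one mismatch here)
#cat mypictures.zfssnap.split.a[a-d] | cmp - mypictures.zfssnap
(no output means the piped join is byte-for-byte identical, pointing the finger at the redirection into the file)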
Re: [zfs-discuss] ZFS snapshot splitting & joining
Thanks to John K. and Richard E. for an answer that would never, ever have occurred to me: the problem was with the shell. For whatever reason, /usr/bin/ksh can't rejoin the files correctly. When I switched to /sbin/sh, the rejoin worked fine, the cksums matched, and the zfs recv worked without a hitch.

The ksh I was using is:

# what /usr/bin/ksh
/usr/bin/ksh:
        Version M-11/16/88i
        SunOS 5.10 Generic 118873-04 Aug 2006

So, is this a bug in the ksh included with Solaris 10? Should I file a bug report with Sun? If so, how? I don't have a support contract or anything.

Anyway, I'd like to thank you all for your valuable input and assistance in helping me work through this issue.

-Michael
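For anyone who hits the same thing, a minimal sketch of the workaround, assuming the split pieces from the earlier example; the output file name testjoin_sh is made up:

#/sbin/sh -c 'cat mypictures.zfssnap.split.a[a-d] > testjoin_sh'
#cksum testjoin_sh mypictures.zfssnap
(the two checksums should now match)
#zfs recv pool_01/test < ./testjoin_sh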
[zfs-discuss] Partial ZFS Space Recovery
Hi everyone,

I have what I think is a simple question, but the answer is eluding me. I have a ZFS filesystem from which I needed to move part of the data to a new pool. I want to recover the space from the part I moved so that it returns to the original pool, without losing the snapshot data for the other parts of the filesystem.

For example, I have pool_01/mydata/dir1, and within dir1 I have:

./images
./invoices
./xrays

The xrays directory was moved to a new pool, and now I want to recover the space it once took up. The problem is that I can't seem to figure out how to recover just the space that one directory occupied. I know how to destroy the snapshots, but I need to keep them since they contain data from the other directories. I even tried removing the xrays directory from the .zfs/snapshot areas, but those are read-only and it wouldn't let me.

The system is Solaris 10 x86 08/07. Any help would be appreciated.

Thank you,
Michael
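For reference, a hedged sketch of how to see where the space is pinned, assuming the snapshots were taken on pool_01/mydata; the snapshot name is a placeholder:

#zfs list -t snapshot -r pool_01/mydata
(for a snapshot, the USED column is the space that destroying only that snapshot would free; the blocks for the removed xrays data stay charged for as long as any snapshot that still references them exists)
#zfs destroy pool_01/mydata@oldsnap
(frees that snapshot's unique space, but discards the rest of its contents too; there is no supported way to release just one directory's blocks from inside an existing snapshot)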
[zfs-discuss] Suggestion/Request: ZFS-aware rm command
Hello again,

Some of you may have read my earlier post about wanting to recover partial ZFS space. It seems that isn't possible given the current implementation, so I would like to suggest the following enhancement: a ZFS-aware rm command (i.e. zfs rm).

The idea is that we would be able to remove a file (or directory) from a ZFS filesystem and from all snapshots in which it exists. This would allow space to be recovered when part of a ZFS pool is moved or deleted, but the bulk of the data in the snapshots is still relevant. I know, I know: snapshots are there to protect us from messing up, but a specific command that would let us "force" the removal (unlinking) of certain structures within the filesystem and its associated snapshots would be quite useful.

Take the example that bit me. I had a filesystem with a subdirectory that grew too big and had to be moved to another pool. Since the snapshots contained all of that data, even though the directory was moved, I was unable to recover the space (almost 300GB) without deleting all of the snapshots. The problem with deleting all of the snapshots is that I would lose the ability to recover the other data within that filesystem. The problem with "sending" the snapshots elsewhere before deleting them is that, at almost 500GB each, I simply didn't have that kind of space available. If I had the ability to forcefully delete the directory from the filesystem and its snapshots, I would have been able to move my data around without sacrificing the recoverability of the other data.

Maybe something like this:

zfs rm -f myfile

Seems like this would be pretty easy if we are really just talking about unlinking pointers to the specific data, but I'll let those more intimate with the code speak to that.

Thanks,
Michael
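Until something along those lines exists, one hedged way to avoid the problem in the future is to give each large directory its own filesystem, so that its snapshots can be destroyed, and its space reclaimed, independently of its siblings. The names below follow the earlier example and the snapshot name is made up:

#zfs create pool_01/mydata/images
#zfs create pool_01/mydata/invoices
#zfs create pool_01/mydata/xrays
#zfs snapshot -r pool_01/mydata@2009-01-15
(later, after the xrays data has been copied to another pool)
#zfs destroy -r pool_01/mydata/xrays
(removes that filesystem and all of its snapshots, returning the space, while leaving the snapshots of images and invoices untouched)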