Paul,
Thanks for the additional data; please see comments inline.
Paul Archer wrote:
7:56pm, Victor Latushkin wrote:
While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new
question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't
do a 'zpool replace' because the zpool isn't online.
ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when
it controls the entire disk. Whereas before the upgrade it looked like this:
    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        c2d0s0  ONLINE       0     0     0
        c3d0s0  ONLINE       0     0     0
        c4d0s0  ONLINE       0     0     0
        c6d0s0  ONLINE       0     0     0
        c5d0s0  ONLINE       0     0     0
I guess something happened to the labeling of disk c7d0 (used to be
c2d0) before, during or after upgrade.
It would be nice to see what zdb -l shows for this disk and some
other disk too. Output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful as well.
This is from c7d0:
--------------------------------------------
LABEL 0
--------------------------------------------
version=13
name='datapool'
state=0
txg=233478
pool_guid=3410059226836265661
hostid=519305
hostname='shebop'
top_guid=7679950824008134671
guid=17458733222130700355
vdev_tree
type='raidz'
id=0
guid=7679950824008134671
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501485178880
is_log=0
children[0]
type='disk'
id=0
guid=17458733222130700355
path='/dev/dsk/c7d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a'
phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a'
whole_disk=1
This is why ZFS does not show s0 in the zpool output for c7d0 - it
controls the entire disk. I guess initially it was the other way around - it is
unlikely that you specified the disks differently at creation time, and the
earlier output suggests the same. So something happened before the last
system reboot that most likely relabeled your c7d0 disk, and the
configuration in the labels was updated to match (a quick way to re-check
the labels across all disks is sketched after the label output below).
DTL=588
children[1]
type='disk'
id=1
guid=4735756507338772729
path='/dev/dsk/c8d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a'
phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a'
whole_disk=0
All the other disks have whole_disk=0, so there's s0 in the zpool output
for those disks.
DTL=467
children[2]
type='disk'
id=2
guid=10113358996255761229
path='/dev/dsk/c9d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a'
phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a'
whole_disk=0
DTL=573
children[3]
type='disk'
id=3
guid=11460855531791764612
path='/dev/dsk/c11d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a'
phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a'
whole_disk=0
DTL=571
children[4]
type='disk'
id=4
guid=14986691153111294171
path='/dev/dsk/c10d0s0'
devid='id1,c...@ast31500341as=____________9vs0ttwf/a'
phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a'
whole_disk=0
DTL=473
Labels 1-3 are identical
The other disks in the pool give identical results (except for the
GUIDs, which match what's above).
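If you want to re-check that yourself later, dumping and diffing the labels is enough. A rough sketch, with the disk names taken from your zpool output:

# dump the labels of every member disk, then compare any two
for d in c7d0 c8d0 c9d0 c10d0 c11d0; do
    zdb -l /dev/dsk/${d}s0 > /tmp/label.$d
done
diff /tmp/label.c7d0 /tmp/label.c8d0    # only the per-disk guid line should differ

Any difference beyond the per-disk guid would be worth a closer look.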
Ok, then let's look at the vtoc - probably we can find something
interesting there.
c8d0 - c11d0 are identical, so I didn't include that output below:
This is expected. So let's look for the differences:
root@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34        222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 2930247391 2930247646
       8     11    00 2930247647      16384 2930264030
root@shebop:/tmp#
root@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0     17    00         34 2930277101 2930277134
Now you can clearly see the difference between the two: the other four disks
have a single partition, whereas c7d0 has two, and its slice 0 is smaller
than that of the others.
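If it helps to see the delta mechanically, you can diff the vtocs as well. A small sketch with the same device names as above:

# capture and compare the partition maps of the odd disk and a good one
prtvtoc /dev/rdsk/c7d0s0 > /tmp/vtoc.c7d0
prtvtoc /dev/rdsk/c8d0s0 > /tmp/vtoc.c8d0
diff /tmp/vtoc.c7d0 /tmp/vtoc.c8d0      # the extra slice 8 and the smaller slice 0 stand out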
Let's do a little bit of math.
The asize of our RAID-Z vdev is 7501485178880; since that is the asize of the
smallest child vdev multiplied by 5, we can calculate the expected size of the
smallest child: 7501485178880 / 5 = 1500297035776 bytes, or 5723179 blocks of
256KB. Why is it interesting to have the size in terms of 256KB blocks?
Because ZFS currently rounds the slice size down to the nearest 256KB boundary
for further calculations. To that size we need to add the two labels at the
front and the two at the back (four labels of 256KB each, 1MB total) plus the
3.5MB reserved area right after the front labels - 4.5MB in all, or 18 256KB
blocks. So the expected slice size is 5723179 + 18 = 5723197 256KB blocks.
Let's check the actual slice sizes (dividing the sector count by 512 converts
512-byte sectors into 256KB blocks):
c8d0s0: 2930277101 / 512 = 5723197.462890625 - so it is exactly the 5723197
256KB blocks we need
c7d0s0: 2930247391 / 512 = 5723139.435546875 - or 5723139 256KB blocks,
which is 58 256KB blocks less than needed.
And this is the reason why RAID-Z is unhappy.
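If you want to re-check the arithmetic yourself, bc makes it easy. A rough sketch, using only the numbers quoted above:

# asize of the raidz vdev divided by its 5 children = expected bytes per child
echo '7501485178880 / 5' | bc             # 1500297035776
# convert to 256KB (262144-byte) blocks and add 18 blocks (4.5MB) of label/boot overhead
echo '1500297035776 / 262144 + 18' | bc   # 5723197 expected
# actual sizes: 512-byte sectors divided by 512 = 256KB blocks (bc truncates, as ZFS rounds down)
echo '2930277101 / 512' | bc              # 5723197 for c8d0s0 - exactly right
echo '2930247391 / 512' | bc              # 5723139 for c7d0s0 - 58 blocks short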
How to get out of this?
There are two ways:
1. Restore original labeling on the c7d0 disk
2. Remove (physically or logically) disk c7d0
Both ways have their pros and cons.
If you want to try logical removal of c7d0 you can do something like this:
cfgadm -al
find disk c7d0 in the output (may look similar to c7::dsk/c7d0)
then do
cfgadm -c unconfigure c7::dsk/c7d0
If dynamic reconfiguration for your disks is not supported, you can
temporarily remove (or move out of /dev/dsk and /dev/rdsk) these two
symbolic links:
/dev/dsk/c7d0s0
/dev/rdsk/c7d0s0
Then do
zpool import
to see what pools are available for import and if your RAID-Z is happier.
To configure the disk back you can do 'cfgadm -c configure <disk>'.
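Putting the logical-removal path together, the sequence would look roughly like this (a sketch only - the attachment point c7::dsk/c7d0 is a guess and may be named differently on your system):

cfgadm -al | grep c7d0                  # find the attachment point for disk c7d0
cfgadm -c unconfigure c7::dsk/c7d0      # logically detach the disk
zpool import                            # see whether datapool now looks importable
# afterwards, reattach the disk:
cfgadm -c configure c7::dsk/c7d0

If cfgadm cannot unconfigure the disk, fall back to moving the /dev/dsk/c7d0s0 and /dev/rdsk/c7d0s0 links aside and running 'zpool import' again.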
Cheers,
Victor
PS. Do you have any idea how it could happen that c7d0 got relabeled?