On Nov 29, 2006, at 13:24, [EMAIL PROTECTED] wrote:
I suspect a lack of an MBR could cause some BIOS implementations to
barf ..
Why?
Zeroed disks don't have that issue either.
You're right - I was thinking that a lack of an MBR with a GPT could
be causing problems, but actually it looks like we do write a
protective MBR in efi_write() - so it's either going to be the GPT
header at LBA1 or backwards compatibility with the version 1.00 spec
that the BIOS vendors aren't dealing with correctly. Proprietary
BIOS RAID signatures do sound quite plausible as a common cause of
problems.
Digging a little deeper, I'm thinking some of our EFI code might be a
little old ..
in efi_partition.h we've got the following defined for dk_gpt and
dk_part:
    /* Solaris library abstraction for EFI partitons */
    typedef struct dk_part {
            diskaddr_t      p_start;        /* starting LBA */
            diskaddr_t      p_size;         /* size in blocks */
            struct uuid     p_guid;         /* partion type GUID */
            ushort_t        p_tag;          /* converted to part'n type GUID */
            ushort_t        p_flag;         /* attributes */
            char            p_name[EFI_PART_NAME_LEN]; /* partition name */
            struct uuid     p_uguid;        /* unique partition GUID */
            uint_t          p_resv[8];      /* future use - set to zero */
    } dk_part_t;

    /* Solaris library abstraction for an EFI GPT */
    #define EFI_VERSION102          0x00010002
    #define EFI_VERSION100          0x00010000
    #define EFI_VERSION_CURRENT     EFI_VERSION100
    typedef struct dk_gpt {
            uint_t          efi_version;    /* set to EFI_VERSION_CURRENT */
            uint_t          efi_nparts;     /* number of partitions below */
            uint_t          efi_part_size;  /* size of each partition entry */
                                            /* efi_part_size is unused */
            uint_t          efi_lbasize;    /* size of block in bytes */
            diskaddr_t      efi_last_lba;   /* last block on the disk */
            diskaddr_t      efi_first_u_lba; /* first block after labels */
            diskaddr_t      efi_last_u_lba; /* last block before backup labels */
            struct uuid     efi_disk_uguid; /* unique disk GUID */
            uint_t          efi_flags;
            uint_t          efi_reserved[15]; /* future use - set to zero */
            struct dk_part  efi_parts[1];   /* array of partitions */
    } dk_gpt_t;
which looks like we're using the EFI version 1.00 spec. Looking at
cmd/zpool/zpool_vdev.c, we call efi_write(), which does the label: it
writes the PMBR at LBA0 (first 512B block) and the EFI header at LBA1,
and should reserve the next 16KB for the partition table entries ..
[now we really should be using EFI version 1.10 with the -001 addendum
(which is what 1.02 morphed into about 5 years back) or version 2.0
in the UEFI space .. but that's a separate discussion, as the address
boundaries haven't really changed for device labels.]
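As a rough sketch of where that leaves the first usable block (the
helper name and constants below are mine for illustration, not
anything from libefi), assuming a 16KB partition entry area:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the area reserved for GPT partition entries: 16KB. */
#define GPT_ENTRY_BYTES	(16u * 1024u)

/*
 * Hypothetical helper: first LBA usable for data at the front of the
 * disk. LBA0 holds the PMBR, LBA1 holds the GPT header, and the
 * partition entry area follows.
 */
static uint64_t
gpt_first_usable_lba(uint32_t lbasize)
{
	assert(lbasize != 0 && GPT_ENTRY_BYTES % lbasize == 0);
	return (2 + GPT_ENTRY_BYTES / lbasize);
}
```

With 512-byte blocks this gives 2 + 32 = LBA34, i.e. a 17KB byte
offset, which matches the efi_first_u_lba used below.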
in uts/common/fs/zfs/vdev_label.c we define the zfs boot block:

    /*
     * Initialize boot block header.
     */
    vb = zio_buf_alloc(sizeof (vdev_boot_header_t));
    bzero(vb, sizeof (vdev_boot_header_t));
    vb->vb_magic = VDEV_BOOT_MAGIC;
    vb->vb_version = VDEV_BOOT_VERSION;
    vb->vb_offset = VDEV_BOOT_OFFSET;
    vb->vb_size = VDEV_BOOT_SIZE;
which gets written down at the 8KB boundary after we start usable
space from LBA34:
    vtoc->efi_parts[0].p_start = vtoc->efi_first_u_lba;
[note: LBA34 is a 17KB offset, which isn't typically well aligned for
most logical volumes .. it would probably be better to start writing
data at LBA1024 so we stay well aligned for logical volumes with
stripe widths up to 512KB and avoid the R/M/W misalignment that can
occur there .. currently with a 256KB vdev label (512 blocks), I
believe we start the data portion out on LBA546, which seems like a
problem]
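To make the alignment argument concrete, here's a small sketch (both
helpers are hypothetical, and 512-byte blocks are assumed) that checks
a starting LBA against a stripe width and rounds it up to the next
aligned boundary:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round an LBA up to a multiple of align_blocks. */
static uint64_t
lba_round_up(uint64_t lba, uint64_t align_blocks)
{
	assert(align_blocks != 0);
	return ((lba + align_blocks - 1) / align_blocks * align_blocks);
}

/* Hypothetical helper: is the byte offset of lba a multiple of the stripe? */
static int
lba_is_aligned(uint64_t lba, uint64_t stripe_bytes, uint32_t lbasize)
{
	assert(stripe_bytes != 0);
	return ((lba * lbasize) % stripe_bytes == 0);
}
```

For example, 34 + 512 = LBA546 is not aligned to a 512KB stripe, while
lba_round_up(546, 1024) gives LBA1024, which is aligned to 512KB and
to every smaller power-of-two stripe width.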
and then we apparently store a backup vtoc right before the backup
partition table entries and backup GPT:
    vtoc->efi_parts[0].p_size = vtoc->efi_last_u_lba + 1 -
        vtoc->efi_first_u_lba - resv;
this next bit is interesting since we should probably define a GUID
for ZFS partitions that points to the ZFS vdev label instead of using
V_USR

    /*
     * Why we use V_USR: V_BACKUP confuses users, and is considered
     * disposable by some EFI utilities (since EFI doesn't have a backup
     * slice).  V_UNASSIGNED is supposed to be used only for zero size
     * partitions, and efi_write() will fail if we use it.  V_ROOT, V_BOOT,
     * etc. were all pretty specific.  V_USR is as close to reality as we
     * can get, in the absence of V_OTHER.
     */
    vtoc->efi_parts[0].p_tag = V_USR;
    (void) strcpy(vtoc->efi_parts[0].p_name, "zfs");
and here we define the backup vdev label on the last usable LBA
before our standard 8MB(?) reservation (16384 blocks) at the end of
the disk and do the efi_write():

    vtoc->efi_parts[8].p_start = vtoc->efi_last_u_lba + 1 - resv;
    vtoc->efi_parts[8].p_size = resv;
    vtoc->efi_parts[8].p_tag = V_RESERVED;

    if (efi_write(fd, vtoc) != 0)
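The slice arithmetic in those excerpts can be sketched on its own
like this (struct slice and carve_slices() are hypothetical stand-ins
for the dk_part fields, not the zpool code itself):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two dk_part fields used here. */
struct slice {
	uint64_t start;	/* starting LBA */
	uint64_t size;	/* size in blocks */
};

/*
 * Split the usable range [first_u_lba, last_u_lba] into a data slice
 * followed by a reserved slice of resv blocks at the tail.
 */
static void
carve_slices(uint64_t first_u_lba, uint64_t last_u_lba, uint64_t resv,
    struct slice *data, struct slice *reserved)
{
	assert(last_u_lba + 1 >= first_u_lba + resv);
	data->start = first_u_lba;
	data->size = last_u_lba + 1 - first_u_lba - resv;
	reserved->start = last_u_lba + 1 - resv;
	reserved->size = resv;
}
```

The data slice ends exactly where the reserved slice begins, and the
reserved slice ends on the last usable LBA.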
I'm thinking we should really define a GUID for ZFS and maybe do some
better provisioning at the front end of the disk to be better aligned
for full stripe write conditions .. with EFI we could use LBA34
through LBA1023 for vdev labels and other stuff, and start write
aligning out on LBA1024. There also looks to be an error(?) in the
EFI reservation bits at the tail end of the disk, since I thought the
EFI spec only needed 16KB for the backup partition entries and 512B
for the backup GPT header .. not 16384 * 512B blocks .. for what it's
worth, that's also been in the format utility for a while now, so I
could be missing something on the methodology for the 8MB reservation
at the tail end of the disk.
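For comparison, a quick sketch of the two numbers (helper names are
mine; it assumes 512-byte blocks and a 16KB backup partition entry
area plus one block for the backup header):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the backup partition entry area: 16KB. */
#define GPT_ENTRY_BYTES	(16u * 1024u)

/* Hypothetical helper: blocks needed at the tail for the backup GPT. */
static uint64_t
gpt_backup_blocks(uint32_t lbasize)
{
	assert(lbasize != 0 && GPT_ENTRY_BYTES % lbasize == 0);
	/* backup partition entries plus one block for the backup header */
	return (GPT_ENTRY_BYTES / lbasize + 1);
}

/* Hypothetical helper: convert a block count to bytes. */
static uint64_t
blocks_to_bytes(uint64_t blocks, uint32_t lbasize)
{
	return (blocks * lbasize);
}
```

That works out to 33 blocks (16.5KB) for the backup GPT itself,
versus 16384 blocks (8MB) for the reservation.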
.je
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss