Re: [zfs-discuss] partioned cache devices

Andrew Gabriel Tue, 19 Mar 2013 14:10:40 -0700

On 03/19/13 20:27, Jim Klimov wrote:

I disagree; at least, I've always thought differently:
the "d" device is the whole disk denomination, with a
unique number for a particular controller link ("c+t").


The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition "p0" stands for the table
itself (i.e. to manage partitioning),


p0 is the whole disk regardless of any partitioning.
(Hence you can use p0 to access any type of partition table.)

and the rest kind
of "depends". In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
"slices" in ZFS and format utility.


The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.

I believe, Solaris-based OSes accessing a "p"-named
partition and an "s"-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.


No, you'll see just p0 (whole disk), and p1 (whole disk
less space for the backwards compatible FDISK partitioning).

Also, if a "whole disk" is given to ZFS (and for OSes
other that the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses "cXtYdZ" as a
pool component,


For an EFI disk, the device name without a final p* or s*
component is the whole EFI partition. (It's actually the
s7 slice minor device node, but the s7 is dropped from
the device name to avoid the confusion we had with s2
on SMI labeled disks being the whole SMI partition.)

since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also "allows" ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a "crime" in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
      disk
        A block device, typically located under  /dev/dsk.
        ZFS  can  use  individual  slices  or  partitions,
        though the recommended mode of operation is to use
        whole  disks.  ...


Right.

This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.


There's a simpler way to think of it on x86.
You always have FDISK partitioning (p1, p2, p3, p4).
You can then have SMI or GPT/EFI slices (both called s0, s1, ...)
in an FDISK partition of the appropriate type.
With SMI labeling, s2 is by convention the whole Solaris FDISK
partition (although this is not enforced).
With EFI labeling, s7 is enforced as the whole EFI FDISK partition,
and so the trailing s7 is dropped off the device name for
clarity.

This simplicity is brought about because the GPT spec requires
that backwards compatible FDISK partitioning is included, but
with just 1 partition assigned.

--
Andrew
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] partioned cache devices

Reply via email to