On 25.09.2023 at 13:58, Dimitry Andric wrote:
# nvmecontrol identify nda0 and # nvmecontrol identify nvd0 (after
hw.nvme.use_nvd="1" and reboot) give the same result:
Number of LBA Formats: 1
Current LBA Format: LBA Format #00
LBA Format #00: Data Size: 512 Metadata Size: 0 Performance: Best
...
Optimal I/O Boundary: 0 blocks
NVM Capacity: 1000204886016 bytes
Preferred Write Granularity: 32 blocks
Preferred Write Alignment: 8 blocks
Preferred Deallocate Granul: 9600 blocks
Preferred Deallocate Align: 9600 blocks
Optimal Write Size: 256 blocks
My guess is that the "Preferred Write Granularity" is the optimal write size, in this
case 32 blocks of 512 bytes, i.e. 16 kiB. This also matches the stripe size reported by
geom, as you showed.
The "Preferred Write Alignment" is 8 * 512 bytes = 4 kiB, so partitions etc. should be
aligned to at least this. However, it cannot hurt to align everything to 16 kiB,
which is an integer multiple of 4 kiB.
Eugene gave me a tip, so I looked into the drivers.
dev/nvme/nvme_ns.c:

uint32_t
nvme_ns_get_stripesize(struct nvme_namespace *ns)
{
        uint32_t ss;

        if (((ns->data.nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
            NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0) {
                ss = nvme_ns_get_sector_size(ns);
                if (ns->data.npwa != 0)
                        return ((ns->data.npwa + 1) * ss);
                else if (ns->data.npwg != 0)
                        return ((ns->data.npwg + 1) * ss);
        }
        return (ns->boundary);
}
cam/nvme/nvme_da.c:

        if (((nsd->nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
            NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0 && nsd->npwg != 0)
                disk->d_stripesize = ((nsd->npwg + 1) *
                    disk->d_sectorsize);
        else
                disk->d_stripesize = nsd->noiob * disk->d_sectorsize;
So it seems that nvd uses "sectorsize * Preferred Write Alignment" as the
stripesize, while nda uses "sectorsize * Preferred Write Granularity".
My current interpretation is that the nvd driver reports the wrong value
for maximum performance and reliability. I should make a backup and
re-create the pool.
Maybe we should note in the 14.0 release notes that the switch to nda
is not a no-op.
--
Frank Behrens
Osterwieck, Germany