On Thu, Sep 07, 2017 at 11:02:52PM -0700, Rick Moen wrote:

> > b) driver modules being loaded in a different order (same cause,
> > different incarnation) - this could be partially solved by listing
> > the modules you want loaded in /etc/modules, they'll load in the
> > order listed (unless something else triggers them being loaded
> > earlier).
>
> FWIW, my own preference is to locally compile a kernel with needed
> drivers monolithically included and not building unneeded ones at all.
> (On non-server systems, I do the same except compile as modules
> drivers I might reasonably expect to some day want but not initially.)
>
> I would be interested to know the circumstances in which 'modules
> loaded in a different order', specifically if any were _other than...
dunno what the fuss is, saying that modules don't always load in the
order you expect (even when you explicitly list them in /etc/modules)
isn't even a controversial statement. it's well-known modprobe
behaviour.

You have a lot of control over load order by listing the modules, but
that control is not total. **anything** that tries to use or probe for
a device (whether directly or indirectly) can trigger a kernel module
to be loaded(*). much of that is in-kernel. some of it is userland, and
depends, amongst other things, on the execution order of init scripts
(or unit files, or whatever).

(*) sometimes the wrong module, or the "wrong" one of two alternative
modules for the same hardware. nvidia.ko vs nouveau.ko, for example.
and i've used several different NICs and disk controllers and other
things over the years that had two alternative driver implementations
in the mainline kernel at the same time (sometimes because one was a
newer, shinier driver intended to eventually replace the other,
sometimes just because it was a different implementation offering
different options or tuning characteristics). that's one of the reasons
why modprobe has a 'blacklist' command.

As for compiling custom monolithic kernels, I gave up on that years
ago. It just wasn't worth the time and maintenance effort to
custom-compile a kernel for each machine. My ability to remember and
maintain the specific details of each machine doesn't scale well
enough... so i ration it to just remembering quirks and broken things
that need to be worked around, or even particular compile-time
optimisations that significantly benefit one machine but not others -
not generic stuff like "needs driver X rather than driver Y compiled
in" when a generic kernel with all modules compiled works well enough
for that. I can't even remember the last time i had a hardware bug or
whatever that **needed** a custom kernel to work around.
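for anyone wanting that partial control, it looks something like this
(Debian-style file locations; the module names are just placeholders
for illustration):

```
# /etc/modules -- modules loaded at boot, in the order listed.
# note: anything that probes the hardware earlier (kernel or
# userland) can still trigger an out-of-order load.
mpt3sas
igb

# /etc/modprobe.d/blacklist.conf -- pick one of two alternative
# drivers for the same hardware, e.g. prefer nvidia.ko:
blacklist nouveau
```

blacklisting only stops automatic loading by alias; an explicit
`modprobe nouveau` would still work.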
> That is, _of course_ adding/removing drives and controller cards may
> change device order. When you do so, you expect that and expect to
> update one or two relevant system rc files.

or, for disks, I could just use UUID or LABEL (fstab) or
/dev/disk/by-id (elsewhere, including zpools) and not have to care in
the slightest what device node name the kernel gives it. why make a
problem for myself that I don't need to have? especially when that
problem gives me no actual benefit of any kind?

> USB? Yes, indeed, notorious agent of chaos that it is -- which is one
> of multiple reasons why you don't leave casual-use hotplugged
> mass-storage devices plugged in during system reboots, and why I'd be
> averse to relying on USB-connected network interfaces if I had any
> alternative at all.
>
> So, to recap, unless you can (please!) detail instances where 'driver
> modules loaded in a different order' _without_ the above obvious and
> well-understood causative factors, I think you've just reiterated
> exactly what I said upthread.

what I said wasn't to dispute or disagree with what you said, so yeah,
call it reiteration if you want. i was providing some specific examples
from my own personal experience where devices were detected in a
different order across reboots.

> I'll bet that the device node instability would vanish if you compile
> in the drivers monolithically. That's what I'd try, anyway -- might
> put an end to that nonsense, and good riddance.

1. I could do that, but why would I want to? It's not causing me any
problems because I don't hardcode specific /dev/sdX* names into
/etc/fstab or anywhere else. I follow the advice that has been stated
repeatedly by kernel devs for many years to not do that.

Really. The fact that kernel device naming is not consistent does not
cause me even the slightest problem. It's a non-issue.
In other words, "I don't care enough to find out why or change it,
because it doesn't matter at all if i use UUIDs or LABELs or
/dev/disk/by-*".

2. I'd bet that device name unpredictability wouldn't vanish, because
the kernel doesn't guarantee that devices will get the same name on
different reboots. i.e. it is behaving as it is documented to behave.
That's a fact, and not one of the alternative kind.

> > The SAS port drives aren't even detected in any predictable order.
> > All of the 4TB ST4000DX drives (my "backup" zpool) are plugged into
> > one SFF-8087 socket on the SAS card (which goes to one of my
> > 4-drive hot swap bays), and the 1TB WDs and STs are plugged into
> > the other (which goes into another 4-drive bay). You'd expect them
> > to be detected in that order, but...nope.
>
> But (and my apologies if you clarify this; I'm a bit pressed for
> time), I'm betting that the devices within each _set_ of ports, the
> motherboard SATA set, the PCI-E SAS set, and the set of any block
> devices on USB, each are assigned devices contiguously. So, see
> above.

you're responding to an example of devices within each set **NOT**
being assigned contiguously. one of the 4TB drives on the same SFF-8087
port as the others was detected as /dev/sdc, while the other 4TB drives
were detected as /dev/sdf to /dev/sdh.

here's how the kernel sees them when booting, sequentially numbered
drives on the zeroth SCSI-like controller (an LSI SAS card.
the motherboard SATA ports are scsi:1:x:x:x:)

Sep 05 23:13:34 ganesh kernel: scsi 0:0:0:0: Direct-Access ATA ST31000528AS CC49 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:1:0: Direct-Access ATA WDC WD10EACS-00Z 1B01 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:2:0: Direct-Access ATA WDC WD10EACS-00Z 1B01 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:3:0: Direct-Access ATA WDC WD10EARS-00Y 0A80 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:4:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:5:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:6:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:7:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6

immediately after that, it assigns them the following device names:

Sep 05 23:13:34 ganesh kernel: sd 0:0:7:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:4:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:5:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:2:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:3:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:6:0: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)

i can only guess that drive spinup timing is why 0:0:7:0 was allocated
sdc rather than 0:0:2:0. or maybe it's the time of day, or the phase of
the moon. i really don't know. but, like I said, it's not causing any
problem so it doesn't matter.
BTW, the previous boot (trying out 4.12) assigned dev names to the
drives in the "natural" order, according to scsi device id:

Sep 05 09:16:25 ganesh kernel: sd 0:0:4:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:5:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:6:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:7:0: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:2:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:3:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)

so, there's a possible modern-day example of a kernel version change
causing the kernel's device name for a drive to change - when nothing
else changed, the drive was plugged into the same port on the same
controller in the same PCI-e slot.

> Well, there you go talking about adding/removing drives, again -- and
> of course that changes device nodes.

adding and removing drives is a very normal thing to do. completely
unremarkable and not at all unusual.

> That's what updating /etc/fstab is for.

No, that's what using UUID= or LABEL= in /etc/fstab is for - so that
doing completely normal things like adding or removing drives (or
having a drive die unexpectedly) doesn't risk your files by mounting
the wrong block device on the wrong mount point. Neither of these
events should require /etc/fstab to be edited. You can do that if you
want, but you're introducing a dependency on human interaction into
your boot sequence (and an opportunity for human error).
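concretely, the difference looks like this (the UUID, label, and drive
serial below are made-up placeholders - get the real ones from
blkid(8) or `lsblk -f`):

```
# /etc/fstab -- fragile: breaks whenever detection order changes
#/dev/sdc1   /backup   xfs   defaults   0 2

# robust: identifies the filesystem, not the device node
UUID=3e6be9de-8139-11d1-9106-a43f08d823a6   /backup   xfs   defaults  0 2
LABEL=scratch                               /scratch  ext4  defaults  0 2

# same idea for zpools: build them from /dev/disk/by-id paths,
# which are derived from the drive model and serial number, e.g.
#   zpool create backup raidz1 /dev/disk/by-id/ata-ST4000DX001-1CE168_Zxxxxxxx ...
```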
> > So, when the kernel devs say that you can't rely on the order of
> > naming for drives and other devices, I believe them.
>
> What kernel devs?

People who work on the kernel. google has been mostly unhelpful in
locating a specific quote from a well-known kernel dev, but device
detection and naming has been an issue in linux for as long as i can
remember. It, along with the fact that the /dev/MAKEDEV script
basically sucked, is why there have been several in-kernel attempts to
solve it, including devfs and, later, devtmpfs.

BTW, udev was first written ~2004 by kernel developer (and maintainer
of the kernel's driver core code) Greg Kroah-Hartman to replace/augment
the features of devfs - and while he's a systemd fan-boy these days,
this was LONG before systemd was even thought of.

http://www.linuxjournal.com/article/7316

  [...] Starting with the 2.5 kernel, all physical and virtual devices
  in a system are visible to user space in a hierarchal fashion through
  sysfs. /sbin/hotplug provides a notification to user space when any
  device is added or removed from the system. Using these two features,
  a user-space implementation of a dynamic /dev now is possible that
  can provide a flexible device naming policy. This article discusses
  udev, a program that replaces the functionality of devfs. It provides
  /dev entries for devices in the system at any moment in time. It also
  provides features previously unavailable through devfs alone, such as
  persistent naming for devices when they move around the device tree,
  a flexible device naming scheme, notification of external systems of
  device changes and moving all naming policy out of the kernel.

also, from fstab(5), with ** emphasis added by me:

  LABEL=<label> or UUID=<uuid> may be given instead of a device name.
  **This is the recommended method**, as device names are often a
  coincidence of hardware detection order, and can change when other
  disks are added or removed.
  For example, `LABEL=Boot' or
  `UUID=3e6be9de-8139-11d1-9106-a43f08d823a6'. (Use a
  filesystem-specific tool like e2label(8), xfs_admin(8), or
  fatlabel(8) to set LABELs on filesystems).

  It's also possible to use PARTUUID= and PARTLABEL=. These partition
  identifiers are supported for example for GUID Partition Table (GPT).

  See mount(8), blkid(8) or lsblk(8) for more details about device
  identifiers.

> Surely you aren't talking about the Freedesktop.org weenies.

of course not.

> I of course agree that device nodes for drives and other devices can
> change, but the question was: Under which circumstances? What you've
> just described is pretty much exactly the situation I detailed
> upthread.

except for module load order, drive spin-up time, BIOS changes, and
some other factors. the circumstances you mentioned are NOT an
exhaustive list (and neither are the ones I mentioned). They're just
**some** of the things that can affect device naming, not all of them.

BTW, the actual point was NOT "under which circumstances" it can
happen, so don't try to shift the goal-posts. The point I made was that
it is crazy to rely on devices being assigned any particular name in
any particular order by the kernel, because the kernel does not and can
not guarantee that.

> > and, yeah, this is an unusual setup for a home system. It's not all
> > that unusual for anyone running a file server for a business or
> > other organisation, or anyone who doesn't want to pay for a
> > ridiculously overpriced NAS box when they can DIY with linux's
> > built-in features.
>
> I would never recommend for a business a file server with
> simultaneous use of motherboard SATA ports, a PCI-E SAS card, and USB
> things on an ongoing basis. That seems like poor component selection,
> IMVAO. [0]

Well, that's a pointless distraction to make. As mentioned, it's a home
server & workstation. Built gradually, and cheaply (a few hundred
dollars at a time rather than a few thousand or more for a complete new
build).
From new and second-hand parts. Upgraded many, many times over the last
few decades (trace it back far enough and it is still, in the "my
grandfather's axe" sense, the very first linux machine I ever built
back in 1990 or 91).

It uses, for example, both SAS and SATA ports because that's what I
have and I didn't need to spend any money to get another controller
card. It has SATA drives on the SAS ports because they're a lot cheaper
than SAS drives (or anything else labelled "Enterprise").

Boot drive(s) on SATA, bulk storage on a SCSI or SAS controller isn't
even unusual for small-medium businesses. I would have no hesitation
recommending this to any smallish business who wanted as much bang for
their buck as possible - a linux-based NAS (or freebsd) is far better
(in every way) than any consumer NAS box.

(in fact, i know of people who build file and other servers with the
boot disk on a USB stick because you can plug 2 of them into a
motherboard's USB jumper block with a trivial adaptor, and leave them
in the case without using up a valuable SATA or SAS port, or a drive
bay)

It is what it is so that I can get some/most of the benefits of the
high-end gear I use at various $workplaces but cheap DIY(*) rather than
spending thousands or tens of thousands at once. And also so that I can
experiment with stuff that I may end up using at a $workplace.

(*) not necessarily worse than expensive name-brand hardware. In fact,
a lot of the DIY stuff is much better than any commercially available
product.

> > > And ifrename is cool.
> >
> > It was cool. I installed it on every machine for several years.
> > Then it became unnecessary when the same capability (renaming NIC
> > interfaces according to MAC address) was standard in udev.
>
> Or, to put it a different way, udev becomes unnecessary the moment
> you remember ifrename.

except that udev will **always** run before any NIC is up, while
ifrename may not - and will bail if the NIC is in use.
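for reference, the udev side of that is a one-line rule (the MAC
address and interface name here are hypothetical):

```
# /etc/udev/rules.d/70-persistent-net.rules
# match the NIC by its MAC address and give it a stable name.
# udev applies this when the device is added, i.e. before any
# init script or network daemon has brought the interface up.
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:55", NAME="lan0"
```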
that (udev always running early enough), IIRC, is what provided the
motivation for me to finally switch from ifrename to udev, years after
udev had gained the capability.

there's also the fact that udev (or a work-in-progress clone like mdev)
is installed by default on almost every linux system these days.
ifrename is not, and may not even be packaged for some distros.

> > Given a choice between using a standard feature that's in every
> > linux system (well, possibly excluding some embedded linux devices)
> > and using a relatively obscure, "non-standard" tool that does the
> > same thing, the decision to switch to using udev for that was easy.
>
> MS-Windows is 'standard', too. ;->

it's non-standard for linux systems, so can be ignored as irrelevant.

> Personally, I rather like being in charge of my own software. In
> fact, I rather insist, thanks-very-much-I'm-sure.

another non sequitur. I can configure udev to do what I want, and it's
not even particularly hard to do so (i've seen many config file formats
that are much worse) - how is using udev not "being in charge of my own
software"? or my own hardware or systems, which is what I guess you
meant.

> > udev also has the advantage over ifrename of being useful for a lot
> > more device-related stuff than just NIC device renaming.
>
> It's a floor wax _and_ a dessert-topping! ;-> [1]

udev is a single tool which can be used for a variety of device
configuration tasks. ifrename can only rename NICs.

> > devtmpfs doesn't solve the device order or naming problem.
>
> No, but it goes a long way towards eliminating the alleged necessity
> of udev, which I am pretty sure is what motivated Torvalds and co. to
> introduce it. IIRC, it was after that notorious incident when Sievers
> attempted to strongarm kdbus into the kernel so that the systemd/dbus
> people could overwhelm the kernel with message traffic.

Your recollection is faulty.

"The return of devfs" https://lwn.net/Articles/331818/

devtmpfs was first proposed in 2009.
The Sievers kernel debug= cmdline arg fiasco was in April 2014, and
kdbus was announced at linux.conf.au in Jan 2014
(https://lwn.net/Articles/580194/).

What's really funny, though, is that the first devtmpfs patch was
announced in 2009 and written **BY** Greg Kroah-Hartman, Jan Blunck,
and Kay Sievers. That's definitely NOT a response to an incident five
years later by one of the authors.

(BTW, it was GKH who pushed so hard to get kdbus into the kernel. GKH
is still, rightfully, a highly-respected person in the kernel
community. Sievers definitely isn't, and would have had no chance of
pushing for anything. I paid a lot of attention to it at the time
because it was, and remains, an issue of concern to me)

> devtmpfs, among other things, was a statement that 'Actually, it
> turns out we don't need your code to recognise hardware and autoload
> firmware BLOBs, so pray don't motivate us to make even more of what
> you do irrelevant.'

Nope. devtmpfs was not one of the authors telling himself "we don't
need my code".

(Also, devtmpfs doesn't autoload firmware blobs; that's done by the
kernel core and the driver - typically the driver asks the kernel for
the firmware, the kernel asks udev for it, udev finds it and hands it
over, and the kernel passes it on to the driver... which then uploads
the firmware to the device)

Linus Torvalds yelled at Sievers (and rightly so) over the debug=
incident and subsequent arrogant arseholery, and told GKH that he'd be
rejecting any future code from Kay Sievers until his code and his
attitude stopped sucking so much.

https://patchwork.kernel.org/patch/3930121/

BTW, now that I know Sievers was involved in devtmpfs, I've put it on
my list of things to be vaguely suspicious of.

> > > https://wiki.gentoo.org/wiki/Mdev
> > > https://github.com/slashbeast/mdev-like-a-boss
> >
> > whether it's called udev or mdev or some other clone of udev, it
> > still does the same thing.
>
> No, you are mistaken. In _no way_ is mdev a clone of udev.
Not hardly. It is intended to be a replacement for udev; it was written
(at least partially) in response to the fact that udev was merged into
systemd and was no longer maintained as a separate program. I call that
a clone. if you like, call it a clone of udev's important features, as
it was before the deliberately engineered and unnecessary
interdependence with systemd and gnome.

> I started losing interest in udev at a rapid pitch the day I found
> that the system no longer permitted me to use mknod to create a
> needed device node in /dev. This is not tolerable, sorry: Software
> that tries to tell the sysadmin he may not take necessary steps to
> administer his system gets scrapped at the next convenient
> opportunity.

when has it ever not been possible to run mknod in /dev?

re: CPUs and "management engines":

> Possible help may soon emerge from a new-ish initiative with Raptor
> Computing's Talos II series using IBM POWER9 CPUs, where there is,
> refreshingly, none of that shit anywhere in the SoC or surrounding
> circuitry. More speculatively, the J-Core project has been reviving
> the Hitachi SuperH CPU architecture now that all of the patents have
> expired, with hardware designs that are open all the way from top to
> bottom. Much depends on completion of their roadmap, particularly the
> 64-bit version of SH-4 & support circuitry. We shall see.

cool. i'll put it on my TO INVESTIGATE list for 2027 or 2037 (the
64-bit version will be timed just right for avoiding the pending 32-bit
unix time_t apocalypse - aka "the Y2K bug for unix geeks")

Seriously, though, stuff like that is interesting and good to hear
about, but I'm still waiting to see even ARM CPUs being useful for
anything except embedded devices like overpriced NASes and wifi routers
(and MIPS is still dominant with openwrt-capable routers(*)), raspberry
pi-like devices, and toys like phones and tablets. the long-touted rise
of ARM-based server hardware has failed to materialise, year after
year.
in short, i'll believe it when i see it.

(*) I'll be researching these again over the next few months. My suburb
is scheduled for NBN to (finally!) be available next March (using FTTC
rather than FTTP for bullshit Australian Libs vs Labor politics reasons
and Emperor Murdoch's control of the Libs, so I'll need a VDSL modem to
replace my ADSL2 modem). I don't trust vendor-supplied firmware, and
from what i've read most VDSL modems will reset to ADSL mode if you set
them up as a bridge (for pppoe on the local linux box), so I'll move my
gateway/firewall over to an openwrt box. I may as well move dns, dhcpd,
and a few other things too.

craig

-- 
craig sanders <[email protected]>

_______________________________________________
luv-main mailing list
[email protected]
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
