On Thu, Sep 07, 2017 at 11:02:52PM -0700, Rick Moen wrote:

> > b) driver modules being loaded in a different order (same cause,
> > different incarnation) - this could be partially solved by listing
> > the modules you want loaded in /etc/modules, they'll load in the
> > order listed (unless something else triggers them being loaded
> > earlier).
>
> FWIW, my own preference is to locally compile a kernel with needed
> drivers monolithically included and not building unneeded ones at all.
> (On non-server systems, I do the same except compile as modules
> drivers I might reasonably expect to some day want but not initially.)
>
> I would be interested to know the circumstances in which 'modules
> loaded in a different order', specifically if any were _other than...
dunno what the fuss is, saying that modules don't always load in the
order you expect (even when you explicitly list them in /etc/modules)
isn't even a controversial statement. it's well-known modprobe
behaviour.

You have a lot of control over load order by listing the modules, but
that control is not total. **anything** that tries to use or probe for
a device (whether directly or indirectly) can trigger a kernel module
to be loaded(*). much of that is in-kernel. some of it is userland, and
depends, amongst other things, on the execution order of init scripts
(or unit files, or whatever).

(*) sometimes the wrong module, or the "wrong" one of two alternative
modules for the same hardware. nvidia.ko vs nouveau.ko, for example.
and i've used several different NICs and disk controllers and other
things over the years that had two alternative driver implementations
in the mainline kernel at the same time (sometimes because one was a
newer, shinier driver intended to eventually replace the other,
sometimes just because it was a different implementation offering
different options or tuning characteristics). that's one of the reasons
why modprobe has a 'blacklist' command.

As for compiling custom monolithic kernels, I gave up on that years
ago. It just wasn't worth the time and maintenance effort to
custom-compile a kernel for each machine. My ability to remember and
maintain the specific details of each machine doesn't scale well
enough... so i ration it to just remembering quirks and broken things
that need to be worked around, or even particular compile-time
optimisations that significantly benefit one machine but not others -
not generic stuff like "needs driver X rather than driver Y compiled
in" when a generic kernel with all modules compiled works well enough
for that. I can't even remember the last time i had a hardware bug or
whatever that **needed** a custom kernel to work around.
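for anyone wanting that partial control, it looks something like this
(Debian-style file locations; the module names are just placeholders
for illustration):

```
# /etc/modules -- modules loaded at boot, in the order listed.
# note: anything that probes the hardware earlier (kernel or
# userland) can still trigger an out-of-order load.
mpt3sas
igb

# /etc/modprobe.d/blacklist.conf -- pick one of two alternative
# drivers for the same hardware, e.g. prefer nvidia.ko:
blacklist nouveau
```

blacklisting only stops automatic loading by alias; an explicit
`modprobe nouveau` would still work.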
> That is, _of course_ adding/removing drives and controller cards may
> change device order. When you do so, you expect that and expect to
> update one or two relevant system rc files.

or, for disks, I could just use UUID or LABEL (fstab) or
/dev/disk/by-id (elsewhere, including zpools) and not have to care in
the slightest what device node name the kernel gives it. why make a
problem for myself that I don't need to have? especially when that
problem gives me no actual benefit of any kind?

> USB? Yes, indeed, notorious agent of chaos that it is -- which is one
> of multiple reasons why you don't leave casual-use hotplugged
> mass-storage devices plugged in during system reboots, and why I'd be
> averse to relying on USB-connected network interfaces if I had any
> alternative at all.
>
> So, to recap, unless you can (please!) detail instances where 'driver
> modules loaded in a different order' _without_ the above obvious and
> well-understood causative factors, I think you've just reiterated
> exactly what I said upthread.

what I said wasn't to dispute or disagree with what you said, so yeah,
call it reiteration if you want. i was providing some specific examples
from my own personal experience where devices were detected in a
different order across reboots.

> I'll bet that the device node instability would vanish if you compile
> in the drivers monolithically. That's what I'd try, anyway -- might
> put an end to that nonsense, and good riddance.

1. I could do that, but why would I want to? It's not causing me any
problems because I don't hardcode specific /dev/sdX* names into
/etc/fstab or anywhere else. I follow the advice that has been stated
repeatedly by kernel devs for many years to not do that.

Really. The fact that kernel device naming is not consistent does not
cause me even the slightest problem. It's a non-issue.
In other words, "I don't care enough to find out why or change it,
because it doesn't matter at all if i use UUIDs or LABELs or
/dev/disk/by-*".

2. I'd bet that device name unpredictability wouldn't vanish, because
the kernel doesn't guarantee that devices will get the same name on
different reboots. i.e. it is behaving as it is documented to behave.
That's a fact, and not one of the alternative kind.

> > The SAS port drives aren't even detected in any predictable order.
> > All of the 4TB ST4000DX drives (my "backup" zpool) are plugged into
> > one SFF-8087 socket on the SAS card (which goes to one of my
> > 4-drive hot swap bays), and the 1TB WDs and STs are plugged into
> > the other (which goes into another 4-drive bay). You'd expect them
> > to be detected in that order, but...nope.
>
> But (and my apologies if you clarify this; I'm a bit pressed for
> time), I'm betting that the devices within each _set_ of ports, the
> motherboard SATA set, the PCI-E SAS set, and the set of any block
> devices on USB, each are assigned devices contiguously. So, see
> above.

you're responding to an example of devices within each set **NOT**
being assigned contiguously. one of the 4TB drives on the same SFF-8087
port as the others was detected as /dev/sdc, while the other 4TB drives
were detected as /dev/sdf to /dev/sdh.

here's how the kernel sees them when booting, sequentially numbered
drives on the zeroth SCSI-like controller (an LSI SAS card.
the motherboard SATA ports are scsi:1:x:x:x:)

Sep 05 23:13:34 ganesh kernel: scsi 0:0:0:0: Direct-Access ATA ST31000528AS CC49 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:1:0: Direct-Access ATA WDC WD10EACS-00Z 1B01 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:2:0: Direct-Access ATA WDC WD10EACS-00Z 1B01 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:3:0: Direct-Access ATA WDC WD10EARS-00Y 0A80 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:4:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:5:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:6:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6
Sep 05 23:13:34 ganesh kernel: scsi 0:0:7:0: Direct-Access ATA ST4000DX001-1CE1 CC44 PQ: 0 ANSI: 6

immediately after that, it assigns them the following device names:

Sep 05 23:13:34 ganesh kernel: sd 0:0:7:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:4:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:5:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:2:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:3:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 23:13:34 ganesh kernel: sd 0:0:6:0: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)

i can only guess that drive spinup timing is why 0:0:7:0 was allocated
sdc rather than 0:0:2:0. or maybe it's the time of day, or the phase of
the moon. i really don't know. but, like I said, it's not causing any
problem so it doesn't matter.
BTW, the previous boot (trying out 4.12) assigned dev names to the
drives in the "natural" order, according to scsi device id:

Sep 05 09:16:25 ganesh kernel: sd 0:0:4:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:5:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:6:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:7:0: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:2:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Sep 05 09:16:25 ganesh kernel: sd 0:0:3:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)

so, there's a possible modern-day example of a kernel version change
causing the kernel's device name for a drive to change - when nothing
else changed, the drive was plugged into the same port on the same
controller in the same PCI-e slot.

> Well, there you go talking about adding/removing drives, again -- and
> of course that changes device nodes.

adding and removing drives is a very normal thing to do. completely
unremarkable and not at all unusual.

> That's what updating /etc/fstab is for.

No, that's what using UUID= or LABEL= in /etc/fstab is for - so that
doing completely normal things like adding or removing drives (or
having a drive die unexpectedly) doesn't risk your files by mounting
the wrong block device on the wrong mount point. Neither of these
events should require /etc/fstab to be edited. You can do that if you
want, but you're introducing a dependency on human interaction into
your boot sequence (and an opportunity for human error).
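concretely, the difference looks like this (the UUID, label, and drive
serial below are made-up placeholders - get the real ones from
blkid(8) or `lsblk -f`):

```
# /etc/fstab -- fragile: breaks whenever detection order changes
#/dev/sdc1   /backup   xfs   defaults   0 2

# robust: identifies the filesystem, not the device node
UUID=3e6be9de-8139-11d1-9106-a43f08d823a6   /backup   xfs   defaults  0 2
LABEL=scratch                               /scratch  ext4  defaults  0 2

# same idea for zpools: build them from /dev/disk/by-id paths,
# which are derived from the drive model and serial number, e.g.
#   zpool create backup raidz1 /dev/disk/by-id/ata-ST4000DX001-1CE168_Zxxxxxxx ...
```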
> > So, when the kernel devs say that you can't rely on the order of
> > naming for drives and other devices, I believe them.
>
> What kernel devs?

People who work on the kernel. google has been mostly unhelpful in
locating a specific quote from a well-known kernel dev, but device
detection and naming has been an issue in linux for as long as i can
remember. It, along with the fact that the /dev/MAKEDEV script
basically sucked, is why there have been several in-kernel attempts to
solve it, including devfs and, later, devtmpfs.

BTW, udev was first written ~2004 by kernel developer (and maintainer
of the kernel's driver core code) Greg Kroah-Hartman to replace/augment
the features of devfs - and while he's a systemd fan-boy these days,
this was LONG before systemd was even thought of.

http://www.linuxjournal.com/article/7316

  [...] Starting with the 2.5 kernel, all physical and virtual devices
  in a system are visible to user space in a hierarchal fashion through
  sysfs. /sbin/hotplug provides a notification to user space when any
  device is added or removed from the system. Using these two features,
  a user-space implementation of a dynamic /dev now is possible that
  can provide a flexible device naming policy. This article discusses
  udev, a program that replaces the functionality of devfs. It provides
  /dev entries for devices in the system at any moment in time. It also
  provides features previously unavailable through devfs alone, such as
  persistent naming for devices when they move around the device tree,
  a flexible device naming scheme, notification of external systems of
  device changes and moving all naming policy out of the kernel.

also, from fstab(5), with ** emphasis added by me:

  LABEL=<label> or UUID=<uuid> may be given instead of a device name.
  **This is the recommended method**, as device names are often a
  coincidence of hardware detection order, and can change when other
  disks are added or removed.
  For example, `LABEL=Boot' or
  `UUID=3e6be9de-8139-11d1-9106-a43f08d823a6'. (Use a
  filesystem-specific tool like e2label(8), xfs_admin(8), or
  fatlabel(8) to set LABELs on filesystems).

  It's also possible to use PARTUUID= and PARTLABEL=. These partition
  identifiers are supported for example for GUID Partition Table (GPT).

  See mount(8), blkid(8) or lsblk(8) for more details about device
  identifiers.

> Surely you aren't talking about the Freedesktop.org weenies.

of course not.

> I of course agree that device nodes for drives and other devices can
> change, but the question was: Under which circumstances? What you've
> just described is pretty much exactly the situation I detailed
> upthread.

except for module load order, drive spin-up time, BIOS changes, and
some other factors. the circumstances you mentioned are NOT an
exhaustive list (and neither are the ones I mentioned). They're just
**some** of the things that can affect device naming, not all of them.

BTW, the actual point was NOT "under which circumstances" it can
happen, so don't try to shift the goal-posts. The point I made was that
it is crazy to rely on devices being assigned any particular name in
any particular order by the kernel, because the kernel does not and can
not guarantee that.

> > and, yeah, this is an unusual setup for a home system. It's not all
> > that unusual for anyone running a file server for a business or
> > other organisation, or anyone who doesn't want to pay for a
> > ridiculously overpriced NAS box when they can DIY with linux's
> > built-in features.
>
> I would never recommend for a business a file server with
> simultaneous use of motherboard SATA ports, a PCI-E SAS card, and USB
> things on an ongoing basis. That seems like poor component selection,
> IMVAO. [0]

Well, that's a pointless distraction to make. As mentioned, it's a home
server & workstation. Built gradually, and cheaply (a few hundred
dollars at a time rather than a few thousand or more for a complete new
build).
From new and second-hand parts. Upgraded many, many times over the last
few decades (trace it back far enough and it is still, in the "my
grandfather's axe" sense, the very first linux machine I ever built
back in 1990 or 91).

It uses, for example, both SAS and SATA ports because that's what I
have and I didn't need to spend any money to get another controller
card. It has SATA drives on the SAS ports because they're a lot cheaper
than SAS drives (or anything else labelled "Enterprise").

Boot drive(s) on SATA, bulk storage on a SCSI or SAS controller isn't
even unusual for small-medium businesses. I would have no hesitation
recommending this to any smallish business who wanted as much bang for
their buck as possible - a linux-based NAS (or freebsd) is far better
(in every way) than any consumer NAS box.

(in fact, i know of people who build file and other servers with the
boot disk on a USB stick because you can plug 2 of them into a
motherboard's USB jumper block with a trivial adaptor, and leave them
in the case without using up a valuable SATA or SAS port, or a drive
bay)

It is what it is so that I can get some/most of the benefits of the
high-end gear I use at various $workplaces but cheap DIY(*) rather than
spending thousands or tens of thousands at once. And also so that I can
experiment with stuff that I may end up using at a $workplace.

(*) not necessarily worse than expensive name-brand hardware. In fact,
a lot of the DIY stuff is much better than any commercially available
product.

> > > And ifrename is cool.
> >
> > It was cool. I installed it on every machine for several years.
> > Then it became unnecessary when the same capability (renaming NIC
> > interfaces according to MAC address) was standard in udev.
>
> Or, to put it a different way, udev becomes unnecessary the moment
> you remember ifrename.

except that udev will **always** run before any NIC is up, while
ifrename may not - and will bail if the NIC is in use.
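for reference, the udev side of that is a one-line rule (the MAC
address and interface name here are hypothetical):

```
# /etc/udev/rules.d/70-persistent-net.rules
# match the NIC by its MAC address and give it a stable name.
# udev applies this when the device is added, i.e. before any
# init script or network daemon has brought the interface up.
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:55", NAME="lan0"
```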
that (udev always running early enough), IIRC, is what provided the
motivation for me to finally switch from ifrename to udev, years after
udev had gained the capability.

there's also the fact that udev (or a work-in-progress clone like mdev)
is installed by default on almost every linux system these days.
ifrename is not, and may not even be packaged for some distros.

> > Given a choice between using a standard feature that's in every
> > linux system (well, possibly excluding some embedded linux devices)
> > and using a relatively obscure, "non-standard" tool that does the
> > same thing, the decision to switch to using udev for that was easy.
>
> MS-Windows is 'standard', too. ;->

it's non-standard for linux systems, so can be ignored as irrelevant.

> Personally, I rather like being in charge of my own software. In
> fact, I rather insist, thanks-very-much-I'm-sure.

another non sequitur. I can configure udev to do what I want, and it's
not even particularly hard to do so (i've seen many config file formats
that are much worse) - how is using udev not "being in charge of my own
software"? or my own hardware or systems, which is what I guess you
meant.

> > udev also has the advantage over ifrename of being useful for a lot
> > more device-related stuff than just NIC device renaming.
>
> It's a floor wax _and_ a dessert-topping! ;-> [1]

udev is a single tool which can be used for a variety of device
configuration tasks. ifrename can only rename NICs.

> > devtmpfs doesn't solve the device order or naming problem.
>
> No, but it goes a long way towards eliminating the alleged necessity
> of udev, which I am pretty sure is what motivated Torvalds and co. to
> introduce it. IIRC, it was after that notorious incident when Sievers
> attempted to strongarm kdbus into the kernel so that the systemd/dbus
> people could overwhelm the kernel with message traffic.

Your recollection is faulty.

"The return of devfs" https://lwn.net/Articles/331818/

devtmpfs was first proposed in 2009.
The Sievers kernel debug= cmdline arg fiasco was in April 2014, and
kdbus was announced at linux.conf.au in Jan 2014
(https://lwn.net/Articles/580194/).

What's really funny, though, is that the first devtmpfs patch was
announced in 2009 and written **BY** Greg Kroah-Hartman, Jan Blunck,
and Kay Sievers. That's definitely NOT a response to an incident five
years later by one of the authors.

(BTW, it was GKH who pushed so hard to get kdbus into the kernel. GKH
is still, rightfully, a highly-respected person in the kernel
community. Sievers definitely isn't, and would have had no chance of
pushing for anything. I paid a lot of attention to it at the time
because it was, and remains, an issue of concern to me)

> devtmpfs, among other things, was a statement that 'Actually, it
> turns out we don't need your code to recognise hardware and autoload
> firmware BLOBs, so pray don't motivate us to make even more of what
> you do irrelevant.'

Nope. devtmpfs was not one of the authors telling himself "we don't
need my code".

(Also, devtmpfs doesn't autoload firmware blobs; that's done by the
kernel core and the driver - typically the driver asks the kernel for
the firmware, the kernel asks udev for it, udev finds it and hands it
over, and the kernel passes it on to the driver... which then uploads
the firmware to the device)

Linus Torvalds yelled at Sievers (and rightly so) over the debug=
incident and subsequent arrogant arseholery, and told GKH that he'd be
rejecting any future code from Kay Sievers until his code and his
attitude stopped sucking so much.

https://patchwork.kernel.org/patch/3930121/

BTW, now that I know Sievers was involved in devtmpfs, I've put it on
my list of things to be vaguely suspicious of.

> > > https://wiki.gentoo.org/wiki/Mdev
> > > https://github.com/slashbeast/mdev-like-a-boss
> >
> > whether it's called udev or mdev or some other clone of udev, it
> > still does the same thing.
>
> No, you are mistaken. In _no way_ is mdev a clone of udev.
Not hardly. It is intended to be a replacement for udev; it was written
(at least partially) in response to the fact that udev was merged into
systemd and was no longer maintained as a separate program. I call that
a clone. if you like, call it a clone of udev's important features, as
it was before the deliberately engineered and unnecessary
interdependence with systemd and gnome.

> I started losing interest in udev at a rapid pitch the day I found
> that the system no longer permitted me to use mknod to create a
> needed device node in /dev. This is not tolerable, sorry: Software
> that tries to tell the sysadmin he may not take necessary steps to
> administer his system gets scrapped at the next convenient
> opportunity.

when has it ever not been possible to run mknod in /dev?

re: CPUs and "management engines":

> Possible help may soon emerge from a new-ish initiative with Raptor
> Computing's Talos II series using IBM POWER9 CPUs, where there is,
> refreshingly, none of that shit anywhere in the SoC or surrounding
> circuitry. More speculatively, the J-Core project has been reviving
> the Hitachi SuperH CPU architecture now that all of the patents have
> expired, with hardware designs that are open all the way from top to
> bottom. Much depends on completion of their roadmap, particularly the
> 64-bit version of SH-4 & support circuitry. We shall see.

cool. i'll put it on my TO INVESTIGATE list for 2027 or 2037 (the
64-bit version will be timed just right for avoiding the pending 32-bit
unix time_t apocalypse - aka "the Y2K bug for unix geeks")

Seriously, though, stuff like that is interesting and good to hear
about, but I'm still waiting to see even ARM CPUs being useful for
anything except embedded devices like overpriced NASes and wifi routers
(and MIPS is still dominant with openwrt-capable routers(*)), raspberry
pi-like devices, and toys like phones and tablets. the long-touted rise
of ARM-based server hardware has failed to materialise, year after
year.
in short, i'll believe it when i see it.

(*) I'll be researching these again over the next few months. My suburb
is scheduled for NBN to (finally!) be available next March (using FTTC
rather than FTTP for bullshit Australian Libs vs Labor politics reasons
and Emperor Murdoch's control of the Libs, so I'll need a VDSL modem to
replace my ADSL2 modem). I don't trust vendor-supplied firmware, and
from what i've read most VDSL modems will reset to ADSL mode if you set
them up as a bridge (for pppoe on the local linux box), so I'll move my
gateway/firewall over to an openwrt box. I may as well move dns, dhcpd,
and a few other things too.

craig

-- 
craig sanders <[email protected]>

_______________________________________________
luv-main mailing list
[email protected]
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
