On Friday 20 July 2007 3:00:01 am Greg KH wrote: > On Fri, Jul 20, 2007 at 01:14:27AM -0400, Rob Landley wrote: > > Is there anything in /sys/class/block that _isn't_ in /sys/block? > > No. > > > Does "if you want to use it, but /sys/block will still be there" NOT > > mean, as I assumed at the time, that I could safely ignore it? > > Ignore what? /sys/block? If you see /sys/class/block, then yes, you > can ignore it as they are just symlinks back to each other.
I read "if you want to use it" as meaning "if you want to use "/sys/class/block", I.E. it was optional. If /sys/block is going to remain a symlink to /sys/class/block then using the path "/sys/block" should work on both existing kernels and on new kernels without modification. What moving it will force me to do is edit out "/sys/class/block' if I find it looking for char devices under "/sys/class". Moving it forces me to add code to remove it. I don't know what the supposed benefit is... Ah, hang on. Most likely this is an implementation detail from moving everything under /sys/subsystem and making /sys/class a symlink -> subsystem and /sys/block a symlink -> subsystem/block. So it sounds like it's not really intentional breakage, just an implementation side effect. Ok, that makes sense. (Sorry, this hadn't occurred to me until now. I'm not the one implementing this stuff, so I haven't spent the past few months thinking through the ramifications already. This entire thread is one of about 12 things I'm working on at the moment.) > > (My impression from the meeting at OLS was that adding /sys/block to > > /sys/class/block had been just an idea rejected in favor of adding it > > to /sys/subsystem/block. > > No, the /sys/class/block is in -mm and has been for some time. A yes, "set back up an automated -mm testing thing that away when previous laptop did". I need to bump that up on my todo list... > > I note that neither my Ubuntu 7.04 laptop nor the 2.6.22 system I > > built has either a /sys/class/block or a /sys/subsystem/block, so > > anything written attempting to use that won't work on any currently > > deployed Linux system. You can't use it today, and will never be able > > to use it on any kernel version deployed today.) > > Not true at all, it works just fine here on my machines, and on all > distros released in the past year or so as tested out by a lot of users. "All the distros" meaning "not Ubuntu 7.04"? I just checked again, it doesn't have either a /sys/class/block or /sys/subsystem/block. I realize they're coming... > The only reason it isn't in Linus's tree just yet, is that some very old > mkinitrd programs don't seem to like it (we are talking Fedora Core 3 > based distros, not Fedora itself.) People are trying to work out what > the proper fix is for userspace there when they get the time. > > So, expect the change to show up in 2.6.24. Ok. > > > > To all of this, I would like to humbly ask: > > > > > > > > PICK ONE! JUST #*%(&#%& PICK ONE! AAAAAAAAHHHHHHH!!!!!!!!! > > > > > > Man, you totally miss the point. > > > > I want to document a stable API, including the subset of sysfs that will > > remain stable. The "point" appears to be that there isn't one because > > sysfs is "special", and udev should be in the kernel source tarball. > > What? Since when does udev have to be in the kernel source tarball? > Who ever said that? I got that impression from: http://lkml.org/lkml/2006/7/30/228 http://lwn.net/Articles/193603/ > The issue here is that if you follow the rules as specified by the file, > Documentation/sysfs-rules.txt and Documentation/ABI/*/sysfs-* you should > be just fine. To ignore them, as you have done in your examples, will > cause problems. I read the Documentation/ABI ones in the current linus -git (well, as of... tuesday?) and am unaware of any conflicts. I read sysfs-rules.txt this morning, and the only _required_ change I'm aware of is filtering out /sys/class/block if it occurs. (Is there more Documentation in Andrew Morton's tree that's not in Linus's?) Lemme re-read my document... Ok, /sys/bus/*/devices/*/dev is the path Kay told me to use during our discussion at OLS. It's one of the paths I cut and pasted out of the email he sent me. He typed that path, I didn't. I see that the bit in "sysfs-rules.txt" about never using the "devices" symlink contradicts that. Ok, so what _should_ I do? Personally, I've never seen a dev link under "/sys/bus". I neither own nor have personally encountered hardware that does that, and the first I heard that there _was_ such hardware was when talking to him at OLS. Kay: if you get this, what path should I use? Your email said: > /sys/bus/*/devices/*/dev > /sys/class/*/*/dev > /sys/block/*/dev > /sys/block/*/*/dev > > /sys/subsystem/*/devices/*/dev The fifth of which isn't in currently deployed kernels (not in kernel.org, not in most recent ubuntu release, that counts as "not currently deployed" to me), and the first four should continue to work even when that goes in, so I didn't include it in "how to find this information", but I note that it also follows "devices"... > > I'm trying to write down here the minimal information needed to find the > > "dev" nodes to populate /dev. There's no functional reason I'm aware of > > for them to keep moving around. > > The issue is that the devices themselves keep moving around in the sysfs > tree all the time as systems are dynamic and change. > > Again, look at the udevtrigger program for a simple way to achieve this > /dev population that you so desire. I'm aware of that program, and can iterate through the tree and write "add" to "uevent" instead of reading "dev". Technically the /sbin/hotplug approach would have the same potential race condition with remove happening during scan: the potential downside is leaving a node in /dev that will give an -ENODEV if you try to open it, which is annoying but not fatal (no obvious security implications or anything), and a small enough race condition that it would seldom if ever inconvenience real users. I can document this, though... > But also realize that sysfs is much bigger than just trying to get the > information to create a /dev tree. I know. At the moment I'm only trying to document the subset of sysfs needed to maintain a /dev tree via hotplug. In future this may expand to include enough information to persistently name certain types of devices, but I'd be just as happy to have that under Devices/ABI and refer to it instead. > > > > > > Entries for char devices are found at the following locations: > > > > > > > > > > > > /sys/bus/*/devices/*/dev > > > > > > /sys/class/*/*/dev > > > > > > > > > > Uh, that is actually the generic location? > > > > > > > > It's what Kay Sievers and Greg KH told me at OLS when I tracked them > > > > down to ask. I've also experimentally verified it working on Ubuntu > > > > 7.04. That was cut and pasted from Kay's email, and it works today. > > > > > > That is still true, but it still does not tell you the type of node to > > > create, as you seem to insist on. > > > > I don't insist on it, mknod insists on it. You cannot mknod a dev node > > without specifying block or char. > > > > You're saying that sysfs should provide major and minor numbers without > > anywhere specifying "char" or "block", meaning the major and minor > > numbers cannot be _used_. I am insisting on getting the third piece of > > information without which "major" and "minor" are useless. > > > > I asked very specifically about this at OLS, several times. What you're > > telling me now seems to contradict what you told me then. > > Here's the rule: > If the SUBSYSTEM is "block", it's a block device. Otherwise > it's a char device. Ok. Cornelia Huck seemed to disagree, but I see that's been resolved in another message. > But also realize that the majority of events you will get have nothing > to do with device nodes. I think you are forgetting this fact. Actually, I'm filtering them out, but I should make a note of it in the documentation. (I'd happily document other events you might want to respond to that come into the hotplug mechanism, but I don't know what they are and am trying to start with the basics and flesh it out later. Persistent device naming is a can of worms I'll have to open eventually. Ubuntu 7.04 put uuids on every _partition_ in my laptop, and spins up my external usb hard drive trying to mount root. When connected a machine with an IDE hard drive, that's now going through the scsi layer. Sigh...) > > If block is going to move to sys/class, I can put in a warning about this > > pending breakage in the documentation, and modify my example code to > > filter it out. > > It's not a "breakage", we are preserving a symlink. The point is that > you should not rely on the fact that /sys/block will be there in the > future, as the documentation I pointed to above describes. It doesn't say that /sys/block are deprecated or will be removed. The closest it says is: > If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be > ignored. Which isn't the same as "must be ignored" or "may be removed". I asked about this explicitly at OLS ("can I just keep using /sys/block and /sys/class") and was told that there were no plans to remove them. Existing systems require using the older names, and I'm unaware of any information the new names provide that the old ones don't. > > > > > It may be enough (and less confusing) to just state that the dev > > > > > attribute will belong to the associated "class" device sitting > > > > > under /sys/class/ (with the current exception of /sys/block/). > > > > > > > > Nope. If you recurse down under /sys/class following symlinks, you > > > > go into an endless loop bouncing off of /sys/devices and getting > > > > pointed back. If you don't follow symlinks, it works fine up until > > > > about 2.6.20 at which point things that were previously directories > > > > BECAME symlinks because the directories got moved, and it all broke. > > > > > > That's total nonsense. > > > > Which part, the "following symlinks produced an endless loop" or > > the "directories turned into symlinks so not following them broke?" > > > > Let's see... > > > > According to my blog, Frank Sorensen first sent me a C port of my /dev > > populating script on December 12, 2005. The current kernel at the time > > was 2.6.14, so grab that, build user Mode Linux... Huh, it won't build > > with gcc 4.1.2. Or 3.4. Ok, defconfig? Nope, that wants a stack check > > symbol? Let's see... Ah, google says add -fno-stack-protector to CFLAGS. > > Right... Fire it up under qemu, "mount -t sysfs /sys /sys", and: > > > > In 2.6.14, /sys/block/hda/device points > > to ../../devices/pci0000:00/0000:00:01.1/ide0/0.0 > > > > /sys/block/hda/device/block points to ../../../../../block/hda > > > > So in 2.6.14 you could > > go /sys/block/hda/device/block/device/block/device/block... endlessly, > > which is the reason I wrote mdev not to follow symlinks but to instead > > only look at actual subdirectories. > > That was the problem right there. Why would you ever want to traverse > symlinks blindly without realizing what you were walking? Because I didn't want to encode an unknown structure of sysfs into the program? Partitions were at the same level as hard drives one release (when I first came up with a working probing script for my Firmware Linux project, which according to my blog was October 27, 2005 and was using something like Linux 2.6.10), and moved into subdirectories the next, and I had no way of knowing if or when a third layer was going to be added to some future device I didn't know about. Keep in mind I've been following this, on and off, for a while now: http://lkml.org/lkml/2003/12/9/1 http://lkml.org/lkml/2003/12/10/16 And what I did was in response to the endless loop was _stop_ traversing symlinks at all, and only followed subdirectories. Which worked fine until subdirectories got moved and replaced by symlinks, which is when I started asking "so what paths can I follow that will reliably be there in both current and future releases"? Which is what I'm trying to document now. > You can't > just run 'find' on sysfs and expect to not get caught in endless loops, > as the goal of the different parts of sysfs is to be able to start in > one place, and figure out all of the needed information from there. I'm trying to document what those paths are. People keep wanting to tell me about future plans that aren't merged yet. A year ago the future plans were directories becoming symlinks, now the plans are /sys/subsystem, I'm sure in a year there will be new future plans. I'd really like not to have to change existing code to still work with them, hence an attempt to document the API I _SHOULD_ use so that I don't have to. If I discard what's there now and document the current future plans that aren't merged yet, how do I know that they won't themselves be ripped out a year after that? > For example, if you have a device, you can get the subsystem it belongs > to, the driver bound to it, and other stuff. Can you get the default name of the device currently encoded as the last element of the path that "DEVPATH" points to (ala /class/mem/zero), but which I won't necessarily get if DEVPATH starts to point to /device/12345/:00 as some people keep saying DEVPATH should point to? It's not encoded as one of the hotplug variables, other than extractable from DEVPATH in a way that may or may not continue to work... > If you start with a > driver, you can get the devices it binds. Can you see the circle > already? > > So, you need to watch what you are trying to find, and if you do that, > you never will get caught in circles. We never had that problem in udev > at all, as we just work with what was passed to us, not blindly try to > walk the whole sysfs tree. You wrote both the sysfs code and the udev code. You wrote both sides of the export, changed both sides of the export fairly freely, and you know what you intended to do and what was merely an implementation artifact. > Please use the proper context in order to get the information you need. > And at all times, a directory can turn into a symlink in order to keep > the same information possible. I am aware of that, therefore I need to look at a known set of paths, which is what I'm trying to document now. > > (It uses the same code to traverse down > > beneath /sys/block and /sys/class to look for "dev" entries.) This works > > fine up through the 2.6.20 in ubuntu 7.04, where everything > > in /sys/class/tty/* is still a subdirectory. But in 2.6.22, > > /sys/class/tty/* is all symlinks. Hence the code that was working before > > changed, due to something that worked fine for a couple years but broke > > because it wasn't considered part of a stable API. > > > > Which part of this is "total nonsense"? > > Your code :) *shrug* There was a better way to do it back in 2005 without reading your mind? > > > until you have proven to have read > > > the udevtrigger code, > > > > I read the udev code when it was first posted. I read it again 20 > > versions later, and read it again 20 versions after that. I couldn't > > COMPILE the darn thing for its first ~40 releases, the code got ripped > > out and re-written several times, I watched as it grew and then threw out > > libsysfs. > > You could not build it? Why not? I don't clearly remember the details from two years ago, but after glancing at my blog http://landley.net/notes-2005.html#27-10-2005 I vaguely recall that it had large numbers of undocumented environmental dependencies and I got sick of playing whack-a-mole installing packages, plus it had no documentation whatsoever and required a complicated configuration file to the point it was actually _easier_ to write a shell script to parse sysfs directly. (Trying that got results in about 15 minutes. Staring at udev for half a day did not.) Plus I remember downloading different early versions of udev and finding things hugely rewritten between each update to the point that trying to pick it apart until it stabilized was a waste of time. I also remember thinking that libsysfs sounded like a horrible idea (having your own copy of a shared library in the source tree defeats the purpose of having a shared library). It was something that bothered me about the design from day one, libsysfs was in theory an external library but udev included its own copy, which made as much sense to me as including its own copy of glibc. Here's the problem back in 2003: http://www.ussg.iu.edu/hypermail/linux/kernel/0311.2/0716.html Here's you replying to me on that topic in 2005: http://www.ussg.iu.edu/hypermail/linux/kernel/0512.1/0617.html > Did you send me a patch for this > problem that was major enough to keep you from using the project? That problem was only one reason I didn't use the project. I objected to most of the design, at some length, in this post back in 2005: http://lkml.org/lkml/2005/10/30/189 > > So essentially you're saying "well read it again, we've finally got it > > right now"? > > Not at all, we are saying to look at how to achive what you are trying > to achieve by reading a very small and well documented .c file (530 > lines with comments) that explains how to easily and quickly achieve > what you are trying to duplicate. Ok. I note that most of the stuff I was objecting to in 2005 wouldn't _fit_ in 530 lines, and my first gripe in my blog post was lack of documentation and unnecessary complexity, neither of which appear to be the case now... > Heck, I did the same thing in a bash script for the Gentoo startup code > a while back that still works, but has ordering issues that the .c file > fixes up. Hence it was dropped for the replacement that we are pointing > you at. Ok. > > > and got a clue how to do stuff reliably, and get > > > the basic knowledge needed to document it. > > > > Because talking to you and having you email me the notes from this > > conversation did not provide the basic knowledge needed to document > > hotplug and firmware loading. Nor did asking for feedback on the > > document I wrote up. Thanks ever so much. > > > > I point out that udev changes from version to version, so that running an > > old version of udev against a new kernel has been known to break. > > Hence the Documenation/CHANGES file documents the version that is > needed. Right now it shows a version that is over a year and a half > old. I do know that you can get away with running versions that are > even older than that if you want to, but it's not really recommended. Udev appears to have changed, for the better. I'm still uncomfortable with "the implementation is the specification". > > Udev was more or less completely rewritten three times while I was > > still paying attention to it. Reading the udev code and seeing what > > it's doing struck me as about as likely to reveal a stable API as > > reading the kernel source, or experimenting with sysfs from userspace. > > (Both of which I've _done_ at various points, and it keeps changing.) > > The development cycle of udev has nothing to do with sysfs here. I'm trying to figure out how to decouple them, yes. :) > Other > than the fact that we learned how to interact with a kernel interface > that directly exposes the internals of the kernel itself, something that > no one had done before. In learning how to handle such major changes, > udev has changed in order to support zillions of devices, small memory > footprints, and lightening fast speed, all changes that required big > udev internal changes, but had _nothing_ to do with the kernel and/or > sysfs. Arriving at simple can take a lot of work. You don't have to tell me that. :) > > Are you saying that the current version of udev will work with all future > > kernels, and thus if I can figure out what udev is doing today, I can > > just document that as the stable API? > > If you want to figure out how to create a dynamic /dev filesystem that > can handle persistance device names, dynamic rules created by users, > zillions of devices on small and big systems, small footprint, and very > quick speed, then yes, read the udev source code. I did all that but the persistent device names in mdev, without ever referring to udev (after bad experiences with it in 2005), although I no longer maintain any part of busybox. > What is the goal of this document here? You start out trying to explain > the hotplug interface, and then get side tracked into talking about > creating a dynamic /dev/ filesystem in userspace and then ramble on into > how sysfs is layed out. These are three separate things Various people asked me for documentation on hotplug and firmware loading, and what I know how to do with hotplug (because I had to work out how in 2005, and I'd like to nail down the approved way of doing it) is create /dev nodes. > While the act of creating such a /dev filesystem does have something to > do with the hotplug/uevent interface of the kernel, it isn't reliant on > it. And the layout of sysfs also doesn't really have much affect on the > creation of such a /dev filesystem, as udev proves (it works just fine > without sysfs even being mounted.) Via netlink events? I vaguely recall a thread about deferring all the "add" events until after a netlink daemon was up, but I thought you needed sysfs for that. > If you want to just document the hotplug/uevent interface then do > that. > > If you want to document sysfs and it's structure, do that too, after > reading the existing documentation and understanding that. I've read the existing documentation that I've seen. Unfortunately I'm too jetlagged at the moment to finish collating it, and I need to go look at this cleaned-up no-longer-scary udev when I'm awake. > thanks, > > greg k-h Rob -- "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/