This is a proposal for a simple way of adding streams support to the Linux kernel, without significant changes in current Unix-like semantics. It was inspired by a recent discussion in the #kernelnewbies IRC channel, and by the recent talk about the subject on the Linux kernel mailing lists. I did a fast first design, then spent an hour polishing it, simplifying it, and making sure I covered most of the important cases. This should be read as an RFC (of which I've maybe been reading too many lately).

I've included both the polished draft and the informal first draft, since the informal draft is a bit more explicit on some issues. I changed some things in the second version (I found out I needed one less syscall), so treat the first version as an informative historical document.

I plan to learn the VFS code and try to implement the proposal if it is considered good enough to be worth an implementation. Note that I currently know nothing of the VFS code, so if nobody else wants to implement it, it'll take some months until I'm good enough at it.

--
Cesar Eduardo Barros
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Streams design draft, version 0.02

This is a proposal for a simple way of adding streams support to the Linux kernel, without significant changes in current Unix-like semantics. It was inspired by a recent discussion in the #kernelnewbies IRC channel, and by the recent talk about the subject on the Linux kernel mailing lists. I did a fast first design, then spent an hour polishing it, simplifying it, and making sure I covered most of the important cases. This should be read as an RFC (of which I've maybe been reading too many lately).

Motivation (informative)

Much has been said recently about support for "streams". Streams are defined here as arbitrary-sized chunks of binary data attached to a file.

The need for streams support arises from the desire for better support for legacy filesystems which have them. Some also want to create new filesystems which explicitly support arbitrary streams.

(long explanation of what they really are inserted here)

A well-defined API is required to avoid every different kernel doing things in a different way, and to make clear all the allowed and disallowed situations. That's why I'm submitting this informal Linux Kernel Standard proposal.

Word usage (normative)

(insert standard RFC rant about must/should/may/should not/must not)

Vocabulary (normative)

stream - (insert definition here)

streams directory - The invisible virtual directory associated with a file or directory. It can contain zero or more streams or streams subdirectories.

streams subdirectory - A normal directory within a streams directory or streams subdirectory.

root file - The file associated with a streams directory (and thus with all its streams subdirectories) and with their streams.

streams hierarchy - The set of all streams, streams directories, streams subdirectories, symlinks and other things which have the same root file.

Alternate representations (informative)

Some have proposed using the current API and extending it to handle streams.
The methods proposed ranged from the horrible (using :: like some unnamed operating system does) to the reasonable (treating files as directories in an open request, causing the associated streams to be opened).

The problem with all these proposals is that they confuse the namespaces. One case where this fails is with dumb programs that assume the operating system will return an error if you try to treat a file as a directory (which is not a dumb idea -- letting the operating system do the checking instead of having to do it by hand every time).

Most of these alternative approaches are targeted at making a specific userspace program happy -- the most common targets being cp and tar. My view is that this should not be needed or, if needed, the C library can handle it, much as it does for calls like opendir().

Streams representation (normative)

Every file has an associated "streams directory". A streams directory (or streamsdir for short) is like a normal directory, except that it's invisible in normal operation and can only be accessed using the streams API. Directories can also have an associated streams directory.

Streams directories and subdirectories can only hold normal files and streams subdirectories. These can also have their own streams, and so on, recursively. Note that a streams directory MUST NOT have an associated streams directory. Note that it is allowed to have a streams directory associated with a streams subdirectory.

Notice that being allowed to do something doesn't mean you have to; be conservative in what you do and liberal in what you accept. While userspace programs SHOULD NOT react incorrectly when finding deep streams recursion and streams in directories, it is recommended that filesystem makers avoid allowing attaching streams to directories or to other streams and creating subdirectories in streams directories.

Programs MUST NOT depend on being able to use streams at all. Programs MUST NOT depend on being able to open a streams directory.
Programs MUST NOT depend on being able to do any specific action to a stream or streams subdirectory. My not mentioning an invalid behavior in this paragraph doesn't mean it's not invalid.

Note that the streams directory is considered a property of the inode; this means it is shared when a file is hardlinked.

Hardlinks are allowed in streams directories and subdirectories, and they can point inside or outside the streams hierarchy associated with a file. Symlinks are allowed (but can't point outside a streams hierarchy). Again, an implementation is explicitly allowed to disallow both hardlinks and symlinks, or to allow them only in special conditions. It is recommended to disallow symlinks and hardlinks within a streams hierarchy.

A special case of the above is when a streams directory or subdirectory has a hardlink to its root file. This will often be found even when hardlinks to outside a streams hierarchy are forbidden. Since it's a hardlink, the streams directory associated with it is the same one you would see looking from outside.

The streams hierarchy is treated like a separate VFS; this means that the streams directory is treated like a root directory. "foo", "/foo", "//foo", "../foo" and "/../foo" all point to the same stream.

The streams directory is dynamic; it does not have to be created before use, and is deleted if empty. This happens only if the filesystem has streams support.

Kernel API (normative)

These kernel-to-userspace functions are also part of the userspace API. A kernel is allowed to implement them differently. They are here mostly as a proposal of how I suggest it be done in the Linux kernel.

    int stream_open (int fd, const char * stream, int flags, int mode);

This is the main call of the streams API. It has the same parameters as a normal Linux open(2) syscall, but takes an extra first parameter which is a valid filesystem fd (which means no devices, sockets, fifos, or anything like that).
The stream argument is a filename inside the namespace defined by the streams hierarchy. An example of possible use would be:

    fd1 = open ("foo", flags1, mode1);
    fd2 = stream_open (fd1, "/bar", flags2, mode2);

You can also open "/" as a directory if you want to get a list of the streams.

Note that it is allowed to close fd1 and still use fd2. The file pointed to by fd1 can even be deleted without affecting fd2 (except if the bar stream is a hardlink to foo, which means bar would have its link count decremented).

(insert list of possible error codes here)

    int stream_mkdir (int fd, const char * pathname, mode_t mode);
    int stream_rmdir (int fd, const char * pathname);

Both MAY be implemented even if support for streams subdirectories is not implemented. They MUST be implemented if support for streams subdirectories is possible. They act like mkdir(2) and rmdir(2), and take the fd of the root file.

(insert list of possible error codes here)

Rationale (informative)

This is a simple and powerful API for streams; it probably allows most current streams filesystems to be fully used. Most of the complexity is passed to the userspace level; most other streams proposals can be emulated with this design via a simple translation layer.

Only one new syscall is needed, unless you want streams subdirectories. If you want them, two extra syscalls are needed, due to the design of mkdir(2) and rmdir(2).

If you want to make streams visible to normal Unix programs, a simple LD_PRELOAD or a changed C library can provide that easily.

No changes to the stat functions are specified; it is possible but not very useful to add a flag saying the file has streams.

Security Considerations (informative)

Streams are a great way of hiding things for illicit purposes and a bad way of hiding them for licit purposes.

The implementer must not forget to include the streams hierarchy in disk quota calculations.

Care should be taken to prevent endless recursion.

This design is NOT suggested as a way to design new filesystems.
Filesystem designers SHOULD avoid creating filesystems that use streams. This API SHOULD be used only to support legacy filesystems. Streams subdirectories are evil and SHOULD NOT be implemented.

Acknowledgements (informative)

Thanks to surf (Daniel Phillips) for bringing back the subject of streams and for forcing me to create an argument against the foo/bar design.
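
As an illustration of the path rule in the Streams representation section ("foo", "/foo", "//foo", "../foo" and "/../foo" all point to the same stream), here is a minimal userspace sketch of how such names could be canonicalized. The function name stream_path_normalize is hypothetical, and the treatment of "." components and of too-small buffers is an assumption; the draft does not specify either.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the proposal: canonicalize a stream
 * path, treating the streams directory as the root of its own namespace.
 * Empty components and "." are dropped (assumption); ".." removes one
 * component but never climbs above the root.
 * Returns 0 on success, -1 if the result does not fit in outsz bytes.
 */
int stream_path_normalize(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz < 2)
        return -1;
    out[len++] = '/';               /* every result is rooted */

    while (*in) {
        const char *end;
        size_t n;

        while (*in == '/')          /* collapse runs of slashes */
            in++;
        if (!*in)
            break;

        end = strchr(in, '/');
        n = end ? (size_t)(end - in) : strlen(in);

        if (n == 1 && in[0] == '.') {
            /* ".": stay in the current directory */
        } else if (n == 2 && in[0] == '.' && in[1] == '.') {
            /* "..": drop the last component, stopping at the root */
            while (len > 1 && out[len - 1] != '/')
                len--;
            if (len > 1)
                len--;              /* drop the separating slash too */
        } else {
            if (len > 1) {          /* separator between components */
                if (len + 1 >= outsz)
                    return -1;
                out[len++] = '/';
            }
            if (len + n >= outsz)   /* keep room for the final NUL */
                return -1;
            memcpy(out + len, in, n);
            len += n;
        }
        in += n;
    }

    out[len] = '\0';
    return 0;
}
```

With this sketch, all five spellings listed in the draft normalize to "/foo", and "a/b/../c" normalizes to "/a/c".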
Streams design draft, draft 0.01

In a recent discussion on IRC, the subject of streams was brought up again, and it was mentioned that the problem was that everybody talked and nobody did anything at all. So I decided to propose a simple API for streams, which pushes details like tar/cp/cpio compatibility to userspace. I believe this is the simplest and easiest way to do it.

This is a draft for a draft for an unofficial standard for Unix-like streams (I think that we need an RFC-like standard for such a touchy subject, and also I have been reading too many RFCs lately). So I believe there should be discussion on this (like one would do for a real RFC) -- I'm sure I did something bogus somewhere.

Kernel design

Inside the kernel, the streams would be implemented as a hidden magic directory pointed to by a hidden field in the inode. If the filesystem supports streams, the virtual streams directory is always there, even if it doesn't exist (that is, a non-existing streams directory is represented by a hidden "virtual" dir which doesn't get written to the disk, and removing the last stream removes the directory from the disk). Of course the filesystem isn't required to represent the streams on the disk that way.

Notes:

1. There is no $DATA stream. The "data" stream is the main file itself. If you think otherwise, use a userspace wrapper.
2. There can be a $DATA stream, represented as a hardlink to the main file. If you try to open its streams directory, you get the same one you are in (to prevent infinite recursion, much like /'s ".." entry). This is filesystem-specific; I'd recommend NOT doing so unless the legacy design already has it.
3. Streams can have their own streams directories, if the filesystem designer was nuts enough to allow that.
4. Streams directories can have normal directories inside them. No, I have no idea why that would be useful.
5. Streams directories can't have their own streams directories -- I know they should, to keep things logical, but I fear the kernel would explode if that was allowed (programs should *NOT* depend on this one!)
6. No special files allowed. No sockets, fifos, devices, or other strangeness. Of course no program should depend on that either.

RATIONALES:

1. Keep it simple.
2. Allow all the weird cases you can without special casing, or else some random person will make a filesystem that needs them.
3. Avoid namespace confusion. The foo/bar design is horrible. Imagine you had a foo file and you untar a tarball which has a "foo/bar" entry -- the results would *not* be what you expected.

The kernel API is a couple of syscalls:
(RATIONALE: I don't want to use a single syscall to do everything under the sun using the first parameter as a sub-syscall number)

    int openstreamdir (int fd);

- The fd argument is a fd which would be valid for calls like fstat. (RATIONALE: It makes sense to allow one to attach streams to a directory, even if no sane person would do that -- because someone in the future might. I believe that having the same restrictions as glibc's fstat is sane)
- The returned fd is almost like a fd you get from glibc's opendir (RATIONALE: avoiding syscall explosion)
- Doing something insane like trying to attach a streamdir to a directory (even though the API does not forbid it) deserves a -EISDIR (RATIONALE: allow future expansion but disallow craziness until you need it. Programs shouldn't depend on it)

    int openstream (int fd, const char * filename, int flags, int mode);

- This one is like sys_open, with an extra fd argument (which is the same one you would pass to openstreamdir).
- The filename is parsed as if the streamdir was the root of the VFS. That is, foo, /foo and //foo are the same stream.
- flags and mode are normal (as normal as flags and mode can be)

Usermode design

(to be done later -- userspace isn't my area, but it should be easy)

Thanks to surf (Daniel Phillips) for bringing back the subject of streams and for forcing me to create an argument against the foo/bar design.