On 8/7/25 21:47, Dave Young wrote:
> Another question, may need fs people to clarify.  If the mount is
> tmpfs and it is also rootfs, could it use 100% of the memory by
> default,

If you want to softlock the system when rootfs fills up with log files or something, sure.

That was one of the original motivating reasons for using tmpfs instead of ramfs for a persistent initramfs you don't pivot off of. Plus things like ramfs always reporting zero free space (it doesn't track usage) mean you can't use things like "rpm install" to add more packages at runtime, and so on... I had a list of reasons when I added initmpfs support back in 2013... looks like it was https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

(Ok, the REAL reason I did it is A) I'd documented that was how it worked when I wrote ramfs-rootfs-initramfs back in 2005 because it seemed deeply silly NOT to support that, and when nobody had made the obvious fix 7 years later I got guilted into it by an employer who I'd explained initramfs to and they asked "how do we do the tmpfs version" so I whipped up a quick patch and they went "you need to upstream this before we'll use it" so I went through The Process...)

Note that right now initmpfs isn't _specifying_ 50%, it's inheriting the default value from tmpfs when no arguments are specified. If you're special casing 100% for rootfs you'd still be passing in an argument to the mount call to override the 50% default, just as a hardwired string instead of a user-provided one (and again it would be a terrible idea).
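For illustration, here's roughly what that looks like from userspace (the mount points and the 100% value are just examples):

    # with no size= argument, tmpfs defaults to size=50% of RAM:
    mount -t tmpfs tmpfs /mnt/a
    # an explicit size= overrides the default (percentages are allowed):
    mount -t tmpfs -o size=100% tmpfs /mnt/b

Special casing rootfs would amount to hardwiring that second size= string into the kernel's internal mount.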

And if you DO change tmpfs itself to globally default to 100% then 'yes > /dev/shm/blah.txt' could lock your system as a normal user if they don't change their mount script to specify an explicit constraint. Which seems a bit of a regression for existing systems.

This new patch is because sometimes people making embedded systems want to devote more than 50% of memory to rootfs while still having the other benefits of tmpfs. One of those benefits is not soft-locking the kernel if something writes too much data to the filesystem.

History time! (It's a hobby of mine. Plus I was here for this part.)

Tmpfs was originally called "swapfs" (because ramfs couldn't use swap as backing store):

https://lkml.iu.edu/hypermail/linux/kernel/0102.0/0203.html

It was submitted to linux-kernel in 2001 (Peter Anvin's reaction was an aghast "?!"):

https://lkml.iu.edu/hypermail/linux/kernel/0102.0/0239.html

Tmpfs got added in 2.4.3.3 ala https://github.com/mpe/linux-fullhistory/commit/ca56c8ee6fa0

And almost immediately people noticed the softlock issue hadn't been fixed:

https://lkml.iu.edu/hypermail/linux/kernel/0103.3/0053.html

So the 50% default limit for tmpfs was introduced in 2001 (release 2.4.7.5) with the description "saner tmpfs mount-time limits", ala:

https://github.com/mpe/linux-fullhistory/commit/80fa70c0ea28

Jeff Garzik wired it up as an alternative to initrd in November 2002:

https://lwn.net/Articles/14448/

Alas, the result was completely undocumented. I thought it sounded like a cool idea (it resizes automatically!) and reverse engineered how to use it (ok, mostly a lot of pestering people with questions in email) and wrote documentation encouraging people to use it in 2005:

https://lwn.net/Articles/157676/

When I converted rootfs to be able to use tmpfs in 2013 (link above) there was a rootflags= but not a rootfsflags= (ramfs was intentionally a simple demonstration of libfs that took no arguments) and I didn't add one because I didn't personally need it: the 50% default was fine for me, and you can mount -o remount to change flags after the fact. (Although I dunno if you can change this limit after the fact, or what would happen if you reduced it below what the filesystem currently contained; probably doesn't work.)
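For reference, the remount experiment would look something like this (sizes made up, and I haven't verified the shrink behavior on a current kernel):

    # growing the limit on a live tmpfs root should just work:
    mount -o remount,size=75% /
    # shrinking below what the filesystem already contains should fail;
    # recent kernels reportedly reject it with "Cannot retroactively
    # limit size" rather than corrupting anything:
    mount -o remount,size=1% /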

Although looking back at my blog entries from the time, it seems I mostly didn't want to deal with bikeshedding about the name https://landley.net/notes-2013.html#29-04-2013

A year later somebody asked me why rootflags= wasn't working for initmpfs (http://www.lightofdawn.org/blog/?viewDetailed=00128) and I basically went "fixing it's easy, getting a patch into linux-kernel requires far too much proctology for anyone on the inside to even see it", and here we are 10 years later with the issue still unaddressed. (Open source! Fixes everything right up immediately. So responsive. No problems left to tackle, hard to find stuff worth doing...)

> and then no need for an extra param?    I feel that there is
> no point to reserve memory if it is a fully memory based file system.

You're confusing ramdisk with ramfs (initrd vs initramfs). The 50% isn't a reservation, it's a constraint: both ramfs and tmpfs are dynamic, RAM-backed filesystems that only consume memory as files are written.
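A quick way to see the difference between a constraint and a reservation (a sketch; mount point and sizes are arbitrary):

    # mounting tmpfs costs essentially nothing up front:
    mount -t tmpfs -o size=50% tmpfs /mnt
    free -m        # no real change from the mount itself
    # memory is only consumed as files are written:
    dd if=/dev/zero of=/mnt/big bs=1M count=100
    free -m        # ~100 megs more under shared/buff-cache
    rm /mnt/big    # deleting the file gives the memory back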

I wrote documentation about the four types of filesystem (block/pipe/ram/function backed) 20 years ago back on LiveJournal; I still have a copy somewhere...

https://landley.net/toybox/doc/mount.html

Linus invented ramfs by basically just mounting the page cache as a filesystem with no backing store, so when memory pressure triggers flush requests it goes "nope". When you write files it allocates memory, when you truncate/delete files it frees memory. That's why ramfs was just a couple hundred lines (at the time he was factoring out libfs so /proc could stop being the only synthetic filesystem everybody dumped every control knob into, and I recall he mostly did ramfs as an example of how minimal you could get with the new plumbing). Then tmpfs added some basic guardrails and the ability to use swap space as backing store in case of memory pressure (if you have swap, which a lot of embedded systems don't; note that mmap()ed files have backing store, and executables are basically mmap(MAP_PRIVATE) with some bells and whistles, so you can still swap thrash under memory pressure even without swap by evicting and faulting back in executable pages).

The old ramdisk mechanism from the 1990s created a virtual block device (/dev/ram0 and friends I think) which you would then format and mount using a block backed filesystem driver like ext2. This was terrible for a bunch of reasons, unnecessarily copying all the data to use it and regularly having two copies of the data in RAM (the one faulted into the page cache and the one in the ram block device backing store). Heck, when you had a system running from initramfs, you could configure out the whole block layer and all the block backed filesystem drivers, which made the kernel way smaller both in flash and at runtime. Even before initramfs, ramdisks largely receded into the mists of history (except for initrd) when loopback mounting became a thing, because you can just dd if=/dev/zero of=blah.img bs=1M count=16 and then format that and loopback mount it, and you control the size naturally (no rebooting needed to change it) and it's got its own built-in backing store allowing memory usage of the virtual image to be dynamic (ok, you can mlock() it if you really want to but you could _also_ loopback a file out of ramfs or tmpfs to accomplish that)...
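Spelled out, the loopback recipe is just (names made up, needs root):

    dd if=/dev/zero of=blah.img bs=1M count=16   # 16 meg image file
    mkfs.ext2 -F blah.img         # -F: it's a file, not a block device
    mount -o loop blah.img /mnt   # kernel allocates the loop device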

The point of the 50% constraint in tmpfs is to tell the system "when I ask how much free space there is, here's what the maximum should be". Since ramfs doesn't enforce any such constraint, it always reports both total and free space as 0, which tools like "df" take to mean "synthetic filesystem" and thus don't show by default when you ask about "disk free" space. Ramfs will let you keep writing as long as the kernel's internal malloc() doesn't fail to find the next page, and THAT is a problem because writes will fill up every last scrap of kernel memory and then the rest of the kernel loses its lunch when its allocations fail. (They added the OOM killer to try to cope with the fact that recognizing you've run out of memory comes not when you mmap() a range but when you asynchronously fault in pages by reading or dirtying them, which is at memory access time, not a syscall with a return value. That's a WHOLE SAGA! There really _isn't_ a good answer, but people will happily argue about least bad FOREVER. The younger generation seems to believe that Rust will do something other than add abstraction layers and transition boundaries to make this worse.)
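You can watch df make that distinction (a sketch; exact output varies by df version):

    mount -t ramfs ramfs /mnt/ram
    mount -t tmpfs tmpfs /mnt/tmp
    df /mnt/ram /mnt/tmp   # ramfs: 0 blocks; tmpfs: its limit as total
    df                     # plain df hides the 0-block ramfs entry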

Anyway, the perennial complaint about the 50% initmpfs limit was that if you have a small system that needs 16 megs of ram to run, but you have a cpio.gz that expands to 48 megabytes, then 64 megs SHOULD be enough for the system... but it won't let you: 50% of 64 megs is only 32 megs. You have to give it 96 megabytes of ram in order to be able to use 48 megs of root filesystem, or else extracting the cpio.gz will fail with an out of space error before launching init. (This was especially galling since you're about to free the cpio.gz after extracting it, so by the time it launches PID 1 the kernel has MORE memory available. There's a high water mark of memory usage while the system is basically idle, but once you're past that, extra memory is just adding expense, draining your battery, producing heat...)

The embedded developers have been familiar with the problem for decades, and (as usual) have repeatedly fixed it locally, ignoring linux-kernel politics. I first got asked to fix it over 10 years ago; I just find the kernel community unpleasant to interact with these days, so I mostly only wander in when cc'd.

The author of this patch asked me off-list if I had a current version of the patch I'd given other people, which I hadn't updated in _years_. It's been fixed a bunch of times (https://lkml.org/lkml/2021/6/29/783 was the most recent we could find), but the fixes stay out of tree because Linux development has been aggressively insular ever since the Linux Foundation drove away the last of the hobbyists back around https://lwn.net/Articles/563578/ and became corporate "certificate of authenticity" signed-off-by-in-triplicate land, with a thousand-line patch submission procedure document https://kernel.org/doc/Documentation/process/submitting-patches.rst and a 27-step checklist https://kernel.org/doc/Documentation/process/submit-checklist.rst

(Which will usually still get ignored even when you do that.)

Rob
