[gentoo-dev] Re: rfc: Does OpenRC really need mount-ro

Duncan Tue, 16 Feb 2016 18:20:35 -0800

William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:

> What I'm trying to figure out is, what to do about re-mounting file
> systems read-only.
> 
> How does systemd do this? I didn't find an equivalent of the mount-ro
> service there.


For quite some time now, systemd has actually had a mechanism whereby the 
main systemd process reexecs (with a pivot-root) the initr* systemd and 
returns control to it during the shutdown process, thereby allowing a 
more controlled shutdown than traditional init systems because the final 
stages are actually running from the virtual-filesystem of the initr*, 
such that after everything running on the main root is shutdown, the main 
root itself can actually be unmounted, not just mounted read-only, 
because there is literally nothing running on it any longer.

There's still a fallback to read-only mounting if an initr* isn't used or 
if reinvoking the initr* version fails for some reason, but with an 
initr*, when everything's working properly, while there are still some 
bits of userspace running, they're no longer actually running off of the 
main root, so main root can actually be unmounted much like any other 
filesystem.

The process is explained a bit better in the copious blogposted systemd 
documentation.  Let's see if I can find a link...

OK, this isn't where I originally read about it, which IIRC was aimed 
more at admins, while this is aimed at initr* devs, but that's probably a 
good thing as it includes more specific detail...

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/

And here's some more, this time in the storage daemon controlled root and 
initr* context...

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/


But... all that doesn't answer the original question directly, does it?  
Where there's no return to initr*, how /does/ systemd handle read-only 
mounting?

First, the nice ascii-diagram flow charts in the bootup (7) manpage may 
be useful, in particular here, the shutdown diagram (tho IDK if you can 
find such things useful or not??).

https://www.freedesktop.org/software/systemd/man/bootup.html

Here's the shutdown diagram described in words:

Initial shutdown is via two targets (as opposed to specific services), 
shutdown.target, which conflicts with all (normal) system services 
thereby shutting them down, and umount.target, which conflicts with file 
mounts, swaps, cryptsetup device, etc.  Here, we're obviously interested 
in umount.target.  Then after those two targets are reached, various low 
level services are run or stopped, in order to reach final.target.  
After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec) 
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target, 
which of course is never actually evaluated, since the service actually 
does the intended action.

The primary takeaway is that you might not be finding a specific systemd 
remount-ro service, because it might be a target, defined in terms of 
conflicts with mount units, etc, rather than a specific service.

Neither shutdown.target nor umount.target has any wants or requires by 
default, but the various normal services and mount units conflict with 
them, either via default or specifically, so are shut down before the 
target can be reached.

final.target has the After=shutdown.target umount.target setting, so 
won't be reached until they are reached.

The respective (reboot|poweroff|halt|kexec).target units Requires= and 
After= their respective systemd-*.service units, and reboot and poweroff 
(but not halt and kexec) have 30-minute timeouts after which they run 
reboot-force or poweroff-force, respectively.

The respective systemd-(reboot|poweroff|halt|kexec).service units 
Requires= and After= shutdown.target, umount.target and final.target, all 
three, so won't be run until those complete.  They simply 
ExecStart=/usr/bin/systemctl --force their respective actions.

And here's what the systemd.special (7) manpage says about umount.target:

  umount.target
    A special target unit that umounts all mount and automount points
    on system shutdown.

    Mounts that shall be unmounted on system shutdown shall add
    Conflicts dependencies to this unit for their mount unit,
    which is implicitly done when DefaultDependencies=yes is set
    (the default).

But that /still/ doesn't reveal what actually does the remount-ro, as 
opposed to umount.  I don't see that either, at the unit level, nor do I 
see anything related to it in for instance my auto-generated from fstab 
/run/systemd/generator/-.mount file or in the systemd-fstab-generator 
(8) manpage.

Thus I must conclude that it's actually resolved in the mount-unit 
conflicts handling in systemd's source code, itself.

And indeed... in systemd's tarball, we see in src/core/umount.c, in 
mount_points_list_umount...

That the function actually remounts /everything/ (well, everything not in 
a container) read-only, before actually trying to umount them.  Indention 
restandardized on two-space here, to avoid unnecessary wrapping as 
posted.  This is from systemd-228:

static int mount_points_list_umount(MountPoint **head, bool *changed, bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);
                
  LIST_FOREACH_SAFE(mount_point, m, n, *head) {

    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mound. */
    if (detect_container() <= 0)  {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL, &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somehwere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.", m->path, options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
      || path_equal(m->path, "/usr")
#endif
    )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;

      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}
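
To make the option filtering step concrete, here is a rough shell sketch of the same idea.  It is not systemd's code: the function name filter_rw_ro is made up for illustration, and it assumes a plain comma-separated option string with no glob characters in it.

```shell
# Hypothetical helper: drop "rw" and "ro" from a comma-separated mount
# option list and print whatever is left, still comma-separated.
filter_rw_ro() {
  old_ifs=$IFS
  IFS=,
  out=
  for opt in $1; do            # unquoted on purpose: split on commas
    case $opt in
      rw|ro) ;;                # skip exactly these two options
      *) out=${out:+$out,}$opt ;;
    esac
  done
  IFS=$old_ifs
  printf '%s\n' "$out"
}
```

A caller could then do something like mount -o "remount,ro${opts:+,$opts}" "$path", with opts holding the filtered string, so the remaining options are passed through unchanged.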


So the short answer ultimately is... Systemd has a single umount 
function, which first does remount-ro, so it's actually remounting 
(nearly) everything read-only, then tries umount.
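
For an OpenRC-style shell view of that remount-then-umount ordering, a minimal dry-run sketch follows.  It only prints the commands it would run.  The assumptions here are mine, not taken from systemd: the function name plan_shutdown_mounts, input in /proc/mounts format on stdin, walking the table in reverse so nested mounts come before their parents, and skipping the umount for / and /usr unconditionally.  Mount points containing escaped spaces (\040) are not handled.

```shell
# Hypothetical dry-run planner: reads a /proc/mounts-style table on stdin
# and prints, for each mount point, a remount-ro followed by an umount.
plan_shutdown_mounts() {
  # sed '1!G;h;$!d' reverses the line order (a portable stand-in for tac).
  sed '1!G;h;$!d' | while read -r dev path fstype opts rest; do
    # Always try read-only first, as the systemd function does.
    printf 'mount -o remount,ro %s\n' "$path"
    case $path in
      /|/usr) continue ;;      # we are running from these; leave them ro
    esac
    printf 'umount %s\n' "$path"
  done
}
```

Running plan_shutdown_mounts < /proc/mounts shows the plan for the current system without changing anything.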


Meanwhile, (semi-)answering the elsewhere implied question of why only 
Linux needs the mount-ro service...  I'm no BSD expert, but in my 
wanderings I came across a remark that they didn't need it, because their 
kernel reboot/halt/poweroff routines have a built-in kernelspace sync-and-
remount-ro routine for anything that can't be unmounted, which Linux 
lacks.  They obviously consider this a Linux deficiency.  I've not come 
across the Linux reason for not doing it, but an educated guess is that 
it would be considered putting policy into the kernel, which is a no-no: 
policy belongs in userspace, and the kernel simply enforces it as 
directed (which is why kernel 2.4's devfs was removed for 2.6, to be 
replaced with the userspace-based udev).  Additionally, not forcing the 
remount-ro in the kernel does give developers a way to test the results 
of an uncontrolled shutdown, say on a specific testing filesystem only, 
without exposing the rest of the system, which can still be shut down 
normally, to it.

So on Linux userspace must do the final umounts and force-read-onlys, 
because unlike the BSDs, the Linux kernel doesn't have builtin routines 
that automatically force it, regardless of userspace.

But as others have said, on Linux the remount-ro is _definitely_ 
required, and "bad things _will_ happen" if it's not done.  (Just how bad 
depends on the filesystem and its mount options, and hardware, among 
other things.)


Finally, one more thing to mention.  On systems with magic-sysrq in the 
kernel...

echo 0x30 > /proc/sys/kernel/sysrq

... enables the sync (0x10) and remount-readonly (0x20) functions.  (Of 
course only do this at shutdown/reboot, as you don't want to disturb the 
user's configured sysrq defaults in normal runtime.)

You can then force emergency sync (s) and remount-read-only (u) with...

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

As that's kernel emergency priority, it should force-sync and force 
everything readonly (and quiesce mid-layer block devices such as md 
and dm), even if it would normally refuse to do so due to files open for 
writing.  You might consider something like that as a fallback, if normal 
mount-readonly fails.  Of course it won't work if magic-sysrq functionality 
isn't built into the kernel, but then you're no worse off than before, 
and are far better off on kernels where it's supported, so it's certainly 
worth considering. =:^)
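
As a rough idea of what such a fallback could look like in a shell service, here is a minimal sketch.  The function names and the SYSRQ_TRIGGER variable are hypothetical, chosen here for illustration; the variable exists only so the logic can be exercised against an ordinary file without root.  The sketch does not touch /proc/sys/kernel/sysrq; whether that mask needs adjusting first is left to the caller.

```shell
# Path to the trigger file; overridable so the logic can be tried safely.
: "${SYSRQ_TRIGGER:=/proc/sysrq-trigger}"

# Emergency sync (s), then emergency remount read-only (u).
# Returns non-zero if the trigger file is missing or not writable,
# e.g. on a kernel built without magic-sysrq support.
sysrq_emergency_ro() {
  [ -w "$SYSRQ_TRIGGER" ] || return 1
  echo s > "$SYSRQ_TRIGGER" || return 1
  echo u > "$SYSRQ_TRIGGER"
}

# Run the normal remount command given as arguments; only if it fails,
# fall back to the sysrq path.
remount_ro_or_sysrq() {
  "$@" && return 0
  sysrq_emergency_ro
}
```

Usage would be along the lines of remount_ro_or_sysrq mount -o remount,ro / at the very end of shutdown.  If the normal remount succeeds, the sysrq path is never touched.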

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
