Monotonic time would be useful, though it would be better to do it
without entering the kernel at all. Adding the nsec/cycle pair from
the last context switch to the tos seems reasonable. For an example
of how it could look, see:

        /sys/src/cmd/vmx/nanosec.c

Though this still uses nsec to get the initial offset, and is not safe
for multiple procs, it should give an idea of the shape of the solution.
Doing it in userspace seems better [faster, cheaper] than doing it with
a new syscall. For go's needs, /dev/bintime could be opened once at the
start of the program, the base time initialized, and then bintime could
be closed.
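
To make the shape concrete, here is a rough, portable sketch of the
arithmetic only. It assumes a bintime read yields three big-endian 64-bit
values (nanoseconds since epoch, clock ticks, clock frequency in ticks per
second), which is my reading of cons(3). Timebase, tbinit and tbnow are
made-up names; reading the file and the cycle counter is left to the caller,
and like nanosec.c this is not safe for multiple procs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one bintime read: three big-endian 64-bit values. */
enum { BINTIMELEN = 3 * 8 };

/* Hypothetical base sample, taken once at program start. */
typedef struct Timebase Timebase;
struct Timebase {
	uint64_t ns;    /* nanoseconds since epoch at the sample */
	uint64_t ticks; /* cycle counter value at the sample */
	uint64_t hz;    /* counter frequency, ticks per second */
};

/* Decode one big-endian 64-bit integer. */
static uint64_t
be2vlong(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for(i = 0; i < 8; i++)
		v = v<<8 | p[i];
	return v;
}

/* Fill tb from a raw bintime record; returns -1 if the frequency is zero. */
int
tbinit(Timebase *tb, const unsigned char buf[BINTIMELEN])
{
	tb->ns = be2vlong(buf);
	tb->ticks = be2vlong(buf + 8);
	tb->hz = be2vlong(buf + 16);
	return tb->hz == 0 ? -1 : 0;
}

/*
 * Base nanoseconds plus the time elapsed since the sample, given the
 * current counter value. Only the counter delta is used after tbinit,
 * so later adjustments to the system clock cannot move the result
 * backward. Whole seconds and the remainder are converted separately
 * so the multiply stays within 64 bits for frequencies up to ~18GHz.
 */
uint64_t
tbnow(const Timebase *tb, uint64_t now)
{
	uint64_t d;

	assert(tb->hz != 0);
	d = now - tb->ticks;
	return tb->ns + d / tb->hz * 1000000000ULL
		+ d % tb->hz * 1000000000ULL / tb->hz;
}
```

The caller would read the record from /dev/bintime once, pass it to tbinit,
close the file, and from then on call tbnow with a fresh cycle count.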

As for file descriptors: my recollection from talking with folks at
hackathons is that OpenBSD added getentropy() in large part due to the needs
of pledge and chroot: if you cut off file system access, it becomes hard
to access /dev/random. If it's hard to construct sub-namespaces with devs,
it becomes hard to access /dev/random.  Other systems seem to have followed
suit for similar reasons.

Our situation is different; we approach sandboxing through controlling the
namespace[1], and it's easy to build a namespace where all the expected
names are in place.

I don't buy the argument around accidentally reusing file descriptors; any
file descriptor can end up in one of those slots. bintime, random, et al.
aren't special in that respect.

Contemporary operating systems can't deal with opening these files either
because they want to be very strict about access to the namespace, or
because the tools they use are anemic when it comes to namespacing. We
don't need to port over other systems' problems; we have enough of our own.

[1] We could do a better job, but steps are being taken. We've largely
    replaced 'rfork M' with a more granular option, have private /srv, etc.

Quoth Russ Cox <r...@swtch.com>:
> Hi all,
> 
> Cinap pointed out in the other thread that nsec had been added and then
> abandoned because it wasn't right. That turns out only to be half wrong -
> it's not true today but it probably should be true in the future. We do
> need a time-related special system call, but not that one.
> 
> I just saw a Go program crash because it observed monotonic time move
> backward. That happened because on Plan 9, Go does not have easy access to
> monotonic time, only Unix time. And when Unix time moves backward (like
> timesync makes it do) then Go sees that as monotonic time moving backward.
> The ironic thing is that #c/bintime has all the info Go needs, but Go
> stopped using it.
> 
> The nsec system call was added to avoid needing to keep #c/bintime open in
> all programs, avoid the problems of it accidentally using a standard fd (0
> 1 2) etc. But nsec is too specialized. bintime returns more than just Unix
> nanoseconds. The right answer would have been to add a readbintime(p, n)
> system call that acts like pread(/dev/bintime, p, n, 0), dispatching to the
> kernel's readbintime function. I suggest we actually do that, which would
> make monotonic time access work right.
> 
> While we are avoiding pre-opened file descriptors, the other thing modern
> operating systems have come to realize is that /dev/random is important
> enough to be able to access without a file descriptor. It would be good to
> add a readcrypto(p, n) system call at the same time.
> 
> Perhaps there should not be two new system calls. Perhaps it should be one
> new readspecial(id, p, n) system call.
> 
> Or perhaps there should be no new system calls, and instead pread should
> accept a few distinguished negative file descriptors. Obviously fd=-1 has
> to keep returning an error, but perhaps we should define that -2 is
> #c/bintime and -3 is #c/random. Or if -2 is too close to -1, we could use
> -1000 and -1001.
> 
> Personally I think the negative numbers are a bit too special, and I'd be
> inclined to add two new system calls kread(kfd, data, n, off) and
> kwrite(kfd, data, n, off), which are like pread and pwrite except that they
> operate on "kernel file descriptors", which are small integers that are
> always open and refer to specific kernel resources. The initial set of
> kernel file descriptors are
> 
> 0 #c/bintime
> 1 #c/random
> 
> This set could be extended over time; use of an unrecognized kfd would
> return an error. This approach solves the "keep special fds open" problem
> directly, without abandoning Plan 9's "everything is a file" quite as much
> as nsec(2) does or readbintime(2) or readcrypto(2) would.
> 
> Thoughts?
> 
> Best,
> Russ

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M131dd00372c17a4c1b62d3f5