    Date:        Tue, 04 Oct 2022 10:09:35 -0400
    From:        Christos Zoulas <chris...@zoulas.com>
    Message-ID:  <8dd220d16861eb3a890461bdf02d1...@zoulas.com>

  | I always forget the O_CLOEXEC is special
  | in that regard.

I wish it was not, but it is difficult to fix.

POSIX is adding O_CLOFORK in the next version (no guarantees I
remembered the symbol name spelling correctly here) which will have
essentially the same (open time) semantics (similar long term
semantics as well, just applied at a different time). I assume we
will need to add that at some point or other.

  | The question is how to find the vnode?

Not really, I assume that part will be fairly easy (probably trivial),
I just didn't have the energy to go work it out when sending that mail.
We have the file descriptor, and I suspect the file* (need to check to
make sure the right one is immediately available, but we can get it
from the fd if not), we know it refers to a vnode (it came from
vn_open()), so getting the vnode* from the file* is not something
difficult, I think.

  | Perhaps it is easiest to fail the open call if O_EXLOCK or
  | O_SHLOCK are specified in a cloning open?

That would be an option, and is better than just ignoring them, but
better still would be to make them work.

Since open_setfp() does nothing (much) when none of the relevant O_xxx
flags that it tests are set (the fd open flags, as distinct from the fp
ones), and the code calls VOP_UNLOCK(vp) after it, we know that vp is
intended to be locked when open_setfp() is called. This is further
confirmed by the fact that when any of the O_??LOCK flags is set,
open_setfp() does a VOP_UNLOCK() and later a vn_lock() (which I am
guessing is the inverse).

Maybe all that's needed is a vn_lock() call (on the vp that we still
need to fetch) and then a call to open_setfp()?

But this is all beyond what I know enough about to be sure,
particularly to avoid doing anything which might deadlock, etc.

kre

ps: if this gets done properly, then special case code to handle
O_NONBLOCK (and O_NOSIGPIPE, ...) in cloning device drivers won't be
needed either. open_setfp() is where all of that is normally added to
the file* for the fd being returned; it was not "simply happening"
because that call is missing in the cloned device case.