On Thu, 7 Jul 2022 06:01:58 GMT, Ioi Lam <ik...@openjdk.org> wrote: > Some Kubernetes setups share the /tmp directory across multiple containers. > On rare occasions, the JVM may crash when it tries to write to > `/tmp/hsperfdata_<user>/<pid>` when a process in a separate container decides > to do the same thing (because they happen to have the same namespaced pid). > > This patch avoids the crash by using `flock()` to allow only one of these > processes to write to the file. All other competing processes that fail to > grab the lock will give up the file and run with PerfMemory disabled. We will > try to enable PerfMemory for the failed processes in a follow-up RFE: > [JDK-8289883](https://bugs.openjdk.org/browse/JDK-8289883) > > Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of > using `flock()`. > > I kept the use of `kill()` for stale file detection to be compatible with > older JVMs. > > I also took the opportunity to clean up the comments and remove dead code. > The old code was using "shared memory resources" which sounds unclear and > odd. I changed the terminology to say "shared memory file" instead.
Hi Ioi, not a full review, just a time-limited glance. Will take a closer look later. Cheers, Thomas src/hotspot/os/posix/perfMemory_posix.cpp line 708: > 706: fd = ::open(filename, O_RDONLY); > 707: if (fd >= 0) { > 708: is_locked = (flock(fd, LOCK_EX|LOCK_NB) != 0); Out of the n error conditions for flock(), only EWOULDBLOCK indicates a occupied lock. I would handle the rest differently. src/hotspot/os/posix/perfMemory_posix.cpp line 720: > 718: > 719: return is_locked; > 720: } The interface of this function seems weirdly unbalanced and threw me off at first. We hand in fd=-1, and it opens the fd, maybe closes the fd, maybe returns a valid fd (via reference, one has to look closely), maybe returns an invalid but valid-looking fd... I would disentangle this a bit either bei moving the open and the close out to the caller and making this a simple static bool is_locked_by_another_process(int fd); or, alternatively, just inlining the whole section into the one place where we use it. src/hotspot/os/posix/perfMemory_posix.cpp line 770: > 768: // create_sharedmem_file() and is_locked_by_another_process(). > 769: // If it's already locked by another process, then obviously it's > 770: // not stale Period missing. src/hotspot/os/posix/perfMemory_posix.cpp line 781: > 779: // signal the process, then the file is assumed to > 780: // be stale and is removed because the files for such a > 781: // process should be in a different user specific directory. I am not sure this is good. If two conflicting hotspots share the same PID in tmp, from two different users, this will probably be a setup error. Is the best way really to provoke SIGBUS in the other VM? Seems a bit harsh. Also terminology would be wrong. Its not stale then, since the target process probably exists, is a VM, and uses that file. ------------- PR: https://git.openjdk.org/jdk/pull/9406