On Thu, 7 Jul 2022 21:21:27 GMT, Ioi Lam <ik...@openjdk.org> wrote: >> Some Kubernetes setups share the /tmp directory across multiple containers. >> On rare occasions, the JVM may crash when it tries to write to >> `/tmp/hsperfdata_<user>/<pid>` when a process in a separate container >> decides to do the same thing (because they happen to have the same >> namespaced pid). >> >> This patch avoids the crash by using `flock()` to allow only one of these >> processes to write to the file. All other competing processes that fail to >> grab the lock will give up the file and run with PerfMemory disabled. We >> will try to enable PerfMemory for the failed processes in a follow-up RFE: >> [JDK-8289883](https://bugs.openjdk.org/browse/JDK-8289883) >> >> Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of >> using `flock()`. >> >> I kept the use of `kill()` for stale file detection to be compatible with >> older JVMs. >> >> I also took the opportunity to clean up the comments and remove dead code. >> The old code was using "shared memory resources" which sounds unclear and >> odd. I changed the terminology to say "shared memory file" instead. > > Ioi Lam has updated the pull request incrementally with one additional commit > since the last revision: > > @tstuefe comments
src/hotspot/os/posix/perfMemory_posix.cpp line 743: > 741: > 742: // We now have a file name that converts to a valid integer > 743: // that could represent a process id. "could"? It does, or? Why being ambiguous? (If we change the scheme later, we can be more specific) src/hotspot/os/posix/perfMemory_posix.cpp line 744: > 742: // We now have a file name that converts to a valid integer > 743: // that could represent a process id. > 744: // Can you add a short introduction here similar to: "On Linux, VMs may run in different PID namespaces and therefore their PIDs are not unique. Therefore, we first try to flock() the file ..." src/hotspot/os/posix/perfMemory_posix.cpp line 759: > 757: // signal the process, then the file is assumed to > 758: // be stale and is removed because the files for such a > 759: // process should be in a different user specific directory. Please add a short sentence like: "Finally, if VMs from different containers share /tmp, we cannot reliably determine process liveness using kill. To avoid that, either stop sharing /tmp between containers or only run VMs with the JDK-8286030 fix." src/hotspot/os/posix/perfMemory_posix.cpp line 766: > 764: // The locking protocol works only with JVMs that have the > JDK-8286030 fix. > 765: // If you are sharing the /tmp difrectory among different > containers, do not > 766: // use older JVMs that don't have this fix. Could possible shorten this if you feel the long comment above explains it all already. src/hotspot/os/posix/perfMemory_posix.cpp line 796: > 794: if (!is_locked_by_another_process) { > 795: if ((pid == os::current_process_id()) || > 796: (kill(pid, 0) == OS_ERR && (errno == ESRCH || errno == > EPERM))) { Thinking about this, I now was confused. AFAICS this code is only ever called from `mmap_create_shared` with the directory containing our own pid. I do not see this called for PIDs from other processes. Why do we handle that case? Or am I overlooking something? ------------- PR: https://git.openjdk.org/jdk/pull/9406