On Thu, 7 Jul 2022 06:01:58 GMT, Ioi Lam <ik...@openjdk.org> wrote:

> Some Kubernetes setups share the /tmp directory across multiple containers. 
> On rare occasions, the JVM may crash when it tries to write to 
> `/tmp/hsperfdata_<user>/<pid>` when a process in a separate container decides 
> to do the same thing (because they happen to have the same namespaced pid).
> 
> This patch avoids the crash by using `flock()` to allow only one of these 
> processes to write to the file. All other competing processes that fail to 
> grab the lock will give up the file and run with PerfMemory disabled. We will 
> try to enable PerfMemory for the failed processes in a follow-up RFE: 
> [JDK-8289883](https://bugs.openjdk.org/browse/JDK-8289883)
> 
> Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of 
> using `flock()`.
> 
> I kept the use of `kill()` for stale file detection to be compatible with 
> older JVMs.
> 
> I also took the opportunity to clean up the comments and remove dead code. 
> The old code was using "shared memory resources" which sounds unclear and 
> odd. I changed the terminology to say "shared memory file" instead.

Hi Ioi,

not a full review, just a time-limited glance. Will take a closer look later.

Cheers, Thomas

src/hotspot/os/posix/perfMemory_posix.cpp line 708:

> 706:   fd = ::open(filename, O_RDONLY);
> 707:   if (fd >= 0) {
> 708:     is_locked = (flock(fd, LOCK_EX|LOCK_NB) != 0);

Out of the n error conditions for flock(), only EWOULDBLOCK indicates a 
occupied lock. I would handle the rest differently.

src/hotspot/os/posix/perfMemory_posix.cpp line 720:

> 718: 
> 719:   return is_locked;
> 720: }

The interface of this function seems weirdly unbalanced and threw me off at 
first. We hand in fd=-1, and it opens the fd, maybe closes the fd, maybe 
returns a valid fd (via reference, one has to look closely), maybe returns an 
invalid but valid-looking fd...

I would disentangle this a bit either bei moving the open and the close out to 
the caller and making this a simple 


static bool is_locked_by_another_process(int fd);


or, alternatively, just inlining the whole section into the one place where we 
use it.

src/hotspot/os/posix/perfMemory_posix.cpp line 770:

> 768:     // create_sharedmem_file() and is_locked_by_another_process().
> 769:     // If it's already locked by another process, then obviously it's
> 770:     // not stale

Period missing.

src/hotspot/os/posix/perfMemory_posix.cpp line 781:

> 779:     // signal the process, then the file is assumed to
> 780:     // be stale and is removed because the files for such a
> 781:     // process should be in a different user specific directory.

I am not sure this is good. If two conflicting hotspots share the same PID in 
tmp, from two different users, this will probably be a setup error. Is the best 
way really to provoke SIGBUS in the other VM? Seems a bit harsh. 

Also terminology would be wrong. Its not stale then, since the target process 
probably exists, is a VM, and uses that file.

-------------

PR: https://git.openjdk.org/jdk/pull/9406

Reply via email to