On Thu, 10 Apr 2025 05:52:17 GMT, Alan Bateman <al...@openjdk.org> wrote:
>> Hi, this is a proposal to fix 8352728. >> >> The main idea is to replace >> [`java.nio.file.Path::toRealPath`](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/nio/file/Path.html#toRealPath(java.nio.file.LinkOption...)) >> by >> [`java.io.File::getCanonicalPath`](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/io/File.html#getCanonicalPath()) >> for path canonicalization purposes. The rationale behind this decision is >> the following: >> >> 1. In _Windows_, `File::getCanonicalPath` handles restricted permissions in >> parent directories. Contrarily, `Path::toRealPath` fails with >> `AccessDeniedException`. >> 2. In _Linux_, `File::getCanonicalPath` handles non-regular files (e.g. >> `/dev/stdin`). Contrarily, `Path::toRealPath` fails with >> `NoSuchFileException`. >> >> #### Windows Case >> >> @martinuy and I tracked down the `File::getCanonicalPath` vs >> `Path::toRealPath` behaviour differences in _Windows_. Both methods end up >> calling the >> [`FindFirstFileW`](https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew) >> API inside a loop for each parent directory in the path, until they include >> the leaf: >> >> * >> [`File::getCanonicalPath`](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/share/classes/java/io/File.java#L618 >> "src/java.base/share/classes/java/io/File.java:618") goes through the >> following stack into `FindFirstFileW`: >> * >> [`WinNTFileSystem::canonicalize`](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/windows/classes/java/io/WinNTFileSystem.java#L473 >> "src/java.base/windows/classes/java/io/WinNTFileSystem.java:473") >> * >> [`WinNTFileSystem::canonicalize0`](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/windows/native/libjava/WinNTFileSystem_md.c#L288 >> "src/java.base/windows/native/libjava/WinNTFileSystem_md.c:288") >> * >> [`wcanonicalize`](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/windows/native/libjava/canonicalize_md.c#L233 >> "src/java.base/windows/native/libjava/canonicalize_md.c:233") (here is the >> loop) >> * If `FindFirstFileW` fails with `ERROR_ACCESS_DENIED`, >> `lastErrorReportable` is consulted, the error is [considered >> non-reportable](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/windows/native/libjava/canonicalize_md.c#L139 >> "src/java.base/windows/native/libjava/canonicalize_md.c:139") and the >> iteration is stopped >> [here](https://github.com/openjdk/jdk/blob/jdk-24-ga/src/java.base/windows/native/libjava/canonicalize_md.c#L246-L250 >> "src/ja... > > I don't think this change should be integrated before more investigation. I > think start by finding out why this code is using toRealPath. For the two > cases listed, it looks like toRealPath is correctly failing for the first, > but for /dev/stdin then please bring it to nio-dev to discuss how special > devices should be handled by that method. Hi again @AlanBateman, I've been doing some research on _Linux_, debugging the following sample: ### `GetContentAndRealPath.java` import java.nio.file.Files; import java.nio.file.Path; import static java.nio.charset.StandardCharsets.UTF_8; public final class GetContentAndRealPath { public static void main(String[] args) throws Exception { if (args.length != 1) { throw new Exception("Please specify a file path"); } Path path = Path.of(args[0]); System.out.println("%(java) Content:"); System.out.println("%(java) => " + new String(Files.readAllBytes(path), UTF_8).trim()); System.out.println("%(java)"); System.out.println("%(java) File::getCanonicalPath:"); System.out.println("%(java) => " + path.toFile().getCanonicalFile().toPath()); System.out.println("%(java)"); System.out.println("%(java) Path::toRealPath:"); System.out.println("%(java) => " + path.toRealPath()); } } The first thing to note is that `/dev/stdin` is not a problem per se, the actual problem is when it is provided by an anonymous pipe. So the following works fine: $ java GetContentAndRealPath.java /dev/stdin </dev/null %(java) Content: %(java) => %(java) %(java) File::getCanonicalPath: %(java) => /dev/null %(java) %(java) Path::toRealPath: %(java) => /dev/null In Bash, there are various ways to provide `stdin` through an anonymous pipe: # https://www.gnu.org/software/bash/manual/bash.html#Pipelines echo Pipelines | java GetContentAndRealPath.java /dev/stdin # https://www.gnu.org/software/bash/manual/bash.html#Here-Strings java GetContentAndRealPath.java /dev/./stdin <<<Here-Strings # https://www.gnu.org/software/bash/manual/bash.html#Here-Documents java GetContentAndRealPath.java /etc/../dev/stdin <<EOF Here-Documents EOF # https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution java GetContentAndRealPath.java <(echo Process-Substitution) Here-Documents example: $ java GetContentAndRealPath.java /etc/../dev/stdin <<EOF Here-Documents EOF %(java) Content: %(java) => Here-Documents %(java) %(java) File::getCanonicalPath: %(java) => /dev/stdin %(java) %(java) Path::toRealPath: Exception in thread "main" java.nio.file.NoSuchFileException: /etc/../dev/stdin at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:834) at GetContentAndRealPath.main(GetContentAndRealPath.java:24) Please note how `File::getCanonicalPath` resolves `/etc/../dev/stdin` → `/dev/stdin` while `Path::toRealPath` fails with `NoSuchFileException`. ### _glibc_'s `realpath()` I traced this behaviour inside _glibc_'s `realpath()`, with the following GDB script: gdb -q --nx java <<'EOF' 2>&1 | grep --color=never '^%(\w*)|^Exception|^ at' # Settings set debuginfod enabled on set breakpoint pending on handle SIGSEGV nostop noprint pass # Break once at JDK_Canonicalize(), if the path starts with /dev/fd/ tbreak JDK_Canonicalize if ((int) strncmp(orig, "/dev/fd/", 8)) == 0 # Start java with a Process-Substitution anonymous pipe run GetContentAndRealPath.java <(echo Process-Substitution) # Stopped at JDK_Canonicalize() add a trace for each glibc's realpath() call # https://github.com/bminor/glibc/blob/glibc-2.39/stdlib/canonicalize.c#L431 break canonicalize.c:431 commands python begin_realpath() continue end # Inside glibc's realpath() implementation, also trace each readlink() result # https://github.com/bminor/glibc/blob/glibc-2.39/stdlib/canonicalize.c#L311 break canonicalize.c:311 commands python after_readlink() continue end ############################################################ python from errno import errorcode def begin_realpath(): name = gdb.parse_and_eval('name').string('utf-8') print(f'%(glibc) realpath("{name}")') def after_readlink(): rname = gdb.parse_and_eval('rname').string('utf-8') print(f'%(glibc) readlink("{rname}") -> ', end='') n = int(gdb.parse_and_eval('n')) if n >= 0: buf = gdb.parse_and_eval('buf').string('utf-8')[:n] print(f'"{buf}"') else: errno = int(gdb.parse_and_eval('errno')) print(f'{errorcode[errno]} ({errno})') end ############################################################ # Resume execution from JDK_Canonicalize() continue EOF It produces the following output: %(java) Content: %(java) => Process-Substitution %(java) %(java) File::getCanonicalPath: %(glibc) realpath("/dev/fd/63") %(glibc) readlink("/dev") -> EINVAL (22) %(glibc) readlink("/dev/fd") -> "/proc/self/fd" %(glibc) readlink("/proc") -> EINVAL (22) %(glibc) readlink("/proc/self") -> "1390261" %(glibc) readlink("/proc/1390261") -> EINVAL (22) %(glibc) readlink("/proc/1390261/fd") -> EINVAL (22) %(glibc) readlink("/proc/1390261/fd/63") -> "pipe:[18671624]" %(glibc) readlink("/proc/1390261/fd/pipe:[18671624]") -> ENOENT (2) %(glibc) realpath("/dev/fd") %(glibc) readlink("/dev") -> EINVAL (22) %(glibc) readlink("/dev/fd") -> "/proc/self/fd" %(glibc) readlink("/proc") -> EINVAL (22) %(glibc) readlink("/proc/self") -> "1390261" %(glibc) readlink("/proc/1390261") -> EINVAL (22) %(glibc) readlink("/proc/1390261/fd") -> EINVAL (22) %(java) => /proc/1390261/fd/63 %(java) %(java) Path::toRealPath: %(glibc) realpath("/dev/fd/63") %(glibc) readlink("/dev") -> EINVAL (22) %(glibc) readlink("/dev/fd") -> "/proc/self/fd" %(glibc) readlink("/proc") -> EINVAL (22) %(glibc) readlink("/proc/self") -> "1390261" %(glibc) readlink("/proc/1390261") -> EINVAL (22) %(glibc) readlink("/proc/1390261/fd") -> EINVAL (22) %(glibc) readlink("/proc/1390261/fd/63") -> "pipe:[18671624]" %(glibc) readlink("/proc/1390261/fd/pipe:[18671624]") -> ENOENT (2) Exception in thread "main" java.nio.file.NoSuchFileException: /dev/fd/63 at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:834) at GetContentAndRealPath.main(GetContentAndRealPath.java:24) We can see that `realpath()` invokes `readlink()` several times for the partially normalized path. Anonymous pipes live in the [**PipeFS** virtual filesystem](https://okc1.github.io/blog/2015/07/23/how-is-pipe-implemented-unix), which isn't mounted in userspace. However, anonymous pipes do have a `dentry` which gives them a `dname` with the `pipe:[<i_ino>]` format (Code: [`readlink` syscall](https://github.com/torvalds/linux/blob/v6.12/fs/stat.c#L574) → [`do_readlinkat`](https://github.com/torvalds/linux/blob/v6.12/fs/stat.c#L551) → [`vfs_readlink`](https://github.com/torvalds/linux/blob/v6.12/fs/namei.c#L5255) → [`proc_pid_readlink`](https://github.com/torvalds/linux/blob/v6.12/fs/proc/base.c#L1856) → [`do_proc_readlink`](https://github.com/torvalds/linux/blob/v6.12/fs/proc/base.c#L1827) → [`d_path`](https://github.com/torvalds/linux/blob/v6.12/fs/d_path.c#L270-L283) → [`pipefs_dname`](https://github.com/torvalds/linux/blob/v6.12/fs/pipe.c#L867-L874). Also, there are som e higher level explanations [here](https://unix.stackexchange.com/a/124977) and [here](https://narkive.com/QM2vnkZu.5)). When `readlink()` resolves `/proc/<pid>/fd/63`, it returns the `pipe:[18543635]` link target, which makes the symlink look "broken" (from the userspace perspective) and this ultimately makes `realpath()` fail with `ENOENT (2)`. However, the unresolved `/proc/<pid>/fd/0` link works, as the _Linux Kernel_ can access the anonymous pipe behind it: fferrari@vmhost:~$ echo TEST | sleep 20 &>/dev/null &disown [1] 1386980 fferrari@vmhost:~$ cat $(realpath /proc/$(pgrep sleep)/fd/0) cat: '/proc/1386980/fd/pipe:[18629664]': No such file or directory fferrari@vmhost:~$ cat /proc/$(pgrep sleep)/fd/0 TEST ### _OpenJDK_ APIs difference We can see in [`UnixNativeDispatcher::realpath0`](https://github.com/openjdk/jdk/blob/jdk-25+17/src/java.base/unix/native/libnio/fs/UnixNativeDispatcher.c#L1105 "src/java.base/unix/native/libnio/fs/UnixNativeDispatcher.c:1105") that `Path::toRealPath` translates the _glibc_'s `realpath()` failure immediately. On the other hand `File::getCanonicalPath` goes through [`UnixFileSystem::canonicalize0`](https://github.com/openjdk/jdk/blob/jdk-25+17/src/java.base/unix/native/libjava/UnixFileSystem_md.c#L86-L87 "src/java.base/unix/native/libjava/UnixFileSystem_md.c:86") and `JDK_Canonicalize`, [which retries `realpath()` until some subpath works](https://github.com/openjdk/jdk/blob/jdk-25+17/src/java.base/unix/native/libjava/canonicalize_md.c#L67-L86 "src/java.base/unix/native/libjava/canonicalize_md.c:86"). This aligns with the previous GDB experiment, which showed `File::getCanonicalPath` is doing an additional `realpath("/dev/fd")` call when `realpath("/dev/fd/63")` fails. Given it just follows the _glibc_'s behaviour, I don't think a `Path::toRealPath` change is justifiable for an `nio-dev` request. For example, [an equivalent discussion has been raised for Rust](https://github.com/rust-lang/rust/issues/95239), and the developers aren't considering making any change. `File::getCanonicalPath` seems to take the best-effort approach (both in _Linux_ and _Windows_), whereas `Path::toRealPath` is stricter. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24465#issuecomment-2803609222