Symlink support on the local file system is still used. One example I can think of is YARN container launch [1].
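
To make the symlink point concrete, here is a minimal sketch (mine, not taken from the Hadoop codebase) of creating and reading a symlink on the local file system through java.nio.file alone, with no winutils and no shell. The class name SymlinkSketch and the helpers createLink and readLink are hypothetical names for illustration; returning an empty string for a non-link is just a convention chosen for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Hadoop.
final class SymlinkSketch {

    /** Creates a symbolic link at 'link' pointing to 'target' via the Java API. */
    static Path createLink(Path link, Path target) {
        try {
            return Files.createSymbolicLink(link, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the link target as a string, or "" if 'path' is not a symlink. */
    static String readLink(Path path) {
        if (!Files.isSymbolicLink(path)) {
            return "";
        }
        try {
            return Files.readSymbolicLink(path).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-sketch");
        Path target = Files.createFile(dir.resolve("target.txt"));
        Path link = createLink(dir.resolve("link.txt"), target);

        // Prints the path the link points to.
        System.out.println(readLink(link));
        // A regular file has no link target in this sketch's convention.
        System.out.println(readLink(target).isEmpty());
    }
}
```

One caveat: Files.createSymbolicLink can fail on platforms or file systems that do not support symlinks, and on Windows it may require extra privileges, so any real transition would need to be covered by the contract tests.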

I would welcome removal of winutils, as already described in various JIRA issues. I think the biggest challenge we'll have is testing the transition from winutils to the newer Java APIs. The contract tests help, but historically there was also a tendency to break things in downstream dependent projects. I'd suggest taking this on piecemeal, transitioning small pieces of FileSystem off of winutils one at a time.

[1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L1508-L1509

Chris Nauroth


On Thu, Nov 10, 2022 at 10:33 AM Wei-Chiu Chuang <weic...@apache.org> wrote:

> >   * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
> >     or the sticky bit.
>
> ok to not support symlinks. The symlinks of HDFS are not being maintained
> and I am not aware of anything relying on them.
> So I assume people don't need them.
>
> Sticky bit would be useful, I guess.
>
> I suppose folks working at Microsoft would be more interested in this work?
> Last time I heard, Gautham and Inigo were revamping Hadoop's Windows
> support.
>
> >   * But the bigger issue is how to excise Winutils completely in the
> >     existing Hadoop code. Winutils assumptions are hard-coded at a low
> >     level across various classes, even code that has nothing to do with
> >     the file system. The startup configuration for example calls
> >     `StringUtils.equalsIgnoreCase("true", valueString)` which loads the
> >     `StringUtils` class, which has a static reference to `Shell`, which
> >     has a static block that checks for `WINUTILS_EXE`.
> >   * For the most part there should no longer even be a need for anything
> >     but direct Java API access for the local file system. But muddling
> >     things further, the existing `RawLocalFileSystem` implementation has
> >     /four/ ways to access the local file system: Winutils, JNI calls,
> >     shell access, and a "new" approach using "stat". The "stat" approach
> >     has been switched off with a hard-coded `useDeprecatedFileStatus =
> >     true` because of HADOOP-9652
> >     <https://issues.apache.org/jira/browse/HADOOP-9652>.
> >   * Local file access is not contained within `RawLocalFileSystem` but
> >     is scattered across other classes; `FileUtil.readLink()` for example
> >     (which `RawLocalFileSystem` calls because of the deprecation issue
> >     above) uses the shell approach without any option to change it.
> >     (This implementation-specific decision should have been contained
> >     within the `FileSystem` implementation itself.)
> >
> > In short, it's a mess that has accumulated over years and is getting
> > worse, charging high interest on what at first was a small,
> > self-contained technical debt.
> >
> > I would welcome the opportunity to clean up this mess. I'm probably as
> > qualified as anyone to make the changes. This is one of my areas of
> > expertise: I was designing a full abstract file system interface (with
> > pure-Java from-scratch implementations for the local file system,
> > Subversion, and WebDAV; even the WebDAV HTTP implementation was from
> > scratch) around the time Apache Nutch was getting off the ground. Most
> > recently I've worked on the Hadoop `FileSystem` API contracting for
> > LinkedIn, discovering (what I consider to be) a huge bug in
> > ViewFilesystem, HADOOP-18525
> > <https://issues.apache.org/jira/browse/HADOOP-18525>.
> >
> > The cleanup should be done in several stages (e.g. consolidating
> > WinUtils access; replacing code with pure Java API calls; undeprecating
> > the new Stat code and relegating it to a different class, etc.).
> > Unfortunately it's not financially feasible for me to sit here for
> > several months and revamp the Hadoop `FileSystem` subsystem for fun
> > (even though I wish I could). Perhaps there is a job opening at a company
> > related to Hadoop that would be interested in hiring me and devoting a
> > certain percentage of my time to fixing local `FileSystem` access. If
> > so, let me know where I should send my resume
> > <https://www.garretwilson.com/about/resume>.
> >
> > Otherwise let me know if you have any ideas for a way forward. If there
> > proves to be interest in GlobalMentor Hadoop Bare Naked Local FileSystem
> > <https://github.com/globalmentor/hadoop-bare-naked-local-fs> on GitHub
> > I'll try to maintain and improve it, but really what needs to be
> > revamped is the Hadoop codebase itself. I'll be happy when Hadoop is
> > fixed so that both Steve's code and my code are no longer needed.
> >
> > Garret