Symlink support on the local file system is still used. One example I can
think of is YARN container launch [1].

I would welcome removal of winutils, as already described in various JIRA
issues. I think the biggest challenge we'll have is testing of a transition
from winutils to the newer Java APIs. The contract tests help, but
historically, changes in this area have also tended to break downstream
dependent projects.

I'd suggest taking this on piecemeal, transitioning small pieces of
FileSystem off of winutils one at a time.
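
As a rough illustration of what one such small piece might look like, here
is a minimal sketch of reading a symlink target through java.nio.file
instead of shelling out. The class name NioLinks and its readLink method are
hypothetical, not existing Hadoop code, and returning an empty string on
failure is my assumption about what existing callers would expect.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Hadoop): reads a symlink target via java.nio. */
public final class NioLinks {
  private NioLinks() {}

  /**
   * Returns the target of the given symlink as a string, or "" if the path
   * is null, is not a symlink, or cannot be read. The empty-string result
   * is an assumption made for this sketch.
   */
  public static String readLink(Path path) {
    if (path == null) {
      return "";
    }
    try {
      // Pure Java API call: no winutils, JNI, or shell involved.
      return Files.readSymbolicLink(path).toString();
    } catch (IOException | UnsupportedOperationException | SecurityException e) {
      // NotLinkException (a subclass of IOException) is thrown for
      // non-symlinks; file systems without symlink support throw
      // UnsupportedOperationException.
      return "";
    }
  }
}
```

A change of this size could be covered by the existing contract tests plus a
few targeted checks on Windows before moving on to the next piece.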

[1]
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L1508-L1509

Chris Nauroth


On Thu, Nov 10, 2022 at 10:33 AM Wei-Chiu Chuang <weic...@apache.org> wrote:

> >
> >
> >
> >   * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
> >     or the sticky bit.
> >
> It's OK not to support symlinks. HDFS symlinks are not being maintained,
> and I am not aware of anything relying on them.
> So I assume people don't need them.
>
> Sticky bit would be useful, I guess.
>
> I suppose folks working at Microsoft would be more interested in this work?
> Last time I heard, Gautham and Inigo were revamping Hadoop's Windows
> support.
>
>
> >   * But the bigger issue is how to excise Winutils completely in the
> >     existing Hadoop code. Winutils assumptions are hard-coded at a low
> >     level across various classes—even code that has nothing to do with
> >     the file system. The startup configuration for example calls
> >     `StringUtils.equalsIgnoreCase("true", valueString)` which loads the
> >     `StringUtils` class, which has a static reference to `Shell`, which
> >     has a static block that checks for `WINUTILS_EXE`.
> >   * For the most part there should no longer even be a need for anything
> >     but direct Java API access for the local file system. But muddling
> >     things further, the existing `RawLocalFileSystem` implementation has
> >     /four/ ways to access the local file system: Winutils, JNI calls,
> >     shell access, and a "new" approach using "stat". The "stat" approach
> >     has been switched off with a hard-coded `useDeprecatedFileStatus =
> >     true` because of HADOOP-9652
> >     <https://issues.apache.org/jira/browse/HADOOP-9652>.
> >   * Local file access is not contained within `RawLocalFileSystem` but
> >     is scattered across other classes; `FileUtil.readLink()` for example
> >     (which `RawLocalFileSystem` calls because of the deprecation issue
> >     above) uses the shell approach without any option to change it.
> >     (This implementation-specific decision should have been contained
> >     within the `FileSystem` implementation itself.)
> >
> > In short, it's a mess that has accumulated over years and is getting
> > worse, charging high interest on what at first was a small,
> > self-contained technical debt.
> >
> > I would welcome the opportunity to clean up this mess. I'm probably as
> > qualified as anyone to make the changes. This is one of my areas of
> > expertise: I was designing a full abstract file system interface (with
> > pure-Java from-scratch implementations for the local file system,
> > Subversion, and WebDAV—even the WebDAV HTTP implementation was from
> > scratch) around the time Apache Nutch was getting off the ground. Most
> > recently I've worked on the Hadoop `FileSystem` API contracting for
> > LinkedIn, discovering (what I consider to be) a huge bug in
> > ViewFilesystem, HADOOP-18525
> > <https://issues.apache.org/jira/browse/HADOOP-18525>.
> >
> > The cleanup should be done in several stages (e.g. consolidating
> > Winutils access; replacing code with pure Java API calls; undeprecating
> > the new Stat code and relegating it to a different class, etc.).
> > Unfortunately it's not financially feasible for me to sit here for
> > several months and revamp the Hadoop `FileSystem` subsystem for fun
> > (even though I wish I could). Perhaps there is a job opening at a company
> > related to Hadoop that would be interested in hiring me and devoting a
> > certain percentage of my time to fixing local `FileSystem` access. If
> > so, let me know where I should send my resume
> > <https://www.garretwilson.com/about/resume>.
> >
> > Otherwise let me know if you have any ideas for a way forward. If there proves to
> > be interest in GlobalMentor Hadoop Bare Naked Local FileSystem
> > <https://github.com/globalmentor/hadoop-bare-naked-local-fs> on GitHub
> > I'll try to maintain and improve it, but really what needs to be
> > revamped is the Hadoop codebase itself. I'll be happy when Hadoop is
> > fixed so that both Steve's code and my code are no longer needed.
> >
> > Garret
> >
>
