Hi Hadoop devs,

I would like to announce that we recently reached a new milestone: all the
tasks in item 3 under Phase 1 are now complete. This means that all the
HDFS native client tools[1] are now cross platform. We're inching closer
towards making Hadoop cross platform. Watch this space for more updates.

[1]
https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tools

Thanks,
--Gautham

On Mon, 21 Feb 2022 at 00:12, Gautham Banasandra <gaur...@apache.org> wrote:

> Hi all,
>
> I've been working on getting Hadoop to build on Windows for quite some
> time now. We're now at a stage where we can parallelize the effort and
> complete this sooner. I've outlined the parts that are remaining. Please
> get in touch with me if you wish to join hands in realizing this goal.
>
> *Why do we need Hadoop to run on Windows?*
> Windows has a very large user base. The modern software alternatives to
> Hadoop (like Kubernetes) are cross platform by design. We have to
> acknowledge the fact that it isn't easy to get Hadoop running on Windows.
> The reason why we haven't seen much adoption of Hadoop on Windows is
> probably that it has compilation issues and requires workarounds at every
> step of the way. If we were to nail these issues, I believe it would
> tremendously expand the usage of Hadoop.
>
> I plan to complete this in 4 phases.
>
> *Phase 1 : Building Hadoop on Windows*
> 1. [HADOOP-17193] Compile Hadoop on Windows natively - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HADOOP-17193>
> The Hadoop build on Windows is currently broken because of the POSIX API
> calls made in the HDFS native client (libhdfspp). MinGW and Cygwin
> provide POSIX implementations on Windows. While it's possible to use these
> C++ compilers, it won't be the same as compiling Hadoop with Visual C++.
> The Visual C++ runtime is the native C++ runtime on Windows and provides
> many more capabilities (like core dumps) than its alternatives. Thus,
> it's essential to get Hadoop to compile with Visual Studio on Windows.
> We'll be using Visual Studio 2019.
>
> 2. [HDFS-15843] [libhdfs++] Make write cross platform - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-15843>
> Until recently, Hadoop was being built with C++11. I upgraded the compiler
> to a version that supports C++17 so that we have access to
> std::filesystem and a few other modern C++ APIs. However, there are some
> cases where the C++17 APIs don't suffice. Thus, I wrote the XPlatform
> library
> <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform>,
> which is a collection of system call APIs implemented in a cross-platform
> friendly manner. The CMake build system will choose the appropriate
> platform implementation while building so that we can do away with all the
> #ifdefs based on platform in the code. In summary, if you ever come across
> a need to use system calls, please put them into the XPlatform library and
> use its APIs.
>
> 3. [HDFS-16474] Make HDFS tail tool cross platform - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HDFS-16474>
>     [HDFS-16473] Make HDFS stat tool cross platform - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16473>
>     [HDFS-16472] Make HDFS setrep tool cross platform - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16472>
>     [HDFS-16471] Make HDFS ls tool cross platform - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HDFS-16471>
>     [HDFS-16470] Make HDFS find tool cross platform - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16470>
> The HDFS native client tools use the getopt API to parse the command line
> arguments. getopt isn't available on Windows. One can follow this PR to
> make the above tools cross platform compatible - HDFS-16285. Make HDFS
> ownership tools cross platform by GauthamBanasandra · Pull Request #3588 ·
> apache/hadoop (github.com) <https://github.com/apache/hadoop/pull/3588>.
>
> 4. [HDFS-16463] Make dirent.h cross platform compatible - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16463>
>     [HDFS-16465] Make usage of strings.h cross platform compatible - ASF
> JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16465>
> For these JIRAs, the header files aren't available on Windows. Thus, we
> need to inspect the APIs that have been used from these headers and
> implement them.
>
> 5. [HDFS-16464] Create only libhdfspp static libraries for Windows - ASF
> JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16464>
> There are some issues with producing Hadoop DLLs on Windows. So, let's
> plan to deliver only static libraries in this phase.
>
> 6. [HDFS-16466] Implement Linux permission flags on Windows - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16466>
> 7. [HDFS-16467] Ensure Protobuf generated headers are included first -
> ASF JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16467>
> 8. [HDFS-16468] Define ssize_t for Windows - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HDFS-16468>
> 9. [HDFS-16469] Locate protoc-gen-hrpc.exe on Windows - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HDFS-16469>
> 10. [YARN-11078] Set env vars in a cross platform compatible way - ASF
> JIRA (apache.org) <https://issues.apache.org/jira/browse/YARN-11078>
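
To illustrate item 3 above: getopt is missing on Windows, so the tools need an argument parser that doesn't depend on it. The following is only a minimal sketch using nothing but the C++17 standard library. It is not necessarily the approach taken in the PR linked above, and the `Options` struct, the `Parse` function and the `-R` / `-n` flags are hypothetical names made up for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Parsed form of a hypothetical "tool [-R] [-n COUNT] PATH..." command line.
struct Options {
  bool recursive = false;
  int count = 10;
  std::vector<std::string> paths;
};

// Parses the arguments that follow the program name.
// Returns std::nullopt on an unknown flag or a missing/non-numeric -n value.
inline std::optional<Options> Parse(const std::vector<std::string>& args) {
  Options opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg == "-R") {
      opts.recursive = true;
    } else if (arg == "-n") {
      if (i + 1 >= args.size()) return std::nullopt;  // value is missing
      const std::string& value = args[++i];
      try {
        std::size_t pos = 0;
        opts.count = std::stoi(value, &pos);
        if (pos != value.size()) return std::nullopt;  // trailing garbage
      } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range for int
      }
    } else if (arg.size() > 1 && arg[0] == '-') {
      return std::nullopt;  // unknown flag
    } else {
      opts.paths.push_back(arg);  // positional argument
    }
  }
  return opts;
}

}  // namespace sketch
```

A tool's main() could call `sketch::Parse({argv + 1, argv + argc})` and print its usage text when the result is empty. Since this only touches the standard library, the same source should build with both GCC and Visual C++.
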
>
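To make the idea behind items 2, 4 and 10 more concrete, here is a rough sketch of what a cross-platform helper could look like. It is not the actual XPlatform API: the `sketch` namespace and the `EqualsIgnoreCase` and `SetEnv` functions are hypothetical names used only for illustration. In the real library, CMake picks the platform-specific source file. To keep this example in a single file, the platform check is instead confined to one small function, so that callers never see it.

```cpp
#include <stdlib.h>  // setenv (POSIX) or _putenv_s (Visual C++)

#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>

namespace sketch {

// Case-insensitive equality using only the standard library, so callers
// don't need strcasecmp from <strings.h>, which Windows lacks.
inline bool EqualsIgnoreCase(std::string_view a, std::string_view b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // std::tolower requires a value representable as unsigned char.
    const auto ca = static_cast<unsigned char>(a[i]);
    const auto cb = static_cast<unsigned char>(b[i]);
    if (std::tolower(ca) != std::tolower(cb)) return false;
  }
  return true;
}

// Sets (or overwrites) an environment variable. Returns true on success.
// This is the only place in the sketch that checks the platform.
inline bool SetEnv(const std::string& name, const std::string& value) {
#ifdef _WIN32
  return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
  return setenv(name.c_str(), value.c_str(), /*overwrite=*/1) == 0;
#endif
}

}  // namespace sketch
```

Callers would then write `sketch::SetEnv("SOME_VAR", "value")` on every platform instead of choosing between setenv and _putenv_s themselves.
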
> *Phase 2 : Set up CI for Hadoop on Windows*
> 1. [HADOOP-18133] Add Dockerfile for Windows 10 - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HADOOP-18133>
> 2. [HADOOP-18134] Run CI for Windows 10 - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HADOOP-18134>
> We really must set up the CI for Hadoop on Windows to ensure that this
> never breaks again.
>
> *Phase 3 : Resolving systemic issues*
> 1. [HADOOP-13223] winutils.exe is a bug nexus and should be killed with
> an axe. - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/HADOOP-13223>
> The Hadoop environment is modeled closer to that of Linux than Windows.
> Thus, we see a lot of functional gaps between running Hadoop on Linux vs.
> Windows, which have become the source of bugs when it comes to running
> Hadoop on Windows. One such issue is that of winutils.exe. We can aim to
> address issues like these in this phase. I plan to provide a JNI
> implementation for each platform and unify these under a common file system
> interface, so that we get stack traces for exceptions thrown in these
> layers and, above all, so that we don't have any disparity between the
> platforms.
>
> *Phase 4 : Produce Windows distribution of Hadoop*
> 1. [HADOOP-18135] Produce Windows binaries of Hadoop - ASF JIRA
> (apache.org) <https://issues.apache.org/jira/browse/HADOOP-18135>
> The public should be able to download and install Hadoop on their Windows
> computers.
>
> Thanks,
> --Gautham
>
