Hi Hadoop devs,

I would like to announce that we have reached a new milestone: all the tasks in item 3 under Phase 1 are now complete. This means that all the HDFS native client tools[1] are now cross platform. We're inching closer towards making Hadoop cross platform. Watch this space for more updates.

[1] = https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tools

Thanks,
--Gautham

On Mon, 21 Feb 2022 at 00:12, Gautham Banasandra <gaur...@apache.org> wrote:

> Hi all,
>
> I've been working on getting Hadoop to build on Windows for quite some
> time now. We're now at a stage where we can parallelize the effort and
> complete this sooner. I've outlined the parts that are remaining. Please
> get in touch with me if anyone wishes to join hands in realizing this goal.
>
> *Why do we need Hadoop to run on Windows?*
> Windows has a very large user base. Modern software alternatives to
> Hadoop (like Kubernetes) are cross platform by design. We have to
> acknowledge that it isn't easy to get Hadoop running on Windows. The
> reason why we haven't seen much adoption of Hadoop on Windows is probably
> because of issues like compilation and the workarounds required every
> step of the way. If we were to nail these issues, I believe it would
> tremendously expand the usage of Hadoop.
>
> I plan to complete this in 4 phases.
>
> *Phase 1 : Building Hadoop on Windows*
> 1. [HADOOP-17193] Compile Hadoop on Windows natively
> <https://issues.apache.org/jira/browse/HADOOP-17193>
> The Hadoop build on Windows is currently broken because of the POSIX API
> calls made in the HDFS native client (libhdfspp). MinGW and Cygwin
> provide POSIX implementations on Windows. While it's possible to use these
> C++ compilers, it won't be the same as compiling Hadoop with Visual C++.
> The Visual C++ runtime is the native C++ runtime on Windows and provides
> many more capabilities (like core dumps) than its alternatives. Thus,
> it's essential to get Hadoop to compile with Visual Studio on Windows.
> We'll be using Visual Studio 2019.
>
> 2. [HDFS-15843] [libhdfs++] Make write cross platform
> <https://issues.apache.org/jira/browse/HDFS-15843>
> Until recently, Hadoop was being built with C++11. I upgraded the compiler
> version to a level where it supports C++17 so that we have access to
> std::filesystem and a few other modern C++ APIs. However, there are some
> cases where the C++17 APIs don't suffice. Thus, I wrote the XPlatform
> library
> <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform>,
> which is a collection of system call APIs implemented in a cross-platform
> friendly manner. The CMake build system will choose the appropriate
> platform implementation while building so that we can do away with all the
> #ifdefs based on platform in the code. In summary, if you ever come across
> a need to use system calls, please put them into the XPlatform library and
> use its APIs.
>
> 3. [HDFS-16474] Make HDFS tail tool cross platform
> <https://issues.apache.org/jira/browse/HDFS-16474>
> [HDFS-16473] Make HDFS stat tool cross platform
> <https://issues.apache.org/jira/browse/HDFS-16473>
> [HDFS-16472] Make HDFS setrep tool cross platform
> <https://issues.apache.org/jira/browse/HDFS-16472>
> [HDFS-16471] Make HDFS ls tool cross platform
> <https://issues.apache.org/jira/browse/HDFS-16471>
> [HDFS-16470] Make HDFS find tool cross platform
> <https://issues.apache.org/jira/browse/HDFS-16470>
> The HDFS native client tools use the getopt API to parse the command line
> arguments. getopt isn't available on Windows. One can follow this PR to
> make the above tools cross platform compatible: HDFS-16285. Make HDFS
> ownership tools cross platform, Pull Request #3588
> <https://github.com/apache/hadoop/pull/3588>.
>
> 4. [HDFS-16463] Make dirent.h cross platform compatible
> <https://issues.apache.org/jira/browse/HDFS-16463>
> [HDFS-16465] Make usage of strings.h cross platform compatible
> <https://issues.apache.org/jira/browse/HDFS-16465>
> For these JIRAs, the header files aren't there for Windows. Thus, we need
> to inspect the APIs that have been used from these headers and implement
> them.
>
> 5. [HDFS-16464] Create only libhdfspp static libraries for Windows
> <https://issues.apache.org/jira/browse/HDFS-16464>
> There are some issues with producing Hadoop DLLs on Windows. So, let's
> plan to deliver only static libraries in this phase.
>
> 6. [HDFS-16466] Implement Linux permission flags on Windows
> <https://issues.apache.org/jira/browse/HDFS-16466>
> 7. [HDFS-16467] Ensure Protobuf generated headers are included first
> <https://issues.apache.org/jira/browse/HDFS-16467>
> 8. [HDFS-16468] Define ssize_t for Windows
> <https://issues.apache.org/jira/browse/HDFS-16468>
> 9. [HDFS-16469] Locate protoc-gen-hrpc.exe on Windows
> <https://issues.apache.org/jira/browse/HDFS-16469>
> 10. [YARN-11078] Set env vars in a cross platform compatible way
> <https://issues.apache.org/jira/browse/YARN-11078>
>
> *Phase 2 : Set up CI for Hadoop on Windows*
> 1. [HADOOP-18133] Add Dockerfile for Windows 10
> <https://issues.apache.org/jira/browse/HADOOP-18133>
> 2. [HADOOP-18134] Run CI for Windows 10
> <https://issues.apache.org/jira/browse/HADOOP-18134>
> We really must set up the CI for Hadoop on Windows to ensure that this
> never breaks again.
>
> *Phase 3 : Resolving systemic issues*
> 1. [HADOOP-13223] winutils.exe is a bug nexus and should be killed with
> an axe.
> <https://issues.apache.org/jira/browse/HADOOP-13223>
> The Hadoop environment is modeled closer to that of Linux than Windows.
> Thus, we see a lot of functional gaps between running Hadoop on Linux
> versus Windows, which have become the source of bugs when it comes to
> running Hadoop on Windows. One such issue is that of winutils.exe. We can
> aim to address issues like these in this phase. I plan to provide a JNI
> implementation for each platform and unify these under a common file
> system interface, so that we get stack traces for exceptions thrown in
> these layers and, above all, so that we don't have any disparity between
> the platforms.
>
> *Phase 4 : Produce Windows distribution of Hadoop*
> 1. [HADOOP-18135] Produce Windows binaries of Hadoop
> <https://issues.apache.org/jira/browse/HADOOP-18135>
> The public should be able to download and install Hadoop on their Windows
> computers.
>
> Thanks,
> --Gautham
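
For anyone picking up tasks like the ones in items 3 and 4 of Phase 1, here is a rough sketch of what "replace the POSIX-only call with portable C++17" can look like. It is only an illustration under my own assumptions, not the code in the XPlatform library or in PR #3588: the namespace xplatform_sketch and the functions EqualsIgnoreCase and ParseArgs are hypothetical names, the first standing in for a strings.h-style case-insensitive compare and the second for a getopt loop over boolean single-letter flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical namespace for illustration; not the actual XPlatform API.
namespace xplatform_sketch {

// Portable stand-in for a strcasecmp()-style equality check from <strings.h>:
// returns true when the two strings are equal ignoring ASCII case.
inline bool EqualsIgnoreCase(std::string_view a, std::string_view b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Flags that were recognized, plus the remaining positional arguments.
struct ParsedArgs {
  std::string flags;
  std::vector<std::string> positional;
};

// Minimal replacement for a getopt() loop that only handles boolean
// single-letter flags (e.g. "-R" or "-Rh"). Flags not listed in `allowed`
// are silently ignored in this sketch.
inline ParsedArgs ParseArgs(int argc, const char* const argv[],
                            std::string_view allowed) {
  assert(argv != nullptr);
  ParsedArgs out;
  for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
    const std::string_view arg(argv[i]);
    if (arg.size() > 1 && arg[0] == '-') {
      for (char c : arg.substr(1)) {
        if (allowed.find(c) != std::string_view::npos) out.flags.push_back(c);
      }
    } else {
      out.positional.emplace_back(arg);
    }
  }
  return out;
}

}  // namespace xplatform_sketch
```

Because it uses nothing outside the C++17 standard library, the same source should build with Visual C++ and with GCC/Clang without any platform #ifdefs. A real tool would also need options that take values and proper error reporting, which this sketch leaves out.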