Hi everyone, I'd like to start a thread to discuss merging the HDFS-8707 branch, aka libhdfs++, into trunk (as a beta release).
libhdfs++ is an HDFS client written in C++, designed for use in applications written in non-JVM languages. In its current state it supports Kerberos-authenticated reads from HDFS and has been used in production clusters for over a year, so it has had a significant amount of burn-in time. The HDFS-8707 branch has been around for about 2 years now, so I'd like to know people's thoughts on what it would take to merge the current branch and possibly start a new one for handling writes and encrypted reads.

Current features:
- A libhdfs/libhdfs3 compatible C API that allows libhdfs++ to serve as a drop-in replacement for clients that only need read support.
- An asynchronous C++ API with synchronous shims on top if the client application wants to do blocking operations. Internally a single thread (optionally more) uses select/epoll by way of boost::asio to watch thousands of sockets without the overhead of spawning threads to emulate async operation.
- Kerberos/SASL authentication support.
- HA namenode support.
- A set of utility programs that mirror the HDFS CLI utilities, e.g. "./hdfs dfs -chmod". The major benefit of these is that tool startup time is ~3 orders of magnitude faster (<1ms vs hundreds of ms) and the tools occupy a lot less memory since they aren't dealing with the JVM. This makes it possible to do things like write a simple bash script that stats a file, applies some rules to the result, and decides if it should move it, in a way that scales to thousands of files without being penalized with O(N) JVM startups.
- Cancelable reads. This has proven to be very useful in multiuser applications that (pre)fetch large blocks of data but need to remain responsive for interactive users. Rather than waiting for a large and/or slow read to finish, a canceled read returns immediately and the associated resources (buffer, file descriptor) become available for the rest of the application to use.
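
For anyone unfamiliar with the "synchronous shim over an asynchronous API" idea mentioned in the feature list, here is a minimal sketch of the general pattern using only the C++ standard library. It is not the libhdfs++ API: FakeIoService, AsyncRead, SyncRead and ReadCallback are hypothetical names made up for illustration, and a plain worker thread stands in for the boost::asio event loop.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical callback type: reports success/failure plus the bytes read.
using ReadCallback = std::function<void(bool ok, std::string data)>;

// Hypothetical stand-in for an I/O service. Each request completes on a
// worker thread, which invokes the callback. (A real client would use an
// event loop instead of one thread per request; this is only a simulation.)
// Not safe to call AsyncRead concurrently from several threads.
class FakeIoService {
 public:
  ~FakeIoService() {
    for (auto& t : workers_) t.join();  // wait for outstanding callbacks
  }

  // Returns immediately; cb runs later on another thread.
  void AsyncRead(std::size_t length, ReadCallback cb) {
    workers_.emplace_back([length, cb = std::move(cb)]() {
      cb(true, std::string(length, 'x'));  // pretend we read `length` bytes
    });
  }

 private:
  std::vector<std::thread> workers_;
};

// Synchronous shim: start the async read, then block on a future that the
// callback fulfils. Throws std::runtime_error if the read reports failure.
std::string SyncRead(FakeIoService& io, std::size_t length) {
  auto done = std::make_shared<std::promise<std::string>>();
  std::future<std::string> result = done->get_future();
  io.AsyncRead(length, [done](bool ok, std::string data) {
    if (ok) {
      done->set_value(std::move(data));
    } else {
      done->set_exception(
          std::make_exception_ptr(std::runtime_error("read failed")));
    }
  });
  return result.get();  // blocks until the callback has run
}
```

A caller that wants blocking behavior would construct a FakeIoService and call SyncRead(io, n); a caller that wants to stay responsive would call AsyncRead directly and do its work in the callback.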
There are a few known issues that prevent a merge of the branch as-is, notably that it's lagging extremely far behind trunk (HDFS-12110). There's a patch up to get in sync, but that's waiting on CI tests to be unstuck (HDFS-12640), which I haven't been able to figure out (if anyone has tips for investigating this I'd really appreciate it). The other two issues that have been raised are that headers and docs aren't being exported to the correct places when building a distro, which will be straightforward to fix once the rebase is done. Thanks!