On Fri, Jun 3, 2016 at 10:16 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> Hi Wes,
>
> At what level do you imagine, the "opt-in" happening. Right now it
> seems like it would be fairly straightforward at build time. However,
> when we start packaging pyarrow for distribution how do you imagine it
> will work? (If [1] already answers this, please let me know, I've been
> meaning to take a look at it).
>

Where packaging and distribution are concerned, it'd be easiest to
provide non-picky users with a kitchen sink build, but otherwise
developers could create precisely the build they want with CMake
flags, I guess. For example, if certain libraries aren't found, we
wouldn't fail the build by default.

> I need to grok the python code base a little bit more to understand
> the implications of the scope creep and the pain around taking a more
> fine-grained component approach. But in general my experience has
> been that packaging things together while maintaining clear internal
> code boundaries for later separation is a good pragmatic approach.
>

I'd propose creating an `arrow_io` leaf shared library where we can
create a small IO subsystem for reuse amongst different data
connectors. We can leave things fairly coarse-grained for the time
being and break things up later if it becomes onerous for other Arrow
developer-users.

> As a side note, hopefully, we'll be able to re-use some existing
> projects to do the heavy lifting for blob store integration. SFrame
> is one option [2] and [3] might be worth investigating as well (both
> appear to be Apache 2.0 licensed).

While requiring Java + $HADOOP_HOME for HDFS connectivity (a wrapper
around libhdfs) doesn't excite me that much, the prospect of bugs (or
secure cluster issues) cropping up in a 3rd-party HDFS client, without
the ability to escalate problems to the Apache Hadoop team, worries me
even more. There is a new official C++ HDFS client in the works after
the libhdfs3 patch was not accepted
(https://issues.apache.org/jira/browse/HDFS-8707), so this may be
worth pursuing once it matures.

Thoughts on this welcome.

- Wes

>
> Thanks,
> -Micah
>
> [1] https://github.com/apache/arrow/pull/79/files
> [2] https://github.com/apache/incubator-hawq/tree/master/depends/libhdfs3
> [3] https://github.com/aws/aws-sdk-cpp
>
>