Re: [C++] The quest for zero-dependency builds

Antoine Pitrou Thu, 10 Oct 2019 06:09:59 -0700


Yes, we could express dependencies in a Python script and have it
generate a CMake module of if/else chains in cmake_modules (which we
would perhaps check into git, so that people don't need a Python install
to build).


Still, that is an additional maintenance burden.
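
To make the generator idea above concrete, here is a minimal sketch, not
something that exists in the Arrow tree. It assumes a dependency table
shaped like the one Wes quotes below; only the 'python' entry comes from
the thread. The names COMPONENT_DEPENDENCIES, option_name,
transitive_dependencies and generate_cmake_module are hypothetical, and
the mapping from a component name to an ARROW_<NAME> option is an
assumption based on the option names mentioned in this thread.

```python
# Hypothetical dependency table. Only the 'python' entry is taken from
# the thread; the empty entries are placeholders for illustration.
COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
    "compute": [],
    "filesystem": [],
    "ipc": [],
}


def option_name(component):
    """Map a component name to a CMake option, e.g. 'ipc' -> 'ARROW_IPC'.

    This naming rule is an assumption, not a documented convention.
    """
    return "ARROW_" + component.upper()


def transitive_dependencies(component, table):
    """Return all direct and indirect dependencies of a component, sorted."""
    seen = set()
    stack = list(table.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(table.get(dep, []))
    return sorted(seen)


def generate_cmake_module(table):
    """Render one if()/endif() block per component that has dependencies."""
    lines = ["# Generated file, do not edit by hand."]
    for component in sorted(table):
        deps = transitive_dependencies(component, table)
        if not deps:
            continue
        lines.append("if({})".format(option_name(component)))
        for dep in deps:
            lines.append("  set({} ON)".format(option_name(dep)))
        lines.append("endif()")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the module text; a real script would write it to a file
    # under cmake_modules instead.
    print(generate_cmake_module(COMPONENT_DEPENDENCIES), end="")
```

With the table above, this prints a single if(ARROW_PYTHON) block that
sets ARROW_COMPUTE, ARROW_FILESYSTEM and ARROW_IPC to ON. Because the
dependencies are flattened in Python, the generated CMake needs no
ordering logic of its own.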

Regards

Antoine.


On 10/10/2019 at 14:50, Wes McKinney wrote:
> I guess one question we should first discuss is: who is the C++ build
> system for?
> 
> The users who are most sensitive to benchmark-driven decision making
> will generally be consuming the project through pre-built binaries,
> like our Python or R packages. If C++ developers build the project
> from source and don't do a minimal read of the documentation to see
> what a "recommended configuration" looks like, I would say that is
> more their fault than ours. In the case of the ARROW_JEMALLOC option,
> I think it's important for C++ system integrators to be aware of the
> impact of the choice of memory allocator.
> 
> The concern I have with the current "out of the box" experience is
> that people are getting the impression that "I have to build $X, $Y,
> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
> They can, of course, read the documentation and learn that those
> things can be toggled off, but I think the user that reaches for a
> self-built source install is much different in general than someone
> who uses the project through the Linux binary packages, for example.
> 
> On the subject of managing intraproject dependencies and
> relationships, I think we should develop a better way to express
> relationships between components than we have now.
> 
> As an example, building the Python library assumes that various
> components are enabled:
> 
> - ARROW_COMPUTE=ON
> - ARROW_FILESYSTEM=ON
> - ARROW_IPC=ON
> 
> Somewhere in the code we might have something like
> 
> if (ARROW_PYTHON)
>   set(ARROW_COMPUTE ON)
>   ...
> endif()
> 
> This doesn't strike me as that scalable. I would rather see a
> dependency file like
> 
> component_dependencies = {
>     ...
>     'python': ['compute', 'filesystem', 'ipc'],
>     ...
> }
> 
> A helper Python script as part of the build could be used to give
> CMake (because CMake is a bit poor as a programming language) the list
> of required components based on what the user has indicated to CMake.
> 
> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
> <fsaintjacq...@gmail.com> wrote:
>>
>> There's always the route of vendoring some libraries and not exposing
>> external CMake options for them. This would achieve the goal of
>> compiling out of the box and enable important features in the basic
>> build. It would also simplify dependency requirements (which benefits
>> CI and developers). The downsides are having to follow security patches
>> and grumpy reactions from package maintainers. I think we should explore
>> this route for dependencies that match the following criteria:
>>
>> - libarrow*.so doesn't export any of the dependency's symbols, and the
>> dependency is not referenced in any public headers
>> - the dependency is lightweight, which excludes e.g. boost, openssl,
>> grpc, llvm, thrift, protobuf
>> - the dependency is not ubiquitous on major platforms and has a stable
>> API, which excludes e.g. libz and openssl
>>
>> A small list of candidates:
>> - RapidJSON (enables JSON)
>> - DoubleConversion (enables CSV)
>>
>> There's a precedent: Arrow already vendors small C++ libraries
>> (datetime, utf8cpp, variant, xxhash).
>>
>> François
>>
>>
>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> I'm a bit concerned that we're planning to add many additional build
>>> options in the quest to have a core zero-dependency build in C++.
>>> See for example https://issues.apache.org/jira/browse/ARROW-6633 or
>>> https://issues.apache.org/jira/browse/ARROW-6612.
>>>
>>> The problem is that this is creating many possible configurations and we
>>> will only be testing a tiny subset of them.  Inevitably, users will try
>>> other option combinations and they'll fail building for some random
>>> reason.  It will not be a very good user experience.
>>>
>>> Another related issue is user perception when doing a default build.
>>> For example https://issues.apache.org/jira/browse/ARROW-6638 proposes to
>>> build with jemalloc disabled by default.  Inevitably, people will be
>>> doing benchmarks with this (publicly or not) and they'll conclude Arrow
>>> is not as performant as it claims to be.
>>>
>>> Perhaps we should look for another approach instead?
>>>
>>> For example we could have a single ARROW_BARE_CORE (whatever the name)
>>> option that when enabled (not by default) builds the tiniest minimal
>>> subset of Arrow.  It's more inflexible, but at least it's something that
>>> we can reasonably test.
>>>
>>> Regards
>>>
>>> Antoine.
