Devs,

I would ask that we be as conservative as possible in choosing (or keeping) a
build system.

For developers, packagers and, for some languages, even end-users, the build
system is just another dependency. Even if cmake is not ideal, it has become
quite ubiquitous, which is a huge plus.

Maybe it is possible to come up with a way of expressing the dependency
relations in cmake that makes maintaining them easier. Otherwise, maybe it is
possible to generate them from a (simple) description file?
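
To make the "generate them from a description file" idea concrete, here is a
minimal sketch in Python. It is only an illustration under stated assumptions,
not what the project does: the 'python' entry and the ARROW_* option names come
from Wes's example further down the thread, while the 'hypothetical_bindings'
entry and the function names (transitive_dependencies, option_name,
generate_cmake) are made up for this sketch.

```python
# Sketch: turn a simple dependency description into CMake if() blocks.
# Only the 'python' entry is taken from the thread; 'hypothetical_bindings'
# is an invented component used to show that dependencies are followed
# transitively.
COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
    "hypothetical_bindings": ["python"],
}


def transitive_dependencies(component, graph):
    """Return every component reachable from `component`, sorted by name."""
    seen = set()
    stack = list(graph.get(component, []))
    while stack:
        dep = stack.pop()
        if dep == component:
            # A component that (indirectly) depends on itself is an error.
            raise ValueError(f"dependency cycle through {component!r}")
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return sorted(seen)


def option_name(component):
    """Map 'python' to 'ARROW_PYTHON' (naming assumed from the thread)."""
    return "ARROW_" + component.upper()


def generate_cmake(graph):
    """Render one if() block per component that has dependencies."""
    lines = ["# Generated file, do not edit by hand."]
    for component in sorted(graph):
        deps = transitive_dependencies(component, graph)
        if not deps:
            continue
        lines.append(f"if({option_name(component)})")
        for dep in deps:
            lines.append(f"  set({option_name(dep)} ON)")
        lines.append("endif()")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated module; it could be redirected to a file that is
    # checked into git, as Antoine suggests below.
    print(generate_cmake(COMPONENT_DEPENDENCIES), end="")
```

The generated text is plain CMake, so people building the project would not
need Python unless they change the description itself.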

Cheers,
Maarten.


> On Oct 19, 2019, at 11:22 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> 
>> 
>> Perhaps meson is also worth exploring?
> 
> 
> It could be. If someone else wants to take a look, we can compare what
> things look like in each. Recently, Bazel build rules seem like they would be
> useful for some work projects I've been dealing with, so I plan on focusing
> my exploration there.
> 
> On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:
> 
>> 
>> Perhaps meson is also worth exploring?
>> 
>> 
>> Le 15/10/2019 à 23:06, Micah Kornfield a écrit :
>>> Hi Wes,
>>> I agree on both counts that it won't be done in the short term, and it
>>> makes sense to tackle it incrementally.  Like I said I don't have much
>>> bandwidth at the moment but might be able to re-arrange a few things on
>> my
>>> plate.  I think some people have asked on the mailing list how they might
>>> be able to help, this might be one area that doesn't require a lot of
>>> in-depth knowledge of C++ at least for a proof of concept.  I'll try to
>>> open up some JIRAs soon.
>>> 
>>> Thanks,
>>> Micah
>>> 
>>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com>
>> wrote:
>>> 
>>>> hi Micah,
>>>> 
>>>> Definitely Bazel is worth exploring, but we must be realistic about
>>>> the amount of energy (several hundred hours or more) that's been
>>>> invested in the build system we have now. So a new build system will
>>>> be a large endeavor, but hopefully can make things simpler.
>>>> 
>>>> Aside from the requirements gathering process, if it is felt that
>>>> Bazel is a possible path forward in the future, it may be good to try
>>>> to break up the work into more tractable pieces. For example, a first
>>>> step would be to set up Bazel configurations to build the project's
>>>> thirdparty toolchain. Since we're reliant on ExternalProject in CMake
>>>> to do a lot of heavy lifting there for us, I imagine this (taking care
>>>> of what ThirdpartyToolchain.cmake does now) will take up a lot of the
>>>> energy.
>>>> 
>>>> - Wes
>>>> 
>>>> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> This might be taking the thread on more of a tangent, but maybe we
>>>> should
>>>>> start collecting requirements for the C++ build system in general and
>> see
>>>>> if there might be a better solution that can address some of these
>>>> concerns?
>>>>> In particular, Bazel at least on the surface seems like it might be a
>>>>> better fit for some of the use cases discussed here.  I know this is a
>>>> big
>>>>> project (and I currently don't have much bandwidth for it) but I think
>> if
>>>>> CMake is lacking in these areas it might be worth at least exploring
>>>>> instead of going down the path of building our own meta-build system on
>>>> top
>>>>> of CMake.
>>>>> 
>>>>> Requirements that I think we are targeting:
>>>>> 1.  Be able to provide an out of box build system that requires as
>> close
>>>> to
>>>>> zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD
>> minimal"
>>>>> works on any C++ developer's desktop without additional requirements)
>>>>> 2.  The build system should limit configuration knobs in favor of
>> implied
>>>>> dependencies (e.g. "$BUILD python" automatically builds "compute",
>>>>> "filesystem", "ipc")
>>>>> 3.  The build system should be configurable to use (and have the user
>>>>> specify) one of "System packages", "Conda packages" or source packages
>>>> for
>>>>> providing dependencies (and fallback options between the three).
>>>>> 4.  The build system should be able to treat some dependencies as
>>>> optional
>>>>> (e.g. different compression libraries or allocators).
>>>>> 5.  Easily allow developers to limit building unnecessary code for
>> their
>>>>> particular task at hand.
>>>>> 6.  The build system must work across the following
>> toolchains/platforms:
>>>>>     - Linux:  g++ and clang.  x86 and ARM
>>>>>     - Mac
>>>>>     - Windows (msys2 and MSVC)
>>>>> 
>>>>> Thanks,
>>>>> Micah
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org>
>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> Yes, we could express dependencies in a Python script and have it
>>>>>> generate a CMake module of if/else chains in cmake_modules (which we
>>>>>> would check in git to avoid having people depend on a Python install,
>>>>>> perhaps).
>>>>>> 
>>>>>> Still, that is an additional maintenance burden.
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> Antoine.
>>>>>> 
>>>>>> 
>>>>>> Le 10/10/2019 à 14:50, Wes McKinney a écrit :
>>>>>>> I guess one question we should first discuss is: who is the C++ build
>>>>>>> system for?
>>>>>>> 
>>>>>>> The users who are most sensitive to benchmark-driven decision making
>>>>>>> will generally be consuming the project through pre-built binaries,
>>>>>>> like our Python or R packages. If C++ developers build the project
>>>>>>> from source and don't do a minimal read of the documentation to see
>>>>>>> what a "recommended configuration" looks like, I would say that is
>>>>>>> more their fault than ours. In the case of the ARROW_JEMALLOC option,
>>>>>>> I think it's important for C++ system integrators to be aware of the
>>>>>>> impact of the choice of memory allocator.
>>>>>>> 
>>>>>>> The concern I have with the current "out of the box" experience is
>>>>>>> that people are getting the impression that "I have to build $X, $Y,
>>>>>>> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
>>>>>>> They can, of course, read the documentation and learn that those
>>>>>>> things can be toggled off, but I think the user that reaches for a
>>>>>>> self-built source install is much different in general than someone
>>>>>>> who uses the project through the Linux binary packages, for example.
>>>>>>> 
>>>>>>> On the subject of managing intraproject dependencies and
>>>>>>> relationships, I think we should develop a better way to express
>>>>>>> relationships between components than we have now.
>>>>>>> 
>>>>>>> As an example, building the Python library assumes that various
>>>>>>> components are enabled
>>>>>>> 
>>>>>>> - ARROW_COMPUTE=ON
>>>>>>> - ARROW_FILESYSTEM=ON
>>>>>>> - ARROW_IPC=ON
>>>>>>> 
>>>>>>> Somewhere in the code we might have some code like
>>>>>>> 
>>>>>>> if (ARROW_PYTHON)
>>>>>>>   set(ARROW_COMPUTE ON)
>>>>>>>   ...
>>>>>>> endif()
>>>>>>> 
>>>>>>> This doesn't strike me as that scalable. I would rather see a
>>>>>>> dependency file like
>>>>>>> 
>>>>>>> component_dependencies = {
>>>>>>>     ...
>>>>>>>     'python': ['compute', 'filesystem', 'ipc'],
>>>>>>>     ...
>>>>>>> }
>>>>>>> 
>>>>>>> A helper Python script as part of the build could be used to give
>>>>>>> CMake (because CMake is a bit poor as a programming language) the
>>>> list
>>>>>>> of required components based on what the user has indicated to CMake.
>>>>>>> 
>>>>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
>>>>>>> <fsaintjacq...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> There's always the route of vendoring some library and not exposing
>>>>>>>> external CMake options. This would achieve the goal of
>>>>>>>> compile-out-of-the-box and enable important features in the basic
>>>>>>>> build. We would also simplify dependency requirements (which benefits
>>>>>>>> CI and developers). The downsides are keeping up with security patches
>>>>>>>> and grumpy reactions from package maintainers. I think we should explore this
>>>>>>>> route for dependencies that match the following criteria:
>>>>>>>> 
>>>>>>>> - libarrow*.so doesn't export any of the dependency's symbols, and the
>>>>>>>> dependency is not referenced in any public headers
>>>>>>>> - dependency is lightweight, e.g. excludes boost, openssl, grpc,
>>>> llvm,
>>>>>>>> thrift, protobuf
>>>>>>>> - dependency is not ubiquitous on major platforms and has a stable
>>>>>>>> API, e.g. excludes libz and openssl
>>>>>>>> 
>>>>>>>> A small list of candidates:
>>>>>>>> - RapidJSON (enables JSON)
>>>>>>>> - DoubleConversion (enables CSV)
>>>>>>>> 
>>>>>>>> There's a precedent, arrow already vendors small C++ libraries
>>>>>>>> (datetime, utf8cpp, variant, xxhash).
>>>>>>>> 
>>>>>>>> François
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I'm a bit concerned that we're planning to add many additional
>>>> build
>>>>>>>>> options in the quest to have a core zero-dependency build in C++.
>>>>>>>>> See for example https://issues.apache.org/jira/browse/ARROW-6633
>>>> or
>>>>>>>>> https://issues.apache.org/jira/browse/ARROW-6612.
>>>>>>>>> 
>>>>>>>>> The problem is that this is creating many possible configurations
>>>> and
>>>>>> we
>>>>>>>>> will only be testing a tiny subset of them.  Inevitably, users
>>>> will try
>>>>>>>>> other option combinations and they'll fail building for some random
>>>>>>>>> reason.  It will not be a very good user experience.
>>>>>>>>> 
>>>>>>>>> Another related issue is user perception when doing a default
>>>> build.
>>>>>>>>> For example https://issues.apache.org/jira/browse/ARROW-6638
>>>> proposes
>>>>>> to
>>>>>>>>> build with jemalloc disabled by default.  Inevitably, people will
>>>> be
>>>>>>>>> doing benchmarks with this (publicly or not) and they'll conclude
>>>> Arrow
>>>>>>>>> is not as performant as it claims to be.
>>>>>>>>> 
>>>>>>>>> Perhaps we should look for another approach instead?
>>>>>>>>> 
>>>>>>>>> For example we could have a single ARROW_BARE_CORE (whatever the
>>>> name)
>>>>>>>>> option that when enabled (not by default) builds the tiniest
>>>> minimal
>>>>>>>>> subset of Arrow.  It's more inflexible, but at least it's something
>>>>>> that
>>>>>>>>> we can reasonably test.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> 
>>>>>>>>> Antoine.
>>>>>> 
>>>> 
>>> 
>> 
