Re: [C++] The quest for zero-dependency builds

Micah Kornfield Sat, 19 Oct 2019 20:23:31 -0700

>
> Perhaps meson is also worth exploring?


It could be. If someone else wants to take a look, we can compare how
things look in each. Recently, Bazel build rules seem like they would be
useful for some work projects I've been dealing with, so I plan on focusing
my exploration there.

On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Perhaps meson is also worth exploring?
>
>
> On 15/10/2019 at 23:06, Micah Kornfield wrote:
> > Hi Wes,
> > I agree on both counts: it won't be done in the short term, and it
> > makes sense to tackle it incrementally.  Like I said, I don't have much
> > bandwidth at the moment but might be able to rearrange a few things on
> > my plate.  I think some people have asked on the mailing list how they
> > might be able to help; this might be one area that doesn't require a lot
> > of in-depth knowledge of C++, at least for a proof of concept.  I'll try
> > to open up some JIRAs soon.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com>
> wrote:
> >
> >> hi Micah,
> >>
> >> Definitely Bazel is worth exploring, but we must be realistic about
> >> the amount of energy (several hundred hours or more) that's been
> >> invested in the build system we have now. So a new build system will
> >> be a large endeavor, but hopefully can make things simpler.
> >>
> >> Aside from the requirements gathering process, if it is felt that
> >> Bazel is a possible path forward in the future, it may be good to try
> >> to break up the work into more tractable pieces. For example, a first
> >> step would be to set up Bazel configurations to build the project's
> >> thirdparty toolchain. Since we're reliant on ExternalProject in CMake
> >> to do a lot of heavy lifting there for us, I imagine this (taking care
> >> of what ThirdpartyToolchain.cmake does now) will take up a lot of the
> >> energy.
> >>
> >> - Wes
> >>
> >> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com>
> >> wrote:
> >>>
> >>>>
> >>>>
> >>> This might be taking the thread on more of a tangent, but maybe we
> >>> should start collecting requirements for the C++ build system in
> >>> general and see if there might be a better solution that can address
> >>> some of these concerns? In particular, Bazel, at least on the surface,
> >>> seems like it might be a better fit for some of the use cases discussed
> >>> here.  I know this is a big project (and I currently don't have much
> >>> bandwidth for it), but I think if CMake is lacking in these areas it
> >>> might be worth at least exploring instead of going down the path of
> >>> building our own meta-build system on top of CMake.
> >>>
> >>> Requirements that I think we are targeting:
> >>> 1.  Be able to provide an out-of-the-box build system that requires as
> >>> close to zero dependencies as possible beyond a standard C++ toolchain
> >>> (e.g. "$BUILD minimal" works on any C++ developer's desktop without
> >>> additional requirements).
> >>> 2.  The build system should limit configuration knobs in favor of
> >>> implied dependencies (e.g. "$BUILD python" automatically builds
> >>> "compute", "filesystem", "ipc").
> >>> 3.  The build system should be configurable to use (and have the user
> >>> specify) one of "System packages", "Conda packages" or source packages
> >>> for providing dependencies (with fallback options between the three).
> >>> 4.  The build system should be able to treat some dependencies as
> >>> optional (e.g. different compression libraries or allocators).
> >>> 5.  Easily allow developers to avoid building code that is unnecessary
> >>> for their particular task at hand.
> >>> 6.  The build system must work across the following
> >>> toolchains/platforms:
> >>>      - Linux:  g++ and clang.  x86 and ARM
> >>>      - Mac
> >>>      - Windows (msys2 and MSVC)
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>>
> >>>
> >>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org>
> >> wrote:
> >>>
> >>>>
> >>>> Yes, we could express dependencies in a Python script and have it
> >>>> generate a CMake module of if/else chains in cmake_modules (which we
> >>>> would check in git to avoid having people depend on a Python install,
> >>>> perhaps).
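
A minimal sketch of what such a generator could look like, assuming a
hypothetical mapping from component names to ARROW_<NAME> options (only
ARROW_PYTHON, ARROW_COMPUTE, ARROW_FILESYSTEM and ARROW_IPC appear in this
thread; the function name is made up for illustration). It emits one if()
block per component with direct dependencies, so a real generator would also
need to handle transitive dependencies or emit the blocks in dependency order:

```python
def render_cmake_module(dependencies):
    """Render a CMake snippet that switches on each component's direct dependencies.

    `dependencies` maps a component name to the list of components it needs.
    The ARROW_<NAME> naming scheme is an assumption made for this sketch.
    """
    lines = ["# Generated file; do not edit by hand."]
    for component, needs in dependencies.items():
        if not needs:
            # Components without dependencies need no if() block.
            continue
        lines.append(f"if(ARROW_{component.upper()})")
        for dep in needs:
            lines.append(f"  set(ARROW_{dep.upper()} ON)")
        lines.append("endif()")
    return "\n".join(lines) + "\n"


# Example table; only the 'python' entry is taken from the discussion.
example = {"python": ["compute", "filesystem", "ipc"], "ipc": []}
generated = render_cmake_module(example)
```

The resulting text could be written to a file under cmake_modules and checked
in, so that a regular build would not need a Python install.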
> >>>>
> >>>> Still, that is an additional maintenance burden.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> On 10/10/2019 at 14:50, Wes McKinney wrote:
> >>>>> I guess one question we should first discuss is: who is the C++ build
> >>>>> system for?
> >>>>>
> >>>>> The users who are most sensitive to benchmark-driven decision making
> >>>>> will generally be consuming the project through pre-built binaries,
> >>>>> like our Python or R packages. If C++ developers build the project
> >>>>> from source and don't do a minimal read of the documentation to see
> >>>>> what a "recommended configuration" looks like, I would say that is
> >>>>> more their fault than ours. In the case of the ARROW_JEMALLOC option,
> >>>>> I think it's important for C++ system integrators to be aware of the
> >>>>> impact of the choice of memory allocator.
> >>>>>
> >>>>> The concern I have with the current "out of the box" experience is
> >>>>> that people are getting the impression that "I have to build $X, $Y,
> >>>>> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
> >>>>> They can, of course, read the documentation and learn that those
> >>>>> things can be toggled off, but I think the user that reaches for a
> >>>>> self-built source install is much different in general than someone
> >>>>> who uses the project through the Linux binary packages, for example.
> >>>>>
> >>>>> On the subject of managing intraproject dependencies and
> >>>>> relationships, I think we should develop a better way to express
> >>>>> relationships between components than we have now.
> >>>>>
> >>>>> As an example, building the Python library assumes that various
> >>>>> components are enabled
> >>>>>
> >>>>> - ARROW_COMPUTE=ON
> >>>>> - ARROW_FILESYSTEM=ON
> >>>>> - ARROW_IPC=ON
> >>>>>
> >>>>> Somewhere in the code we might have some code like
> >>>>>
> >>>>> if (ARROW_PYTHON)
> >>>>>    set(ARROW_COMPUTE ON)
> >>>>>    ...
> >>>>> endif()
> >>>>>
> >>>>> This doesn't strike me as that scalable. I would rather see a
> >>>>> dependency file like
> >>>>>
> >>>>> component_dependencies = {
> >>>>>      ...
> >>>>>      'python': ['compute', 'filesystem', 'ipc'],
> >>>>>      ...
> >>>>> }
> >>>>>
> >>>>> A helper Python script as part of the build could be used to give
> >>>>> CMake (because CMake is a bit poor as a programming language) the
> >>>>> list of required components based on what the user has indicated to
> >>>>> CMake.
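
As a rough sketch of such a helper, assuming a dependency table shaped like
the one quoted above (the constant and function names are hypothetical, and
every entry except 'python' is a placeholder), the required components can be
computed as a transitive closure over the table:

```python
# Hypothetical dependency table; only the 'python' entry comes from the thread.
COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
    "compute": [],
    "filesystem": [],
    "ipc": [],
}


def required_components(requested, dependencies=COMPONENT_DEPENDENCIES):
    """Return the requested components plus everything they transitively need."""
    required = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in required:
            # Already handled; this also keeps cycles from looping forever.
            continue
        if name not in dependencies:
            raise KeyError(f"unknown component: {name}")
        required.add(name)
        stack.extend(dependencies[name])
    return sorted(required)
```

CMake treats a semicolon-separated string as a list, so a script could print
";".join(required_components(["python"])) and the build could read that
output back, for instance through execute_process.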
> >>>>>
> >>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
> >>>>> <fsaintjacq...@gmail.com> wrote:
> >>>>>>
> >>>>>> There's always the route of vendoring some libraries and not exposing
> >>>>>> external CMake options. This would achieve the goal of
> >>>>>> compiling out of the box and enable important features in the basic
> >>>>>> build. It would also simplify dependency requirements (which benefits
> >>>>>> CI and developers). The downsides are keeping up with security patches
> >>>>>> and grumpy reactions from package maintainers. I think we should
> >>>>>> explore this route for dependencies that match the following criteria:
> >>>>>>
> >>>>>> - libarrow*.so doesn't export any of the dependency's symbols, and the
> >>>>>> dependency is not referenced in any public headers
> >>>>>> - the dependency is lightweight, e.g. excludes boost, openssl, grpc,
> >>>>>> llvm, thrift, protobuf
> >>>>>> - the dependency is not ubiquitous on major platforms and has a stable
> >>>>>> API, e.g. excludes libz and openssl
> >>>>>>
> >>>>>> A small list of candidates:
> >>>>>> - RapidJSON (enables JSON)
> >>>>>> - DoubleConversion (enables CSV)
> >>>>>>
> >>>>>> There's a precedent, arrow already vendors small C++ libraries
> >>>>>> (datetime, utf8cpp, variant, xxhash).
> >>>>>>
> >>>>>> François
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'm a bit concerned that we're planning to add many additional
> >>>>>>> build options in the quest to have a core zero-dependency build in
> >>>>>>> C++.  See for example
> >>>>>>> https://issues.apache.org/jira/browse/ARROW-6633 or
> >>>>>>> https://issues.apache.org/jira/browse/ARROW-6612.
> >>>>>>>
> >>>>>>> The problem is that this is creating many possible configurations
> >>>>>>> and we will only be testing a tiny subset of them.  Inevitably,
> >>>>>>> users will try other option combinations and they'll fail building
> >>>>>>> for some random reason.  It will not be a very good user experience.
> >>>>>>>
> >>>>>>> Another related issue is user perception when doing a default
> >>>>>>> build.  For example
> >>>>>>> https://issues.apache.org/jira/browse/ARROW-6638 proposes to build
> >>>>>>> with jemalloc disabled by default.  Inevitably, people will be
> >>>>>>> doing benchmarks with this (publicly or not) and they'll conclude
> >>>>>>> Arrow is not as performant as it claims to be.
> >>>>>>>
> >>>>>>> Perhaps we should look for another approach instead?
> >>>>>>>
> >>>>>>> For example we could have a single ARROW_BARE_CORE (whatever the
> >>>>>>> name) option that when enabled (not by default) builds the tiniest
> >>>>>>> minimal subset of Arrow.  It's more inflexible, but at least it's
> >>>>>>> something that we can reasonably test.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.
> >>>>
> >>
> >
>
