On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn <maart...@xs4all.nl> wrote:
>
> Devs,
>
> I would ask that we be as conservative as possible in choosing (keeping) a
> build system.
>
> For developers, packagers and even end-users for some languages, the build
> system is just another dependency. Even if cmake is not ideal, it has become
> quite ubiquitous, which is a huge plus.
>
> Maybe it is possible to come up with a way of expressing the dependency
> relations in cmake that makes maintaining them easier. Otherwise, maybe it
> is possible to generate them from a (simple) description file?

There do seem to be parts of our CMake build system that contain
boilerplate (particularly some of the platform-specific export
defines) that might be better auto-generated in some way, so this is
worth looking at more closely.

FWIW, some Google projects I have seen offer CMake as a build option
but the CMake files are mostly auto-generated from another build
configuration.
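
To make the "generate them from a (simple) description file" idea concrete,
here is a minimal sketch in Python. It is only an illustration, not a
proposal for the actual build: the COMPONENT_DEPENDENCIES table, the
option_name() and generate_cmake() helpers, and the "ARROW_" + upper-case
naming rule are all assumptions. Only the python -> compute/filesystem/ipc
relation and the ARROW_* option names come from this thread.

```python
from typing import Dict, List

# Hypothetical description of component relations. Only the 'python' entry
# is taken from the thread; a real table would list every component.
COMPONENT_DEPENDENCIES: Dict[str, List[str]] = {
    "python": ["compute", "filesystem", "ipc"],
}


def option_name(component: str) -> str:
    """Map a component name such as 'python' to an option such as ARROW_PYTHON.

    The naming rule is an assumption based on the options quoted in the thread.
    """
    return "ARROW_" + component.upper()


def generate_cmake(dependencies: Dict[str, List[str]]) -> str:
    """Render one if()/endif() block per component that has dependencies."""
    lines = ["# Generated file, do not edit by hand."]
    for component in sorted(dependencies):
        lines.append(f"if({option_name(component)})")
        for dep in dependencies[component]:
            lines.append(f"  set({option_name(dep)} ON)")
        lines.append("endif()")
    return "\n".join(lines) + "\n"


# Example: print the generated module text.
print(generate_cmake(COMPONENT_DEPENDENCIES))
```

The generated text could be checked in so that a normal build does not need
Python. Note that a single pass of if() blocks like this only propagates
direct dependencies; chains of dependencies would need the blocks emitted in
dependency order, or the relations expanded before generation.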

>
> Cheers,
> Maarten.
>
>
> > On Oct 19, 2019, at 11:22 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >
> >
> > It could be. If someone else wants to take a look, we can compare what
> > things look like in each. Recently, Bazel build rules seem like they would
> > be useful for some work projects I've been dealing with, so I plan on
> > focusing my exploration there.
> >
> > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >>
> >>
> >> Le 15/10/2019 à 23:06, Micah Kornfield a écrit :
> >>> Hi Wes,
> >>> I agree on both counts that it won't be done in the short term, and it
> >>> makes sense to tackle it incrementally.  Like I said, I don't have much
> >>> bandwidth at the moment but might be able to re-arrange a few things on
> >>> my plate.  I think some people have asked on the mailing list how they
> >>> might be able to help; this might be one area that doesn't require a lot
> >>> of in-depth knowledge of C++, at least for a proof of concept.  I'll try
> >>> to open up some JIRAs soon.
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >>>
> >>>> hi Micah,
> >>>>
> >>>> Definitely Bazel is worth exploring, but we must be realistic about
> >>>> the amount of energy (several hundred hours or more) that's been
> >>>> invested in the build system we have now. So a new build system will
> >>>> be a large endeavor, but hopefully can make things simpler.
> >>>>
> >>>> Aside from the requirements gathering process, if it is felt that
> >>>> Bazel is a possible path forward in the future, it may be good to try
> >>>> to break up the work into more tractable pieces. For example, a first
> >>>> step would be to set up Bazel configurations to build the project's
> >>>> thirdparty toolchain. Since we're reliant on ExternalProject in CMake
> >>>> to do a lot of heavy lifting there for us, I imagine this (taking care
> >>>> of what ThirdpartyToolchain.cmake does now) will take up a lot of the
> >>>> energy.
> >>>>
> >>>> - Wes
> >>>>
> >>>> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>> This might be taking the thread on more of a tangent, but maybe we
> >>>>> should start collecting requirements for the C++ build system in general
> >>>>> and see if there might be a better solution that can address some of
> >>>>> these concerns? In particular, Bazel at least on the surface seems like
> >>>>> it might be a better fit for some of the use cases discussed here.  I
> >>>>> know this is a big project (and I currently don't have much bandwidth
> >>>>> for it) but I think if CMake is lacking in these areas it might be worth
> >>>>> at least exploring instead of going down the path of building our own
> >>>>> meta-build system on top of CMake.
> >>>>>
> >>>>> Requirements that I think we are targeting:
> >>>>> 1.  Be able to provide an out of box build system that requires as
> >>>>> close to zero dependencies as possible beyond a standard C++ toolchain
> >>>>> (e.g. "$BUILD minimal" works on any C++ developer's desktop without
> >>>>> additional requirements)
> >>>>> 2.  The build system should limit configuration knobs in favor of
> >>>>> implied dependencies (e.g. "$BUILD python" automatically builds
> >>>>> "compute", "filesystem", "ipc")
> >>>>> 3.  The build system should be configurable to use (and have the user
> >>>>> specify) one of "System packages", "Conda packages" or source packages
> >>>>> for providing dependencies (and fallback options between the three).
> >>>>> 4.  The build system should be able to treat some dependencies as
> >>>>> optional (e.g. different compression libraries or allocators).
> >>>>> 5.  Easily allow developers to avoid building code that is unnecessary
> >>>>> for their particular task at hand.
> >>>>> 6.  The build system must work across the following
> >>>>> toolchains/platforms:
> >>>>>     - Linux:  g++ and clang.  x86 and ARM
> >>>>>     - Mac
> >>>>>     - Windows (msys2 and MSVC)
> >>>>>
> >>>>> Thanks,
> >>>>> Micah
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>
> >>>>>>
> >>>>>> Yes, we could express dependencies in a Python script and have it
> >>>>>> generate a CMake module of if/else chains in cmake_modules (which we
> >>>>>> would check in git to avoid having people depend on a Python install,
> >>>>>> perhaps).
> >>>>>>
> >>>>>> Still, that is an additional maintenance burden.
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Antoine.
> >>>>>>
> >>>>>>
> >>>>>> Le 10/10/2019 à 14:50, Wes McKinney a écrit :
> >>>>>>> I guess one question we should first discuss is: who is the C++ build
> >>>>>>> system for?
> >>>>>>>
> >>>>>>> The users who are most sensitive to benchmark-driven decision making
> >>>>>>> will generally be consuming the project through pre-built binaries,
> >>>>>>> like our Python or R packages. If C++ developers build the project
> >>>>>>> from source and don't do a minimal read of the documentation to see
> >>>>>>> what a "recommended configuration" looks like, I would say that is
> >>>>>>> more their fault than ours. In the case of the ARROW_JEMALLOC option,
> >>>>>>> I think it's important for C++ system integrators to be aware of the
> >>>>>>> impact of the choice of memory allocator.
> >>>>>>>
> >>>>>>> The concern I have with the current "out of the box" experience is
> >>>>>>> that people are getting the impression that "I have to build $X, $Y,
> >>>>>>> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
> >>>>>>> They can, of course, read the documentation and learn that those
> >>>>>>> things can be toggled off, but I think the user that reaches for a
> >>>>>>> self-built source install is much different in general than someone
> >>>>>>> who uses the project through the Linux binary packages, for example.
> >>>>>>>
> >>>>>>> On the subject of managing intraproject dependencies and
> >>>>>>> relationships, I think we should develop a better way to express
> >>>>>>> relationships between components than we have now.
> >>>>>>>
> >>>>>>> As an example, building the Python library assumes that various
> >>>>>>> components are enabled
> >>>>>>>
> >>>>>>> - ARROW_COMPUTE=ON
> >>>>>>> - ARROW_FILESYSTEM=ON
> >>>>>>> - ARROW_IPC=ON
> >>>>>>>
> >>>>>>> Somewhere in the code we might have some code like
> >>>>>>>
> >>>>>>> if (ARROW_PYTHON)
> >>>>>>>   set(ARROW_COMPUTE ON)
> >>>>>>>   ...
> >>>>>>> endif()
> >>>>>>>
> >>>>>>> This doesn't strike me as that scalable. I would rather see a
> >>>>>>> dependency file like
> >>>>>>>
> >>>>>>> component_dependencies = {
> >>>>>>>     ...
> >>>>>>>     'python': ['compute', 'filesystem', 'ipc'],
> >>>>>>>     ...
> >>>>>>> }
> >>>>>>>
> >>>>>>> A helper Python script as part of the build could be used to give
> >>>>>>> CMake (because CMake is a bit poor as a programming language) the list
> >>>>>>> of required components based on what the user has indicated to CMake.
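
As a rough illustration of what such a helper script might do, the sketch
below expands the components a user asked for into the full set they imply.
This is an assumption about one possible shape, not existing Arrow code:
resolve_components() and to_cmake_list() are made-up names, and the
'example_addon' entry is hypothetical, added only to show a chain of
dependencies. Only the 'python' entry comes from the message above.

```python
from typing import Dict, Iterable, List, Set

COMPONENT_DEPENDENCIES: Dict[str, List[str]] = {
    # Taken from the example in the thread.
    "python": ["compute", "filesystem", "ipc"],
    # Hypothetical entry, present only to demonstrate a transitive chain.
    "example_addon": ["python"],
}


def resolve_components(
    requested: Iterable[str],
    dependencies: Dict[str, List[str]] = COMPONENT_DEPENDENCIES,
) -> List[str]:
    """Return the requested components plus everything they transitively need.

    Components without an entry in the table are treated as having no
    dependencies. The result is sorted so the output is deterministic.
    """
    resolved: Set[str] = set()
    stack = list(requested)
    while stack:
        component = stack.pop()
        if component in resolved:
            continue  # already handled; this also guards against cycles
        resolved.add(component)
        stack.extend(dependencies.get(component, []))
    return sorted(resolved)


def to_cmake_list(components: Iterable[str]) -> str:
    """Join names with ';', the separator CMake uses for list values."""
    return ";".join(components)


# Example: what does asking for 'python' imply?
print(to_cmake_list(resolve_components(["python"])))
```

CMake could run a script like this through execute_process() and read the
printed list back into a variable, then switch on the matching options.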
> >>>>>>>
> >>>>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
> >>>>>>> <fsaintjacq...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> There's always the route of vendoring some libraries and not exposing
> >>>>>>>> external CMake options. This would achieve the goal of
> >>>>>>>> compile-out-of-the-box and enable important features in the basic
> >>>>>>>> build. It would also simplify dependency requirements (which benefits
> >>>>>>>> CI and developers). The downsides are having to follow security
> >>>>>>>> patches and grumpy reactions from package maintainers. I think we
> >>>>>>>> should explore this route for dependencies that match the following
> >>>>>>>> criteria:
> >>>>>>>>
> >>>>>>>> - libarrow*.so doesn't export any of the dependency's symbols, and the
> >>>>>>>> dependency is not referenced in any public headers
> >>>>>>>> - the dependency is lightweight, e.g. this excludes boost, openssl,
> >>>>>>>> grpc, llvm, thrift, protobuf
> >>>>>>>> - the dependency is not ubiquitous on major platforms and has a stable
> >>>>>>>> API, e.g. this excludes libz and openssl
> >>>>>>>>
> >>>>>>>> A small list of candidates:
> >>>>>>>> - RapidJSON (enables JSON)
> >>>>>>>> - DoubleConversion (enables CSV)
> >>>>>>>>
> >>>>>>>> There's a precedent: Arrow already vendors small C++ libraries
> >>>>>>>> (datetime, utf8cpp, variant, xxhash).
> >>>>>>>>
> >>>>>>>> François
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I'm a bit concerned that we're planning to add many additional build
> >>>>>>>>> options in the quest to have a core zero-dependency build in C++.
> >>>>>>>>> See for example https://issues.apache.org/jira/browse/ARROW-6633 or
> >>>>>>>>> https://issues.apache.org/jira/browse/ARROW-6612.
> >>>>>>>>>
> >>>>>>>>> The problem is that this is creating many possible configurations and
> >>>>>>>>> we will only be testing a tiny subset of them.  Inevitably, users
> >>>>>>>>> will try other option combinations and they'll fail building for some
> >>>>>>>>> random reason.  It will not be a very good user experience.
> >>>>>>>>>
> >>>>>>>>> Another related issue is user perception when doing a default build.
> >>>>>>>>> For example https://issues.apache.org/jira/browse/ARROW-6638 proposes
> >>>>>>>>> to build with jemalloc disabled by default.  Inevitably, people will
> >>>>>>>>> be doing benchmarks with this (publicly or not) and they'll conclude
> >>>>>>>>> Arrow is not as performant as it claims to be.
> >>>>>>>>>
> >>>>>>>>> Perhaps we should look for another approach instead?
> >>>>>>>>>
> >>>>>>>>> For example we could have a single ARROW_BARE_CORE (whatever the
> >>>>>>>>> name) option that when enabled (not by default) builds the tiniest
> >>>>>>>>> minimal subset of Arrow.  It's more inflexible, but at least it's
> >>>>>>>>> something that we can reasonably test.
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>>
> >>>>>>>>> Antoine.
> >>>>>>
> >>>>
> >>>
> >>
>
