> > Perhaps meson is also worth exploring?
It could be. If someone else wants to take a look, we can compare what
things look like in each. Recently, Bazel build rules seem like they would
be useful for some work projects I've been dealing with, so I plan on
focusing my exploration there.

On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Perhaps meson is also worth exploring?
>
>
> On 15/10/2019 at 23:06, Micah Kornfield wrote:
> > Hi Wes,
> > I agree on both accounts that it won't be done in the short term, and it
> > makes sense to tackle it incrementally. Like I said I don't have much
> > bandwidth at the moment but might be able to re-arrange a few things on my
> > plate. I think some people have asked on the mailing list how they might
> > be able to help, this might be one area that doesn't require a lot of
> > in-depth knowledge of C++ at least for a proof of concept. I'll try to
> > open up some JIRAs soon.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> hi Micah,
> >>
> >> Definitely Bazel is worth exploring, but we must be realistic about
> >> the amount of energy (several hundred hours or more) that's been
> >> invested in the build system we have now. So a new build system will
> >> be a large endeavor, but hopefully can make things simpler.
> >>
> >> Aside from the requirements gathering process, if it is felt that
> >> Bazel is a possible path forward in the future, it may be good to try
> >> to break up the work into more tractable pieces. For example, a first
> >> step would be to set up Bazel configurations to build the project's
> >> thirdparty toolchain.
> >> Since we're reliant on ExternalProject in CMake
> >> to do a lot of heavy lifting there for us, I imagine this (taking care
> >> of what ThirdpartyToolchain.cmake does not) will take up a lot of the
> >> energy
> >>
> >> - Wes
> >>
> >> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>
> >>> This might be taking the thread on more of a tangent, but maybe we should
> >>> start collecting requirements for the C++ build system in general and see
> >>> if there might be a better solution that can address some of these concerns?
> >>> In particular, Bazel at least on the surface seems like it might be a
> >>> better fit for some of the use cases discussed here. I know this is a big
> >>> project (and I currently don't have much bandwidth for it) but I think if
> >>> CMake is lacking in these areas it might be worth at least exploring
> >>> instead of going down the path of building our own meta-build system on top
> >>> of CMake.
> >>>
> >>> Requirements that I think we are targeting:
> >>> 1. Be able to provide an out of box build system that requires as close to
> >>>    zero dependencies beyond a standard C++ toolchain (e.g. "$BUILD minimal"
> >>>    works on any C++ developer's desktop without additional requirements)
> >>> 2. The build system should limit configuration knobs in favor of implied
> >>>    dependencies (e.g. "$BUILD python" automatically builds "compute",
> >>>    "filesystem", "ipc")
> >>> 3. The build system should be configurable to use (and have the user
> >>>    specify) one of "System packages", "Conda packages" or source packages for
> >>>    providing dependencies (and fallback options between the three).
> >>> 4. The build system should be able to treat some dependencies as optional
> >>>    (e.g. different compression libraries or allocators).
> >>> 5. Easily allow developers to limit building unnecessary code for their
> >>>    particular task at hand.
> >>> 6. The build system must work across the following toolchains/platforms:
> >>>    - Linux: g++ and clang. x86 and ARM
> >>>    - Mac
> >>>    - Windows (msys2 and MSVC)
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>>>
> >>>> Yes, we could express dependencies in a Python script and have it
> >>>> generate a CMake module of if/else chains in cmake_modules (which we
> >>>> would check in git to avoid having people depend on a Python install,
> >>>> perhaps).
> >>>>
> >>>> Still, that is an additional maintenance burden.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> On 10/10/2019 at 14:50, Wes McKinney wrote:
> >>>>> I guess one question we should first discuss is: who is the C++ build
> >>>>> system for?
> >>>>>
> >>>>> The users who are most sensitive to benchmark-driven decision making
> >>>>> will generally be consuming the project through pre-built binaries,
> >>>>> like our Python or R packages. If C++ developers build the project
> >>>>> from source and don't do a minimal read of the documentation to see
> >>>>> what a "recommended configuration" looks like, I would say that is
> >>>>> more their fault than ours. In the case of the ARROW_JEMALLOC option,
> >>>>> I think it's important for C++ system integrators to be aware of the
> >>>>> impact of the choice of memory allocator.
> >>>>>
> >>>>> The concern I have with the current "out of the box" experience is
> >>>>> that people are getting the impression that "I have to build $X, $Y,
> >>>>> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
> >>>>> They can, of course, read the documentation and learn that those
> >>>>> things can be toggled off, but I think the user that reaches for a
> >>>>> self-built source install is much different in general than someone
> >>>>> who uses the project through the Linux binary packages, for example.
> >>>>>
> >>>>> On the subject of managing intraproject dependencies and
> >>>>> relationships, I think we should develop a better way to express
> >>>>> relationships between components than we have now.
> >>>>>
> >>>>> As an example, building the Python library assumes that various
> >>>>> components are enabled
> >>>>>
> >>>>> - ARROW_COMPUTE=ON
> >>>>> - ARROW_FILESYSTEM=ON
> >>>>> - ARROW_IPC=ON
> >>>>>
> >>>>> Somewhere in the code we might have some code like
> >>>>>
> >>>>> if (ARROW_PYTHON)
> >>>>>   set(ARROW_COMPUTE ON)
> >>>>>   ...
> >>>>> endif()
> >>>>>
> >>>>> This doesn't strike me as that scalable. I would rather see a
> >>>>> dependency file like
> >>>>>
> >>>>> component_dependencies = {
> >>>>>   ...
> >>>>>   'python': ['compute', 'filesystem', 'ipc'],
> >>>>>   ...
> >>>>> }
> >>>>>
> >>>>> A helper Python script as part of the build could be used to give
> >>>>> CMake (because CMake is a bit poor as a programming language) the list
> >>>>> of required components based on what the user has indicated to CMake.
> >>>>>
> >>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
> >>>>> <fsaintjacq...@gmail.com> wrote:
> >>>>>>
> >>>>>> There's always the route of vendoring some library and not exposing
> >>>>>> external CMake options. This would achieve the goal of
> >>>>>> compile-out-of-the-box and enable important features in the basic
> >>>>>> build. We also simplify dependency requirements (benefits CI or
> >>>>>> developer). The downside is following security patches and grumpy
> >>>>>> reactions from package maintainers. I think we should explore this
> >>>>>> route for dependencies that match the following criteria:
> >>>>>>
> >>>>>> - libarrow*.so doesn't export any of the symbols of the dependency and
> >>>>>>   the dependency is not referenced in any public headers
> >>>>>> - dependency is lightweight, e.g. excludes boost, openssl, grpc, llvm,
> >>>>>>   thrift, protobuf
> >>>>>> - dependency is not ubiquitous on major platforms and has a stable
> >>>>>>   API, e.g. excludes libz and openssl
> >>>>>>
> >>>>>> A small list of candidates:
> >>>>>> - RapidJSON (enables JSON)
> >>>>>> - DoubleConversion (enables CSV)
> >>>>>>
> >>>>>> There's a precedent, arrow already vendors small C++ libraries
> >>>>>> (datetime, utf8cpp, variant, xxhash).
> >>>>>>
> >>>>>> François
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'm a bit concerned that we're planning to add many additional build
> >>>>>>> options in the quest to have a core zero-dependency build in C++.
> >>>>>>> See for example https://issues.apache.org/jira/browse/ARROW-6633 or
> >>>>>>> https://issues.apache.org/jira/browse/ARROW-6612.
> >>>>>>>
> >>>>>>> The problem is that this is creating many possible configurations and we
> >>>>>>> will only be testing a tiny subset of them. Inevitably, users will try
> >>>>>>> other option combinations and they'll fail building for some random
> >>>>>>> reason. It will not be a very good user experience.
> >>>>>>>
> >>>>>>> Another related issue is user perception when doing a default build.
> >>>>>>> For example https://issues.apache.org/jira/browse/ARROW-6638 proposes to
> >>>>>>> build with jemalloc disabled by default. Inevitably, people will be
> >>>>>>> doing benchmarks with this (publicly or not) and they'll conclude Arrow
> >>>>>>> is not as performant as it claims to be.
> >>>>>>>
> >>>>>>> Perhaps we should look for another approach instead?
> >>>>>>>
> >>>>>>> For example we could have a single ARROW_BARE_CORE (whatever the name)
> >>>>>>> option that when enabled (not by default) builds the tiniest minimal
> >>>>>>> subset of Arrow.
> >>>>>>> It's more inflexible, but at least it's something that
> >>>>>>> we can reasonably test.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.
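
As a rough illustration of the dependency-file idea quoted above (a
component_dependencies mapping plus a helper script that hands CMake the
implied components), a minimal sketch might look like the following. Only
the 'python' entry and the ARROW_* option names come from the thread; the
function names resolve_components and to_cmake are hypothetical, and this
is not the project's actual build code.

```python
# Hypothetical sketch: only the 'python' entry is taken from the thread.
COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
}


def resolve_components(requested, dependencies=COMPONENT_DEPENDENCIES):
    """Return the requested components plus everything they imply, sorted."""
    resolved = set()
    stack = list(requested)
    while stack:
        component = stack.pop()
        if component in resolved:
            continue  # already handled; also guards against cycles
        resolved.add(component)
        # Components without an entry imply nothing further.
        stack.extend(dependencies.get(component, []))
    return sorted(resolved)


def to_cmake(components):
    """Render one CMake set() line per component, e.g. set(ARROW_IPC ON)."""
    return "\n".join(f"set(ARROW_{name.upper()} ON)" for name in components)


if __name__ == "__main__":
    # Hypothetical usage: the user asked only for the Python component.
    print(to_cmake(resolve_components(["python"])))
```

Run as a script, this prints set() lines for ARROW_COMPUTE,
ARROW_FILESYSTEM, ARROW_IPC and ARROW_PYTHON. The output could be checked
into git as a generated CMake module, along the lines Antoine suggests, so
that a normal build would not depend on a Python install.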