On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn <maart...@xs4all.nl> wrote:
>
> Devs,
>
> I would ask that we be as conservative as possible in choosing (keeping) a build system.
>
> For developers, packagers, and, for some languages, even end users, the build system is just another dependency. Even if CMake is not ideal, it has become quite ubiquitous, which is a huge plus.
>
> Maybe it is possible to come up with a way of expressing the dependency relations in CMake that makes maintaining them easier. Otherwise, maybe it is possible to generate them from a (simple) description file?
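Here is a minimal sketch of Maarten's description-file idea. It assumes a hypothetical plain-text format with one "component: dependency dependency ..." line per component. The file format and the parse_description/to_cmake helpers are illustrative inventions, not part of Arrow. The output only imitates the if()/set() pattern already used in the CMake files, and only direct dependencies are handled.

```python
from __future__ import annotations

# Hypothetical description-file format (not an existing Arrow file):
#   <component>: <required component> <required component> ...
# Blank lines and lines starting with '#' are ignored.
DESCRIPTION = """\
# component: components it requires
python: compute filesystem ipc
"""


def parse_description(text: str) -> dict[str, list[str]]:
    """Parse 'component: dep dep ...' lines into {component: [deps]}."""
    deps: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(":")
        deps[name.strip()] = rest.split()
    return deps


def to_cmake(deps: dict[str, list[str]]) -> str:
    """Emit one if() block per component that switches its direct
    dependencies ON (transitive chains are not expanded here)."""
    out: list[str] = []
    for name, required in deps.items():
        if not required:
            continue  # nothing to force on
        out.append(f"if(ARROW_{name.upper()})")
        for dep in required:
            out.append(f"  set(ARROW_{dep.upper()} ON)")
        out.append("endif()")
    return "\n".join(out)


if __name__ == "__main__":
    # The generated text could be written to a .cmake file and checked in.
    print(to_cmake(parse_description(DESCRIPTION)))
```

The generated module is plain CMake, so it could be checked into git. Building would then not require a Python install; only regenerating the module would.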
There do seem to be parts of our CMake build system that contain boilerplate (particularly some of the platform-specific export defines) that might be better auto-generated in some way, so this would be worth looking into further. FWIW, some Google projects I have seen offer CMake as a build option, but their CMake files are mostly auto-generated from another build configuration.

>
> Cheers,
> Maarten.
>
> > On Oct 19, 2019, at 11:22 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >
> > It could be. If someone else wants to take a look, we can compare what things look like in each. Recently, Bazel build rules have seemed like they would be useful for some work projects I've been dealing with, so I plan on focusing my exploration there.
> >
> > On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> Perhaps meson is also worth exploring?
> >>
> >>
> >> On 15/10/2019 at 23:06, Micah Kornfield wrote:
> >>> Hi Wes,
> >>> I agree on both counts: it won't be done in the short term, and it makes sense to tackle it incrementally. Like I said, I don't have much bandwidth at the moment, but I might be able to rearrange a few things on my plate. I think some people have asked on the mailing list how they might be able to help; this might be one area that doesn't require a lot of in-depth knowledge of C++, at least for a proof of concept. I'll try to open some JIRAs soon.
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >>>
> >>>> hi Micah,
> >>>>
> >>>> Bazel is definitely worth exploring, but we must be realistic about the amount of energy (several hundred hours or more) that has been invested in the build system we have now. So a new build system will be a large endeavor, but hopefully it can make things simpler.
> >>>>
> >>>> Aside from the requirements-gathering process, if it is felt that Bazel is a possible path forward, it may be good to try to break up the work into more tractable pieces. For example, a first step would be to set up Bazel configurations to build the project's third-party toolchain. Since we rely on ExternalProject in CMake to do a lot of the heavy lifting there, I imagine this (taking care of what ThirdpartyToolchain.cmake does now) will take up a lot of the energy.
> >>>>
> >>>> - Wes
> >>>>
> >>>> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>>>
> >>>>> This might be taking the thread on more of a tangent, but maybe we should start collecting requirements for the C++ build system in general and see if there might be a better solution that can address some of these concerns? In particular, Bazel, at least on the surface, seems like it might be a better fit for some of the use cases discussed here. I know this is a big project (and I currently don't have much bandwidth for it), but if CMake is lacking in these areas, I think it might be worth at least exploring instead of going down the path of building our own meta-build system on top of CMake.
> >>>>>
> >>>>> Requirements that I think we are targeting:
> >>>>> 1. Provide an out-of-the-box build system that requires as few dependencies as possible (ideally none) beyond a standard C++ toolchain (e.g. "$BUILD minimal" works on any C++ developer's desktop without additional requirements).
> >>>>> 2. The build system should limit configuration knobs in favor of implied dependencies (e.g. "$BUILD python" automatically builds "compute", "filesystem", "ipc").
> >>>>> 3. The build system should be configurable to use (with the user specifying) one of "System packages", "Conda packages", or source packages for providing dependencies (with fallback options among the three).
> >>>>> 4. The build system should be able to treat some dependencies as optional (e.g. different compression libraries or allocators).
> >>>>> 5. Easily allow developers to avoid building code that is unnecessary for their particular task at hand.
> >>>>> 6. The build system must work across the following toolchains/platforms:
> >>>>>    - Linux: g++ and clang; x86 and ARM
> >>>>>    - Mac
> >>>>>    - Windows (msys2 and MSVC)
> >>>>>
> >>>>> Thanks,
> >>>>> Micah
> >>>>>
> >>>>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>
> >>>>>>
> >>>>>> Yes, we could express dependencies in a Python script and have it generate a CMake module of if/else chains in cmake_modules (which we would perhaps check into git to avoid having people depend on a Python install).
> >>>>>>
> >>>>>> Still, that is an additional maintenance burden.
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Antoine.
> >>>>>>
> >>>>>>
> >>>>>> On 10/10/2019 at 14:50, Wes McKinney wrote:
> >>>>>>> I guess one question we should discuss first is: who is the C++ build system for?
> >>>>>>>
> >>>>>>> The users who are most sensitive to benchmark-driven decision making will generally be consuming the project through pre-built binaries, like our Python or R packages. If C++ developers build the project from source and don't do even a minimal read of the documentation to see what a "recommended configuration" looks like, I would say that is more their fault than ours.
> >>>>>>> In the case of the ARROW_JEMALLOC option, I think it's important for C++ system integrators to be aware of the impact of the choice of memory allocator.
> >>>>>>>
> >>>>>>> The concern I have with the current "out of the box" experience is that people are getting the impression that "I have to build $X, $Y, and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1". They can, of course, read the documentation and learn that those things can be toggled off, but I think the user who reaches for a self-built source install is in general very different from someone who uses the project through the Linux binary packages, for example.
> >>>>>>>
> >>>>>>> On the subject of managing intraproject dependencies and relationships, I think we should develop a better way to express relationships between components than we have now.
> >>>>>>>
> >>>>>>> As an example, building the Python library assumes that various components are enabled:
> >>>>>>>
> >>>>>>> - ARROW_COMPUTE=ON
> >>>>>>> - ARROW_FILESYSTEM=ON
> >>>>>>> - ARROW_IPC=ON
> >>>>>>>
> >>>>>>> Somewhere in the code we might have something like
> >>>>>>>
> >>>>>>>   if (ARROW_PYTHON)
> >>>>>>>     set(ARROW_COMPUTE ON)
> >>>>>>>     ...
> >>>>>>>   endif()
> >>>>>>>
> >>>>>>> This doesn't strike me as very scalable. I would rather see a dependency file like
> >>>>>>>
> >>>>>>>   component_dependencies = {
> >>>>>>>       ...
> >>>>>>>       'python': ['compute', 'filesystem', 'ipc'],
> >>>>>>>       ...
> >>>>>>>   }
> >>>>>>>
> >>>>>>> A helper Python script run as part of the build could be used to give CMake the list of required components based on what the user has indicated (because CMake is a bit poor as a programming language).
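The core job of the helper script Wes describes is to expand the components a user asked for into everything they need, following dependencies transitively. Below is a minimal sketch that assumes the dictionary format above. resolve() is a hypothetical helper that does a depth-first topological sort. 'example_ext' is a made-up component that exists only to show a two-level chain.

```python
from __future__ import annotations

# Format proposed in the thread. 'python' is from the thread; 'example_ext'
# is a hypothetical component that exists only to show a two-level chain.
component_dependencies: dict[str, list[str]] = {
    "python": ["compute", "filesystem", "ipc"],
    "example_ext": ["python"],
}


def resolve(requested: list[str], graph: dict[str, list[str]]) -> list[str]:
    """Return the requested components plus everything they need,
    transitively, with each dependency listed before its users
    (a depth-first topological sort)."""
    order: list[str] = []
    state: dict[str, str] = {}  # component -> "visiting" | "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle involving {name!r}")
        state[name] = "visiting"
        for dep in graph.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    for name in requested:
        visit(name)
    return order


if __name__ == "__main__":
    enabled = resolve(["example_ext"], component_dependencies)
    # CMake represents lists as semicolon-separated strings.
    print(";".join(enabled))
```

Listing each dependency before its users gives a natural configuration order. The cycle check turns an accidental circular dependency into an immediate error instead of a confusing build failure. Printing a semicolon-separated string is just one possible way to hand the result back to CMake.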
> >>>>>>>
> >>>>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> There's always the route of vendoring some libraries and not exposing external CMake options. This would achieve the goal of compiling out of the box and would enable important features in the basic build. It would also simplify dependency requirements (benefiting CI and developers). The downsides are having to follow security patches and grumpy reactions from package maintainers. I think we should explore this route for dependencies that match the following criteria:
> >>>>>>>>
> >>>>>>>> - libarrow*.so doesn't export any of the dependency's symbols, and the dependency isn't referenced in any public header
> >>>>>>>> - the dependency is lightweight, e.g. this excludes boost, openssl, grpc, llvm, thrift, protobuf
> >>>>>>>> - the dependency is not ubiquitous on major platforms and has a stable API, e.g. this excludes libz and openssl
> >>>>>>>>
> >>>>>>>> A small list of candidates:
> >>>>>>>> - RapidJSON (enables JSON)
> >>>>>>>> - DoubleConversion (enables CSV)
> >>>>>>>>
> >>>>>>>> There's a precedent: Arrow already vendors small C++ libraries (datetime, utf8cpp, variant, xxhash).
> >>>>>>>>
> >>>>>>>> François
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I'm a bit concerned that we're planning to add many additional build options in the quest to have a core zero-dependency build in C++. See for example https://issues.apache.org/jira/browse/ARROW-6633 or https://issues.apache.org/jira/browse/ARROW-6612.
> >>>>>>>>>
> >>>>>>>>> The problem is that this creates many possible configurations, and we will only be testing a tiny subset of them. Inevitably, users will try other option combinations, and they'll fail to build for some random reason. It will not be a very good user experience.
> >>>>>>>>>
> >>>>>>>>> Another related issue is user perception when doing a default build. For example, https://issues.apache.org/jira/browse/ARROW-6638 proposes building with jemalloc disabled by default. Inevitably, people will run benchmarks with this (publicly or not) and conclude that Arrow is not as performant as it claims to be.
> >>>>>>>>>
> >>>>>>>>> Perhaps we should look for another approach instead?
> >>>>>>>>>
> >>>>>>>>> For example, we could have a single ARROW_BARE_CORE (whatever the name) option that, when enabled (not by default), builds the smallest possible subset of Arrow. It's less flexible, but at least it's something that we can reasonably test.
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>>
> >>>>>>>>> Antoine.
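Antoine's testing concern comes down to simple arithmetic. n independent ON/OFF options allow 2^n configurations, so four options already give 16 builds. A single preset switch like the proposed ARROW_BARE_CORE adds just one more configuration next to the default. The sketch below illustrates this under stated assumptions: the option list and the all-ON defaults are hypothetical, and preset() is an illustrative stand-in, not an existing Arrow mechanism.

```python
from __future__ import annotations

from itertools import product

# Hypothetical: option names taken from the thread, treated as independent
# ON/OFF switches, with made-up all-ON defaults.
OPTIONS = ["ARROW_JEMALLOC", "ARROW_COMPUTE", "ARROW_FILESYSTEM", "ARROW_IPC"]
DEFAULTS = {name: True for name in OPTIONS}


def all_configurations(options: list[str]) -> list[dict[str, bool]]:
    """Every ON/OFF combination a user could choose: 2 ** len(options)."""
    return [
        dict(zip(options, values))
        for values in product([False, True], repeat=len(options))
    ]


def preset(bare_core: bool) -> dict[str, bool]:
    """A single switch in the spirit of ARROW_BARE_CORE: True turns every
    listed option off, False keeps the defaults, so only two
    configurations are reachable."""
    if bare_core:
        return {name: False for name in OPTIONS}
    return dict(DEFAULTS)


if __name__ == "__main__":
    print(len(all_configurations(OPTIONS)))  # 16 for four options
    reachable = {tuple(sorted(preset(b).items())) for b in (False, True)}
    print(len(reachable))  # 2
```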