I'll add that I don't think we will actually be switching anytime soon. Bazel does have some advantages, at least over our current CMake system, in terms of developer productivity (users can target smaller components with unit tests, which avoids relinking). I've started on a prototype and hope to have something to share in the next few days, so we can evaluate whether it is reasonable to have the two live side by side in the short term.
On Wed, Oct 23, 2019 at 4:11 PM Wes McKinney <wesmck...@gmail.com> wrote:
> On Sun, Oct 20, 2019 at 12:22 PM Maarten Ballintijn <maart...@xs4all.nl> wrote:
>>
>> Devs,
>>
>> I would ask that we be as conservative as possible in choosing (keeping) a build system.
>>
>> For developers, packagers and, for some languages, even end users, the build system is just another dependency. Even if CMake is not ideal, it has become quite ubiquitous, which is a huge plus.
>>
>> Maybe it is possible to come up with a way of expressing the dependency relations in CMake that makes maintaining them easier. Otherwise, maybe it is possible to generate them from a (simple) description file?
>
> There do seem to be parts of our CMake build system that contain boilerplate (particularly some of the platform-specific export defines) that might be better auto-generated in some way, so this is something worth looking into further.
>
> FWIW, some Google projects I have seen offer CMake as a build option, but the CMake files are mostly auto-generated from another build configuration.
>
>> Cheers,
>> Maarten.
>>
>> On Oct 19, 2019, at 11:22 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>
>>>> Perhaps meson is also worth exploring?
>>>
>>> It could be. If someone else wants to take a look, we can compare what things look like in each. Recently, it has seemed that Bazel build rules would be useful for some work projects I've been dealing with, so I plan on focusing my exploration there.
>>>
>>> On Wed, Oct 16, 2019 at 6:27 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>
>>>> Perhaps meson is also worth exploring?
>>>>
>>>> On 15/10/2019 at 23:06, Micah Kornfield wrote:
>>>>> Hi Wes,
>>>>> I agree on both counts: it won't be done in the short term, and it makes sense to tackle it incrementally.
>>>>> Like I said, I don't have much bandwidth at the moment, but I might be able to rearrange a few things on my plate. I think some people have asked on the mailing list how they might be able to help; this might be one area that doesn't require a lot of in-depth knowledge of C++, at least for a proof of concept. I'll try to open some JIRAs soon.
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Tue, Oct 15, 2019 at 10:33 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>
>>>>>> hi Micah,
>>>>>>
>>>>>> Bazel is definitely worth exploring, but we must be realistic about the amount of energy (several hundred hours or more) that has been invested in the build system we have now. So a new build system will be a large endeavor, but hopefully it can make things simpler.
>>>>>>
>>>>>> Aside from the requirements-gathering process, if it is felt that Bazel is a possible path forward in the future, it may be good to try to break up the work into more tractable pieces. For example, a first step would be to set up Bazel configurations to build the project's third-party toolchain. Since we rely on ExternalProject in CMake to do a lot of the heavy lifting there for us, I imagine this (taking care of what ThirdpartyToolchain.cmake does now) will take up a lot of the energy.
>>>>>>
>>>>>> - Wes
>>>>>>
>>>>>> On Sun, Oct 13, 2019 at 1:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>
>>>>>>> This might be taking the thread on more of a tangent, but maybe we should start collecting requirements for the C++ build system in general and see if there might be a better solution that can address some of these concerns?
>>>>>>> In particular, Bazel, at least on the surface, seems like it might be a better fit for some of the use cases discussed here. I know this is a big project (and I currently don't have much bandwidth for it), but I think if CMake is lacking in these areas it might be worth at least exploring, instead of going down the path of building our own meta-build system on top of CMake.
>>>>>>>
>>>>>>> Requirements that I think we are targeting:
>>>>>>> 1. Be able to provide an out-of-the-box build system that requires as close to zero dependencies as possible beyond a standard C++ toolchain (e.g. "$BUILD minimal" works on any C++ developer's desktop without additional requirements).
>>>>>>> 2. The build system should limit configuration knobs in favor of implied dependencies (e.g. "$BUILD python" automatically builds "compute", "filesystem", "ipc").
>>>>>>> 3. The build system should be configurable to use (and have the user specify) one of "System packages", "Conda packages" or source packages for providing dependencies (with fallback options between the three).
>>>>>>> 4. The build system should be able to treat some dependencies as optional (e.g. different compression libraries or allocators).
>>>>>>> 5. Easily allow developers to avoid building code that is unnecessary for their particular task at hand.
>>>>>>> 6. The build system must work across the following toolchains/platforms:
>>>>>>>    - Linux: g++ and clang, x86 and ARM
>>>>>>>    - Mac
>>>>>>>    - Windows (msys2 and MSVC)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> On Thu, Oct 10, 2019 at 6:09 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>
>>>>>>>> Yes, we could express dependencies in a Python script and have it generate a CMake module of if/else chains in cmake_modules (which we would perhaps check into git, to avoid having people depend on a Python install).
>>>>>>>>
>>>>>>>> Still, that is an additional maintenance burden.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Antoine.
>>>>>>>>
>>>>>>>> On 10/10/2019 at 14:50, Wes McKinney wrote:
>>>>>>>>> I guess one question we should first discuss is: who is the C++ build system for?
>>>>>>>>>
>>>>>>>>> The users who are most sensitive to benchmark-driven decision making will generally be consuming the project through pre-built binaries, like our Python or R packages. If C++ developers build the project from source and don't do a minimal read of the documentation to see what a "recommended configuration" looks like, I would say that is more their fault than ours. In the case of the ARROW_JEMALLOC option, I think it's important for C++ system integrators to be aware of the impact of the choice of memory allocator.
>>>>>>>>>
>>>>>>>>> The concern I have with the current "out of the box" experience is that people are getting the impression that "I have to build $X, $Y, and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
>>>>>>>>> They can, of course, read the documentation and learn that those things can be toggled off, but I think the user who reaches for a self-built source install is, in general, much different from someone who uses the project through the Linux binary packages, for example.
>>>>>>>>>
>>>>>>>>> On the subject of managing intra-project dependencies and relationships, I think we should develop a better way to express relationships between components than we have now.
>>>>>>>>>
>>>>>>>>> As an example, building the Python library assumes that various components are enabled:
>>>>>>>>>
>>>>>>>>> - ARROW_COMPUTE=ON
>>>>>>>>> - ARROW_FILESYSTEM=ON
>>>>>>>>> - ARROW_IPC=ON
>>>>>>>>>
>>>>>>>>> Somewhere in the code we might have something like
>>>>>>>>>
>>>>>>>>> if (ARROW_PYTHON)
>>>>>>>>>   set(ARROW_COMPUTE ON)
>>>>>>>>>   ...
>>>>>>>>> endif()
>>>>>>>>>
>>>>>>>>> This doesn't strike me as very scalable. I would rather see a dependency file like
>>>>>>>>>
>>>>>>>>> component_dependencies = {
>>>>>>>>>     ...
>>>>>>>>>     'python': ['compute', 'filesystem', 'ipc'],
>>>>>>>>>     ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> A helper Python script could be used as part of the build to give CMake (because CMake is a bit poor as a programming language) the list of required components, based on what the user has indicated to CMake.
>>>>>>>>>
>>>>>>>>> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> There's always the route of vendoring some libraries and not exposing external CMake options. This would achieve the goal of compiling out of the box and enable important features in the basic build. It would also simplify dependency requirements (benefiting CI and developers).
>>>>>>>>>> The downsides are following security patches and grumpy reactions from package maintainers. I think we should explore this route for dependencies that match the following criteria:
>>>>>>>>>>
>>>>>>>>>> - libarrow*.so doesn't export any of the dependency's symbols, and the dependency isn't referenced in any public headers
>>>>>>>>>> - the dependency is lightweight, e.g. excludes boost, openssl, grpc, llvm, thrift, protobuf
>>>>>>>>>> - the dependency is not ubiquitous on major platforms and has a stable API, e.g. excludes libz and openssl
>>>>>>>>>>
>>>>>>>>>> A small list of candidates:
>>>>>>>>>> - RapidJSON (enables JSON)
>>>>>>>>>> - DoubleConversion (enables CSV)
>>>>>>>>>>
>>>>>>>>>> There's a precedent: Arrow already vendors small C++ libraries (datetime, utf8cpp, variant, xxhash).
>>>>>>>>>>
>>>>>>>>>> François
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I'm a bit concerned that we're planning to add many additional build options in the quest to have a core zero-dependency build in C++. See for example https://issues.apache.org/jira/browse/ARROW-6633 or https://issues.apache.org/jira/browse/ARROW-6612.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that this is creating many possible configurations, and we will only be testing a tiny subset of them. Inevitably, users will try other option combinations, and they'll fail to build for some random reason. It will not be a very good user experience.
>>>>>>>>>>>
>>>>>>>>>>> Another related issue is user perception when doing a default build.
>>>>>>>>>>> For example, https://issues.apache.org/jira/browse/ARROW-6638 proposes to build with jemalloc disabled by default. Inevitably, people will be doing benchmarks with this (publicly or not), and they'll conclude Arrow is not as performant as it claims to be.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps we should look for another approach instead?
>>>>>>>>>>>
>>>>>>>>>>> For example, we could have a single ARROW_BARE_CORE (whatever the name) option that, when enabled (not by default), builds the tiniest minimal subset of Arrow. It's more inflexible, but at least it's something that we can reasonably test.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Antoine.
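For reference, here is a minimal sketch of how Wes's `component_dependencies` table and Antoine's suggestion to generate a checked-in CMake module from a Python script could fit together. It assumes the table format quoted above. Only the `python` entry comes from the thread. The function names (`closure`, `cmake_module`) and the layout of the generated file are hypothetical and do not reflect any existing Arrow build code.

```python
"""Hypothetical sketch: turn a component dependency table into a CMake module.

Assumption: only the "python" entry comes from the mailing-list thread; the
other components are listed with no dependencies to keep the example short.
"""

COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
    "compute": [],
    "filesystem": [],
    "ipc": [],
}


def closure(component, deps=COMPONENT_DEPENDENCIES):
    """Return, sorted, every component that `component` transitively requires."""
    required = set()
    stack = list(deps[component])
    while stack:
        dep = stack.pop()
        if dep not in required:
            required.add(dep)
            stack.extend(deps[dep])
    required.discard(component)  # drop the component itself if a cycle leads back to it
    return sorted(required)


def cmake_module(deps=COMPONENT_DEPENDENCIES):
    """Emit one if() block per component, setting its whole transitive closure.

    Listing the full closure in each block means the blocks can appear in any
    order, so the generated file needs no topological sort.
    """
    lines = ["# Generated file (hypothetical): edit the Python table instead."]
    for component in sorted(deps):
        required = closure(component, deps)
        if not required:
            continue  # nothing to switch on for leaf components
        lines.append(f"if(ARROW_{component.upper()})")
        lines.extend(f"  set(ARROW_{dep.upper()} ON)" for dep in required)
        lines.append("endif()")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(cmake_module(), end="")
```

With the table above, the script emits a single `if(ARROW_PYTHON)` block that sets `ARROW_COMPUTE`, `ARROW_FILESYSTEM` and `ARROW_IPC`. This mirrors the hand-written snippet in Wes's message. Checking the generated file into git avoids a Python dependency at build time, at the cost of the extra maintenance step Antoine mentions.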