Yes, we could express dependencies in a Python script and have it generate a CMake module of if/else chains in cmake_modules (which we would check in git to avoid having people depend on a Python install, perhaps).
Still, that is an additional maintenance burden.

Regards

Antoine.


On 10/10/2019 at 14:50, Wes McKinney wrote:
> I guess one question we should first discuss is: who is the C++ build
> system for?
>
> The users who are most sensitive to benchmark-driven decision making
> will generally be consuming the project through pre-built binaries,
> like our Python or R packages. If C++ developers build the project
> from source and don't do a minimal read of the documentation to see
> what a "recommended configuration" looks like, I would say that is
> more their fault than ours. In the case of the ARROW_JEMALLOC option,
> I think it's important for C++ system integrators to be aware of the
> impact of the choice of memory allocator.
>
> The concern I have with the current "out of the box" experience is
> that people are getting the impression that "I have to build $X, $Y,
> and $Z -- which I don't necessarily need -- to have $CORE_FEATURE_1".
> They can, of course, read the documentation and learn that those
> things can be toggled off, but I think the user that reaches for a
> self-built source install is much different in general than someone
> who uses the project through the Linux binary packages, for example.
>
> On the subject of managing intraproject dependencies and
> relationships, I think we should develop a better way to express
> relationships between components than we have now.
>
> As an example, building the Python library assumes that various
> components are enabled
>
> - ARROW_COMPUTE=ON
> - ARROW_FILESYSTEM=ON
> - ARROW_IPC=ON
>
> Somewhere in the code we might have some code like
>
> if (ARROW_PYTHON)
>   set(ARROW_COMPUTE ON)
>   ...
> endif()
>
> This doesn't strike me as that scalable. I would rather see a
> dependency file like
>
> component_dependencies = {
>   ...
>   'python': ['compute', 'filesystem', 'ipc'],
>   ...
> }
>
> A helper Python script as part of the build could be used to give
> CMake (because CMake is a bit poor as a programming language) the list
> of required components based on what the user has indicated to CMake.
>
> On Thu, Oct 10, 2019 at 7:36 AM Francois Saint-Jacques
> <fsaintjacq...@gmail.com> wrote:
>>
>> There's always the route of vendoring some library and not exposing
>> external CMake options. This would achieve the goal of
>> compile-out-of-the-box and enable important features in the basic
>> build. We also simplify dependency requirements (which benefits CI and
>> developers). The downside is following security patches and grumpy
>> reactions from package maintainers. I think we should explore this
>> route for dependencies that match the following criteria:
>>
>> - libarrow*.so doesn't export any of the dependency's symbols, and the
>>   dependency is not referenced in any public headers
>> - dependency is lightweight, e.g. excludes boost, openssl, grpc, llvm,
>>   thrift, protobuf
>> - dependency is not ubiquitous on major platforms and has a stable
>>   API, e.g. excludes libz and openssl
>>
>> A small list of candidates:
>> - RapidJSON (enables JSON)
>> - DoubleConversion (enables CSV)
>>
>> There's a precedent: Arrow already vendors small C++ libraries
>> (datetime, utf8cpp, variant, xxhash).
>>
>> François
>>
>>
>> On Thu, Oct 10, 2019 at 6:03 AM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> I'm a bit concerned that we're planning to add many additional build
>>> options in the quest to have a core zero-dependency build in C++.
>>> See for example https://issues.apache.org/jira/browse/ARROW-6633 or
>>> https://issues.apache.org/jira/browse/ARROW-6612.
>>>
>>> The problem is that this is creating many possible configurations and we
>>> will only be testing a tiny subset of them. Inevitably, users will try
>>> other option combinations and they'll fail building for some random
>>> reason. It will not be a very good user experience.
>>>
>>> Another related issue is user perception when doing a default build.
>>> For example https://issues.apache.org/jira/browse/ARROW-6638 proposes to
>>> build with jemalloc disabled by default. Inevitably, people will be
>>> doing benchmarks with this (publicly or not) and they'll conclude Arrow
>>> is not as performant as it claims to be.
>>>
>>> Perhaps we should look for another approach instead?
>>>
>>> For example we could have a single ARROW_BARE_CORE (whatever the name)
>>> option that when enabled (not by default) builds the tiniest minimal
>>> subset of Arrow. It's more inflexible, but at least it's something that
>>> we can reasonably test.
>>>
>>> Regards
>>>
>>> Antoine.
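
For illustration only, here is a rough sketch of the dependency-file idea discussed in this thread: a Python table of component relationships, a resolver that expands what the user asked for into the full list of required components, and a generator that writes the equivalent CMake if() blocks. Only the 'python' entry and the ARROW_* option names come from the messages above. The other table entries, the function names, and the layout of the generated module are assumptions made for the example, not something that exists in the Arrow build.

```python
# Hypothetical dependency table. Only the 'python' entry is taken from the
# thread; the empty entries are placeholders added for this example.
COMPONENT_DEPENDENCIES = {
    "python": ["compute", "filesystem", "ipc"],
    "compute": [],
    "filesystem": [],
    "ipc": [],
}


def resolve_components(requested, dependencies=COMPONENT_DEPENDENCIES):
    """Return the sorted transitive closure of the requested components."""
    required = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in required:
            continue  # already visited; also guards against cycles
        required.add(name)
        stack.extend(dependencies.get(name, []))
    return sorted(required)


def generate_cmake_module(dependencies=COMPONENT_DEPENDENCIES):
    """Return CMake text enabling each component's transitive dependencies."""
    lines = ["# Generated file, do not edit by hand."]
    for component in sorted(dependencies):
        implied = [
            dep
            for dep in resolve_components([component], dependencies)
            if dep != component
        ]
        if not implied:
            continue  # nothing to switch on for leaf components
        lines.append(f"if(ARROW_{component.upper()})")
        for dep in implied:
            lines.append(f"  set(ARROW_{dep.upper()} ON)")
        lines.append("endif()")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the module so it can be redirected into a checked-in file.
    print(generate_cmake_module(), end="")
```

Because each generated block already lists the transitive closure for its component, the order of the blocks in the generated file does not matter. Checking the output into git, as suggested at the top of this message, would mean Python is only needed when the table changes.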