I just opened https://issues.apache.org/jira/browse/ARROW-7089 about increasing transparency around which options cause third-party dependencies to be required.

On Thu, Nov 7, 2019 at 10:05 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Richard,
>
> On Thu, Nov 7, 2019 at 9:59 AM Richard Bachmann
> <richard.bachm...@cern.ch> wrote:
> >
> > Hello,
> > I'm contacting you on behalf of the LCG Releases team at CERN. We
> > provide a common software stack for LHCb, ATLAS and others to be used at
> > CERN and the worldwide computing grid.
> >
> > Right now we're looking into optimizing the way we're building Apache
> > Arrow (C++ & Python) and its dependencies. Ideally we'd like to build
> > Arrow using only the minimum of necessary dependencies to run it, and to
> > use packages already installed in the stack to fulfill these
> > dependencies. The former would be nice to keep the stack clean, the
> > latter would help us avoid duplication and failing builds due to mirrors
> > going offline.
> >
> > Our builds currently run with the ARROW_DEPENDENCY_SOURCE=AUTO
> > <https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst>
> > setting, which results in duplicate and non-essential packages being
> > downloaded by Arrow, as well as dependency on external mirrors. Setting
> > it to SYSTEM allows us to avoid the downloads, but then the build
> > process fails due to missing unused dependencies.
>
> I'm surprised to hear this based on what I know about the build system
> and from extensive local development.
>
> Can you show the exact CMake invocation you are using and indicate
> which unused dependencies are being downloaded?
>
> In this Docker minimal build (unless something has been recently
> broken) that the project can be built with only a small number of
> third party dependencies:
>
> https://github.com/apache/arrow/tree/master/cpp/examples/minimal_build
>
> Note that we support a fully "offline" build to allow thirdparty
> dependencies to be built in an air-gapped environment
>
> https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds
>
> > Do you know if there is a recommended way to achieve this? The problem
> > seems to stem from the fact that all listed dependencies are downloaded,
> > whether they are needed or not. We have considered patching out the
> > non-essential dependencies ('double-conversion', 'GTEST', etc.) from the
> > dependency list, as well as formally adding the unneeded dependencies to
> > the stack in order to run with the SYSTEM setting. However, if there is
> > a proper way to do it we would of course prefer to follow that course of
> > action.
>
> We'll be able to know more based on how you're calling CMake and with
> what options, but the build system should not be downloading any
> dependencies that are not needed.
>
> >
> > Any help would be very appreciated.
> > Kind regards:
> >
> > - Richard Bachmann
> >