For the record, the concrete issue which sparked this discussion received an elegant fix from Benjamin: https://github.com/apache/arrow/pull/5391
Regards

Antoine.


On 17/09/2019 at 04:34, Sutou Kouhei wrote:
> Hi,
>
> If this is circular, it's a problem. But this isn't circular
> for now.
>
> I think that we can use libarrow as the fundamental shared
> library to provide common implementation like [1] if we need
> to provide common implementation for template. (I think that
> we don't provide common implementation for template.)
>
> [1]
> https://github.com/apache/arrow/pull/5221/commits/e88b2579f04451d741eeddcb6697914bcc1019a6
>
> Anyway, I'm not strongly opposed to this idea. If we choose
> one shared library approach, Linux packages, GLib bindings
> and Ruby bindings can follow the change.
>
>
> Thanks,
> --
> kou
>
> In <cajpuwmdwencjpbw+hrswaojfez7e_yci-fg2d3lwgvncf45...@mail.gmail.com>
> "Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll)
> approach" on Thu, 12 Sep 2019 13:23:01 -0500,
> Wes McKinney <wesmck...@gmail.com> wrote:
>
>> One thing I forgot to mention:
>>
>> One of the things driving the creation of new shared libraries is
>> interdependencies. For example:
>>
>> libarrow -> libparquet
>> libarrow -> libarrow_dataset
>> libparquet -> libarrow_dataset
>>
>> With the modular LLVM-like approach this issue goes away.
>>
>> On Thu, Sep 12, 2019 at 1:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> I forgot to add the link to the LLVM library listing
>>>
>>> https://gist.github.com/wesm/d13c2844db0c19477e8ee5c95e36a0dc
>>>
>>> On Thu, Sep 12, 2019 at 1:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>
>>>> hi folks,
>>>>
>>>> I wanted to share some concerns that I have about our current
>>>> trajectory with regards to producing shared libraries from the Arrow
>>>> build system.
>>>>
>>>> Currently, a comprehensive build produces many shared libraries:
>>>>
>>>> * libarrow
>>>> * libarrow_dataset
>>>> * libarrow_flight
>>>> * libarrow_python
>>>> * libgandiva
>>>> * libparquet
>>>> * libplasma
>>>>
>>>> There are some others.
>>>> There are a number of problems with the current
>>>> approach:
>>>>
>>>> * Each DLL needs its own set of "visibility" macros to control the use
>>>> of __declspec(dllimport/dllexport) on Windows, which is necessary to
>>>> instruct the import or export of symbols between DLLs on Windows. See
>>>> e.g.
>>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
>>>>
>>>> * Templates instantiated in one DLL may cause a violation of the One
>>>> Definition Rule during linking (we lost at least a day of work time
>>>> collectively to issues around this in ARROW-6244). It is good to be
>>>> able to share common template interfaces in general.
>>>>
>>>> * Statically-linked dependencies in one shared lib may need to be
>>>> statically linked into another library. For example, libgandiva
>>>> statically links parts of LLVM, but we will likely have some other
>>>> code that makes use of LLVM for other purposes (it has been discussed
>>>> in the context of Avro parsing)
>>>>
>>>> Overall, my preferred solution to these issues is to move to a similar
>>>> approach to what the LLVM project does. To help understand, let me
>>>> have you first look at the libraries that come from the llvm-7-dev
>>>> package on Ubuntu.
>>>>
>>>> Here we have a collection of static "module" libraries that implement
>>>> different parts of the LLVM platform. Finally, a _single_ shared
>>>> library libLLVM-7.so is produced.
>>>>
>>>> I think we should do the same thing in Apache Arrow. So we only ever
>>>> will produce a single shared library from the build. We can
>>>> additionally make the "name" of this shared library configurable to
>>>> suit different needs. For example, the default name could be simply
>>>> "libarrow.so" or something. But if someone wants to produce a
>>>> barebones Parquet shared library they can override the name to create
>>>> a "libparquet.so" that contains only the "libarrow_core.a" and
>>>> "libarrow_io.a" symbols needed for reading Parquet files.
>>>>
>>>> This would have additional benefits:
>>>>
>>>> * Use the same visibility macros for all exported C++ symbols, rather
>>>> than having to define DLL-specific visibility
>>>>
>>>> * Improved modularization of builds and linking for third party users,
>>>> similar to the way that LLVM's modular linking works, see the way that
>>>> Gandiva requests specific components from LLVM to use for static
>>>> linking
>>>> https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
>>>>
>>>> * Net simpler linking and deployment. Only one shared library to deal with
>>>>
>>>> There are some drawbacks, however:
>>>>
>>>> * Our C++ Linux packaging approach would need to be changed to be more
>>>> LLVM-like (a single .deb/.yum package containing the C++ platform
>>>> rather than many packages as now)
>>>>
>>>> Interested to hear from other C++ developers.
>>>>
>>>> Thanks
>>>> Wes
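
For readers unfamiliar with the "visibility" macros mentioned in the thread, here is a rough sketch of what a single macro shared by every module of one shared library could look like. The names MYLIB_EXPORT, MYLIB_EXPORTING and MYLIB_STATIC, the mylib namespaces and both functions are hypothetical and chosen only for illustration; this is not Arrow's actual header. The MYLIB_EXPORTING define is hard-coded here so the snippet compiles as a single file, whereas a real build system would normally set it only while compiling the library itself.

```cpp
// Sketch only: one visibility macro used by every module that ends up in a
// single shared library. All MYLIB_* names are hypothetical.
#include <cstdint>

// Assumption: normally passed by the build system (e.g. -DMYLIB_EXPORTING)
// when building the library, and left undefined for code that consumes it.
#define MYLIB_EXPORTING

#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(MYLIB_STATIC)
     // Static builds need no import/export decoration.
#    define MYLIB_EXPORT
#  elif defined(MYLIB_EXPORTING)
     // Building the DLL: export the symbol.
#    define MYLIB_EXPORT __declspec(dllexport)
#  else
     // Consuming the DLL: import the symbol.
#    define MYLIB_EXPORT __declspec(dllimport)
#  endif
#else
   // GCC/Clang: keep the symbol visible even under -fvisibility=hidden.
#  define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

namespace mylib {

// A "core" module and an "io" module share the same macro because both are
// linked into the same shared library.
namespace core {
MYLIB_EXPORT std::int64_t AddOne(std::int64_t x) { return x + 1; }
}  // namespace core

namespace io {
MYLIB_EXPORT std::int64_t Twice(std::int64_t x) { return 2 * x; }
}  // namespace io

}  // namespace mylib
```

With several DLLs, each one would need its own copy of this block under a different macro name, since a symbol exported by one DLL has to be imported by the others. With one shared library, a single macro covers all modules.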