On Sun, Jan 6, 2019 at 5:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Jeroen,
>
> On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
> >
> > On 2019/01/02 17:08:58, Wes McKinney <w...@gmail.com> wrote:
> > > hi folks,
> > >
> > > With 0.12 around the corner and significant progress on the R bindings
> > > project (sufficient for Spark integration [1]), I am wondering how
> > > everyday R users are going to be able to install the software
> > > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > > for this?
> >
> > The R packaging is a bit different than python. For Windows and macOS,
> > we can statically link external libs into the R package, to ship a
> > standalone binary R package without any runtime dependencies. On
> > Linux, R requires the system package manager (apt/yum) to provide
> > external libs. The R package manager doesn't work well with libs from
> > Conda.
>
> How do R libraries handle (or not handle) symbol conflicts if
> everything is statically linked?

Not sure what you mean. R packages on Mac/Win statically link their
system dependencies; there should be no interference with other
packages. In the case of arrow, we build the R package using libarrow.a
(which already contains the required boost libs), and then the resulting
R binary package consists of a single dll/dylib containing both the R
bindings + libarrow, without any external runtime dll dependencies.

> There might be some collaboration opportunity with Kouhei or others
> who have been working on msys2 packaging, which AFAIK is going to be
> nearly the same toolchain

Yes, I based the build on Kouhei's build script (see the first line of
the PKGBUILD file in the rwinlib repo); however, I disabled some extra
features which complicate the process, so that it looks more like the
homebrew configuration.

> Keep in mind that the #1 use case for the Python package right now is
> to read and write Parquet files, which requires compression libraries
> and Thrift. In the short term, I would expect the same to be true of
> the R package, so failing to package Parquet will mean to cripple the
> package.

Which compression libraries exactly do we need to build with parquet
support? Can we build arrow using vendored thrift, or do we need to
build thrift separately? If this is important, we should send a PR to
homebrew to enable this feature in their builds. I am not familiar with
arrow yet; how do I test whether parquet works using the R package?

> How would you propose to make this happen on a practical timeline (3
> months or less)? This requirement (getting packages into an official
> Linux distro) is significantly more onerous than any of the other
> platforms we are packaging for.

You need to find a Debian maintainer who is willing to upload the
package. I don't know the details of the process either. I think the
.deb has to pass lintian and they require some degree of API stability.
If you plan to make backward-incompatible API changes in arrow 0.13,
then publishing to Debian may be premature.