Hi Wes, sorry for the delay; I haven't been monitoring this DL proactively.

Please note that I'm not an expert on this topic, so I'll share as much
information as I can, but others with more expertise should feel free to
comment as well. Please also note that some of the restrictions we have are
common practices in R packages that are out of our control, at least without
significant investment.

I'll document what I know in this email, but please let me know if there is a
wiki or a better place to move this documentation into.

## Background

CRAN, the Comprehensive R Archive Network, is the primary package repository
for the R community; you can think of CRAN as the R equivalent of Homebrew or
PyPI. CRAN encourages the submission of cross-platform packages and, to ease
compilation and testing, provides precompiled binaries for OS X and Windows.
From here on I'll focus on the Windows specifics.

CRAN and R rely on a set of Mingw-based tools, known as RTools, to compile
packages on Windows. Prof. Brian Ripley and Duncan Murdoch originally put
this toolset together; Jeroen Ooms is its current maintainer. RTools is based
on Mingw but, from past experience, it is not completely interchangeable with
the standard Mingw distribution. I'm afraid I don't have the details, but the
differences mostly come down to the specific packages, versions, and
compilers included in RTools. It's possible to match a Mingw environment to
RTools, but this is, in general, not a straightforward task.

A few months ago, I naively tried to do this work myself: get RTools to
compile Apache Arrow; how hard can it be? It's hard to explain all the
caveats in a single email, but if you are interested, you can read about my
exploration of possible solutions to this problem in this gist [1].

The outcome of this investigation, at least given my limited knowledge, was
not to reinvent the wheel on my own; otherwise, this would have taken months
of my time. The solution was instead to find out how other R packages have
solved this problem in the past.

Given the specifics of the RTools toolchain, for complex projects with a
significant number of components and dependencies, the best (and maybe
only!) way to get R packages onto CRAN for Windows is to precompile the
binaries outside of the CRAN build process. The precompiled libraries live in
the rwinlib [2] GitHub organization, which has 75 packages and growing. When
a package is built on CRAN, rather than compiling the library, the build
simply downloads it from rwinlib.
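To make the "download instead of build" step concrete, here is a minimal
Python sketch of what it amounts to: take a pinned version, fetch the
matching prebuilt bundle, and unpack it where the compiler and linker can
find it. The archive URL pattern (a GitHub archive of a "v<version>" tag),
the pinned version, and the helper names are all illustrative assumptions;
in a real R package this step would typically be part of the package's own
Windows build configuration, written in R or shell rather than Python.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Hypothetical pin: a real R package fixes this version in its Windows
# build configuration so that every CRAN build fetches the same bundle.
ARROW_RWINLIB_VERSION = "0.12.0"


def rwinlib_archive_url(name: str, version: str) -> str:
    """Return a GitHub archive URL for a tagged rwinlib repository.

    Assumes release tags are named "v<version>"; this naming is an
    illustrative assumption, not a documented rwinlib contract.
    """
    return f"https://github.com/rwinlib/{name}/archive/v{version}.zip"


def fetch_prebuilt(name: str, version: str, dest: Path) -> None:
    """Download a prebuilt library bundle and unpack it into dest.

    This stands in for compiling the C++ library from source during the
    CRAN build.
    """
    url = rwinlib_archive_url(name, version)
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)


if __name__ == "__main__":
    # Only prints the URL; call fetch_prebuilt(...) to actually download.
    print(rwinlib_archive_url("arrow", ARROW_RWINLIB_VERSION))
```

The point of the design is that CRAN's Windows machines only compile the thin
R bindings; the heavy C++ build has already happened elsewhere.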

How, then, are the rwinlib libraries built? All of them are built through an
automated build system in the rtools-packages [3] repo, where an AppVeyor
script detects changes and builds the affected libraries. This repo uses the
latest RTools toolchain. To support previous versions of R/RTools, the
rtools-backports [4] repo provides backward compatibility in an automated
way.
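As a rough illustration of "detects changes and builds the affected
libraries", the sketch below maps the files touched by a commit to the
package directories that would need rebuilding. It assumes an MSYS2-style
layout where each package lives in its own top-level directory named like
`mingw-w64-arrow` (holding a `PKGBUILD` plus patches); both that layout and
the function name are assumptions, and this is not the actual AppVeyor
script used by rtools-packages.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def packages_to_rebuild(changed_files: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated package directories touched by a change.

    Assumed layout: mingw-w64-<name>/PKGBUILD plus other files in the same
    directory. Files at the repo root (CI config, README) trigger nothing.
    """
    packages = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only files inside a top-level package directory count.
        if len(parts) >= 2 and parts[0].startswith("mingw-w64-"):
            packages.add(parts[0])
    return sorted(packages)


if __name__ == "__main__":
    changed = [
        "mingw-w64-arrow/PKGBUILD",
        "mingw-w64-arrow/0001-fix-build.patch",
        "appveyor.yml",
    ]
    print(packages_to_rebuild(changed))  # ['mingw-w64-arrow']
```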

So now we can get back to discussing how we want to make this work in the
Arrow project. One way, which this PR encourages, is to say: "Let's not worry
about what the R/CRAN publishing process is; they have their own processes
and tools to build binaries for Windows. This is similar to Homebrew
formulae: the formula that builds Arrow for OS X with Homebrew lives in a
different repo [5]."

While splitting the release processes across multiple repos has some
advantages, it certainly has caveats. For instance, when publishing a new
release of Arrow in Homebrew, one needs to manually go and update the
Homebrew formula.

That said, I would hope that the Homebrew release process is documented in
the Arrow project, in the same way that we should document the R release
process there. Hopefully this email serves as a first iteration on this.

## Releasing

These instructions are more pragmatic and cover what needs to be done to
release the R package on CRAN:

(1) Send a PR to rtools-packages [3] that increments the version; the repo
     already downloads the Arrow sources from the Arrow GitHub project.
     Ensure that the AppVeyor build succeeds. If the build or tests fail,
     send the appropriate PR to the official Arrow repo.
(2) Send a PR to rtools-backports [4]; this is similar to (1), but in a
     different repo.
(3) Copy the output produced by (1) and (2) into a PR to the rwinlib/arrow
     [6] repo.
(4) Before merging (3), use the win-builder service [7] to validate that CRAN
     can build and test the package with the new library. This service is
     maintained by CRAN and lets you pre-check that a package builds
     properly on a CRAN-like Windows build machine.
(5) Submit the package to CRAN, making sure their practices and processes
     are followed [8].
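Step (1) mostly amounts to bumping a version number. Assuming the Arrow
recipe in rtools-packages is an MSYS2-style `PKGBUILD` with `pkgver=` and
`pkgrel=` lines (the usual convention for such recipes), a minimal sketch of
that edit follows. The helper name and the sample recipe are hypothetical,
and any checksum fields in the real recipe would still need updating
separately.

```python
import re


def bump_pkgbuild_version(pkgbuild: str, new_version: str) -> str:
    """Set pkgver to new_version and reset pkgrel to 1.

    Only `pkgver=` / `pkgrel=` assignments at the start of a line are
    touched; checksum fields (e.g. sha256sums) are NOT updated here.
    """
    text, n_ver = re.subn(
        r"(?m)^pkgver=.*$", lambda _m: f"pkgver={new_version}", pkgbuild
    )
    if n_ver != 1:
        raise ValueError(f"expected exactly one pkgver= line, found {n_ver}")
    # A new upstream version conventionally restarts the release counter at 1.
    text, n_rel = re.subn(r"(?m)^pkgrel=.*$", lambda _m: "pkgrel=1", text)
    if n_rel != 1:
        raise ValueError(f"expected exactly one pkgrel= line, found {n_rel}")
    return text


if __name__ == "__main__":
    recipe = "_realname=arrow\npkgver=0.12.0\npkgrel=2\n"
    print(bump_pkgbuild_version(recipe, "0.13.0"))
```

Resetting `pkgrel` follows the usual PKGBUILD convention that the package
release number restarts whenever the upstream version changes.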

While I did my best to document the steps, there are certainly more details
that can be added over time. Regardless, feel free to reach out to me with
questions, support requests, or anything else, and I'll do my best to
address them.

Best, Javier

[1]: https://gist.github.com/javierluraschi/2ade2204364a7c20e9c3d95504d12ce5
[2]: https://github.com/rwinlib/
[3]: https://github.com/r-windows/rtools-packages
[4]: https://github.com/r-windows/rtools-backports
[5]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
[6]: https://github.com/rwinlib/arrow
[7]: https://win-builder.r-project.org/
[8]: https://cran.r-project.org/submit.html




On Sat, Mar 16, 2019 at 1:10 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> I have noticed there is work under way to prepare Apache Arrow for
> submission to the CRAN package manager for R users. I'm slightly
> concerned about the lack of information and documentation in the
> project regarding what is involved with this effort. This patch in
> particular raised some eyebrows
>
> https://github.com/apache/arrow/pull/3932
>
> This introduces a dependency into the project on pre-built static
> libraries based on processes that aren't documented in the project. I
> see this repository containing these static libraries for the R
> Windows toolchain, but if I needed to produce them myself I would not
> know what to do
>
> https://github.com/rwinlib/arrow
>
> Additionally, in general, if I wanted to build and test Arrow and R
> from source on Windows, I also would not know what to do.
>
> In the Python world, this would be akin to depending on e.g.
> conda-forge packages for Windows development, but not having any
> information in the repository about how to build Arrow C++ and Python from
> source on Windows.
>
> So I would like to see some transparency / documentation around the
> scripts and processes involved with this so that we don't end up with
> a "bus factor" problem where Arrow PMC members are unable to undertake
> basic maintenance and release management activities. Currently the
> work that is going on seems opaque to me and as such feels contrary to
> the Apache Way.
>
> I understand that there is some urgency to make the Arrow libraries
> available to R users, but I want to make sure we are working in a
> sustainable manner to grow a community of developers who are able to
> do work on each part of the project.
>
> Thanks,
> Wes
>
