Responding to Antoine's specific questions:

* 429 R packages on CRAN list C++11 as a SystemRequirement. These numbers
may be a slight undercount because the SystemRequirements field is not
machine-read. Some packages (e.g.
https://github.com/eddelbuettel/rcppsimdjson/) appear to actually require
C++17 but don't declare it in SystemRequirements. That said, while there
are a number of widely used and depended-on packages that require C++11,
none of the ones that require C++14 and higher have broad adoption.
* According to the official guide [1], C++14 support is partial on the GCC
4.9 that RTools 35 uses. So it would depend on what features we were using
as to whether it was an issue or not.
* Binary packages in R: it essentially comes down to what CRAN builds and
hosts. We provide a source package to CRAN, and they build binaries for
macOS and Windows for the current R release and the previous release (minor
releases, done annually). Windows users don't typically install from
source, so that's not the issue--but we don't get to decide the toolchain
used to compile the binary because we don't own that.
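
As a rough illustration of how a count like the one in the first bullet could be tallied, here is a minimal Python sketch. It assumes the package metadata is available as DCF-style text (the format of R DESCRIPTION files). The sample records, the package names, and the function name `count_declared_standards` are hypothetical and are not real CRAN data. Because SystemRequirements is free text, a pattern match like this can only approximate the true count.

```python
import re
from collections import Counter

# Hypothetical sample in DCF style; real CRAN metadata would be substituted here.
SAMPLE = """\
Package: pkgA
SystemRequirements: C++11

Package: pkgB
SystemRequirements: GNU make,
    C++14

Package: pkgC
SystemRequirements: libxml2

Package: pkgD
"""

# Matches "C++11", "c++ 14", etc. and captures the two-digit standard.
CXX_STD = re.compile(r"C\+\+\s*(\d{2})", re.IGNORECASE)


def count_declared_standards(dcf_text):
    """Count packages per C++ standard named in SystemRequirements."""
    counts = Counter()
    for record in dcf_text.strip().split("\n\n"):
        # Join DCF continuation lines (they start with whitespace).
        record = re.sub(r"\n[ \t]+", " ", record)
        for line in record.splitlines():
            if line.startswith("SystemRequirements:"):
                value = line.split(":", 1)[1]
                # Count each standard at most once per package.
                for std in set(CXX_STD.findall(value)):
                    counts["C++" + std] += 1
    return counts


if __name__ == "__main__":
    for std, n in sorted(count_declared_standards(SAMPLE).items()):
        print(std, n)
```

Packages that need a newer standard without declaring it (as in the rcppsimdjson case above) would be missed by any count of this kind.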

Some other points on the R ecosystem. There are several unrelated concerns
here that we should keep distinct in our minds:

1. What R supports. Per [1], R 3.4 and above have some support for C++14
and 17, and C++14 is even the default C++ standard for the current R
release (4.1). We're all good here.
2. What CRAN requires. Packages must build on macOS, Windows, and Linux and
are checked on the previous release, current release, and development
branch of R. Linux machines use a variety of compilers and toolchains.
Windows, as we've said, always uses RTools, and as of last month, only
RTools 40 (gcc 8.3). As noted on the PR, CRAN uses an old macOS (10.13) to
build macOS binary packages, and the toolchain there has partial C++17
support. Unlike the RTools upgrade associated with R 4.0, the macOS
toolchain is not tied to the R version. So we would need to make sure we
compile with the same Xcode version they use (or wait for them to
eventually upgrade their machines).
3. What users can install on their systems. In the enterprise context,
users don't always get to upgrade R freely, nor can they always install
newer compilers. I acknowledge that raising this is FUD, but we just don't
know how significant this is.
4. What other R packages require. Because of #3, maintainers of major R
packages in the ecosystem generally try to support the last 4-5 releases so
that users who are stuck unable to upgrade R are not left behind. This
means 3 versions of R (and, given yearly releases, a 3 year lag) beyond
what CRAN requires. This is not to say that we have to do the same, just
that if we don't, then that limits the chances that one of those
maintainers would view arrow as something they can depend on. (That said, I
don't think there's high likelihood that these packages would take a hard
dependency on arrow; optional dependency ("Suggests", in R-speak) is more
likely, regardless of C++ standard, due to other reasons (size, FUD, etc.).)
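
To make the arithmetic in point 4 concrete, here is a minimal Python sketch. It assumes the R minor release sequence 3.4 through 4.1 and the Windows toolchain split described in this thread (RTools 35 with GCC 4.9 before R 4.0, RTools 40 with gcc 8.3 from R 4.0 on). The function names are hypothetical, chosen only for illustration.

```python
# Minor R releases, newest first (4.1 is the current release per this thread).
R_RELEASES = [(4, 1), (4, 0), (3, 6), (3, 5), (3, 4)]


def windows_toolchain(r_version):
    """Windows toolchain for an R version, per the split described above."""
    if r_version >= (4, 0):
        return "RTools 40 (gcc 8.3)"
    return "RTools 35 (gcc 4.9)"


def support_window(n_releases):
    """The n most recent minor releases a package chooses to support."""
    return R_RELEASES[:n_releases]


def stuck_on_old_toolchain(n_releases):
    """Releases in the support window that still build with RTools 35."""
    return [
        v for v in support_window(n_releases)
        if windows_toolchain(v).startswith("RTools 35")
    ]


if __name__ == "__main__":
    # CRAN's window: current and previous release.
    print(stuck_on_old_toolchain(2))
    # A 5-release window, as some major packages aim for.
    print(stuck_on_old_toolchain(5))
```

Under these assumptions, the two-release CRAN window contains no RTools 35 versions, while a five-release window includes 3.6, 3.5, and 3.4, which are the three extra versions mentioned above.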

Neal

[1]:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-C_002b_002b-code

On Wed, Jun 9, 2021 at 10:26 AM Eduardo Ponce <edponc...@gmail.com> wrote:

> After the discussion in today's Arrow sync call, I do think it would be
> beneficial to come up with a formal process for deciding when is a "right
> time" for upgrading Arrow to a newer C++ standard. I suggest we could
> consider a set of general metrics/criteria that try to summarize the
> benefits and drawbacks of such change. Some metrics will be measurable but
> others will be qualitative. For the latter, we can use a consensus-based
> scale rating (1-5 with a meaning attached to each value). I am curious what
> approach other major C++ projects have used to resolve decisions on
> selecting a C++ standard (aside from critically required
> features)?
>
> The criteria used to evaluate newer C++ standards need to fairly consider
> people with different roles with regards to the Arrow project, such as
> developers, contributors, C++ users, other language users (R, Python), and
> maintainers.
> Here is a possible (and likely incomplete) set of metrics:
>
> Measurable metrics:
> * code size (source and binary) - measured in bytes
> * compilation time (consider each major Arrow component)
> * runtime - what are the performance changes? (consider each major Arrow
> component)
> * systems/OS/tools supported and deprecated
> * ...
>
> Qualitative metrics:
> * code structure/maintainability - how would it improve development?
> * code readability - ease of understanding details for new/current
> contributors?
> * ...
>
> I do think this approach will give us a better standpoint for deciding on
> when to upgrade to a newer C++ standard.
> Nevertheless, there are complexities for implementing such an approach:
> * selecting the "correct" metrics
> * designing the scale rating
> * How do we get the community to provide their opinion for the qualitative
> metrics? What is a "good enough" coverage?
> * How do we summarize the results into a binary decision: upgrade vs not
> upgrade?
> * ...
>
> In the end, it might not be worthwhile to go through all this work, I am
> simply expressing an idea.
>
> ~Eduardo
>
>
> On Wed, Jun 9, 2021 at 9:40 AM Antoine Pitrou <anto...@python.org> wrote:
>
> > On Tue, 8 Jun 2021 17:37:30 -0500
> > Jonathan Keane <jke...@gmail.com> wrote:
> > > I've been digging a bit to try and put numbers on those users that Neal
> > > mentions. Specifically, we know that requiring C++17 will mean that R
> > > users on Windows using versions of R before 4.0.0 will not be able to
> > > compile/install arrow. Although R version 3.6 is no longer supported
> > > by CRAN [1], many people hang on to older versions for an extended
> > > period of time.
> > >
> > > We are still working on getting more solid numbers about how many
> > > people might still be on these old versions, but here is what I have
> > > so far:
> > >
> > > Using RStudio's CRAN mirror logs of package installations [2] (and
> > > with the help of Arrow datasets to process/filter these files 🎉) for
> > > the period from 2020-05-18 [3] to today: among the installations that
> > > report an R version, approximately 27% of the Windows package
> > > installs are on versions before 4.0.0 (and therefore would be unable
> > > to install arrow if we require C++17 right now).
> >
> > Is this because binary packages are forbidden in R-land?  Do Windows
> > users of R really install Arrow from source?  Or is it really
> > impossible to use a modern compiler when building R packages for R
> > versions older than 4.0 ?
> >
> > Note the requirement we're proposing to bump is for *building* Arrow.
> > Using binaries should not be affected, especially on Windows (on Linux,
> > you must be a bit more careful, but normally the CentOS devtoolset
> > should take care of that).
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
>
