Is it correct to say the desired behavior in the long term is for such an error to occur?
If so, could we make it so the --enable-strict-barrier (or other option) behavior educes those errors for developers to opt in to finding them sooner than later? On Wed, Apr 23, 2025 at 3:43 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote: > > > On 4/24/25 00:18, Michael Chirico wrote: > > In that case it seems like just erroring instead of returning invalid > > pointers is a much friendlier option. Why give developers an unpinned > > grenade to carry around? > > That would be too strict at this point. There is too much code around > depending on that holding on to an invalid pointer (but not > dereferencing it) is ok, and it is currently still working on the > current platforms where R is used. > > The mechanism of returning an invalid pointer from functions like > INTEGER() called on an empty vector (sometimes this is called > "poisoning") is much more precise. When this causes a crash, it is > because the program actually used the pointer, which is a clear error > (and it is a clear error regardless of the platform). There are no > false alarms: when package authors have to debug a package because of > this, there is really an error in the code to be fixed. And, for a large > number of CRAN packages where a recent improvement in the poisoning > showed up more cases, the debugging was done by the CRAN team for the > package authors, in some cases even providing patches. > > Best > Tomas > > > > > On Wed, Apr 23, 2025 at 1:38 PM Tomas Kalibera <tomas.kalib...@gmail.com> > > wrote: > >> On 4/23/25 19:03, Michael Chirico wrote: > >>> h/t Tim Taylor for pointing out my blindspot :) > >>> > >>> We have Memcpy() in API already [1], which wraps a 0-aware R_chk_memcpy() > >>> [2]. > >>> > >>> We don't quite have Memset() in API, though; instead we have Memzero() > >>> [3] for R_chk_memset(s, 0, n) which is 0-aware memset() [4]. > >> I don't think that using wrappers for this is the right approach. Even > >> loading an invalid pointer is undefined behavior. While it wouldn't > >> matter on typical hardware where R is used today, it could cause a crash > >> on some hardware, and it may well be that various checkers will start > >> warning about such things. Also, holding (intentionally) invalid > >> pointers makes debugging harder. > >> > >> At least in new code, I would avoid holding invalid pointers. Certainly > >> I don't think we should be adding wrappers to R for C functions to make > >> them work with invalid pointers. > >> > >> Best > >> Tomas > >> > >>> [1] > >>> https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L59-L60 > >>> [2] > >>> https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3575-L3580 > >>> [3] > >>> https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L62-L63 > >>> [4] > >>> https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3582-L3587 > >>> > >>> Mike C > >>> > >>> On Wed, Apr 23, 2025 at 8:57 AM Michael Chirico > >>> <michaelchiri...@gmail.com> wrote: > >>>> From R 4.5.0 [1], all builds of R discourage use of INTEGER() [and > >>>> friends REAL(), ... and *_RO() equivalents] on length-0 SEXP [2]. > >>>> Before R 4.5.0, this was the behavior under --enable-strict-barrier. > >>>> > >>>> That means the following can segfault under strict builds (e.g. > >>>> -fsanitize=alignment and -O0): > >>>> > >>>> SEXP x = PROTECT(Rf_allocVector(INTSXP, 0)); > >>>> SEXP y = PROTECT(Rf_allocVector(INTSXP, 0)); > >>>> const int *x = INTEGER_RO(x); // invalid! > >>>> int *y = INTEGER(y); > >>>> memcpy(y, x, 0); // alluring, but undefined behavior! > >>>> > >>>> There are a number of CRAN packages that fall victim to this, see e.g. > >>>> this PR and others linked to it [3]. I'm sure there are dozens if not > >>>> hundreds of other equivalent bugs waiting to be discovered that just > >>>> aren't covered by existing tests. > >>>> > >>>> {rlang} took the approach to define r_memcpy() and r_memset() which > >>>> wrap memcpy() and memset(), resp., with an added length-0 check [4]; I > >>>> think R itself should offer these (probably more consistently styled > >>>> as R_Memcpy() and R_Memset()). > >>>> > >>>> (NB there's a possibility I'm still not fully grasping what's going on > >>>> here :) ) > >>>> > >>>> Mike C > >>>> > >>>> [1] related: https://stat.ethz.ch/pipermail/r-devel/2024-June/083456.html > >>>> [2] > >>>> https://github.com/r-devel/r-svn/blob/2b29e52e1c4e3d26b649cb7ac320b8a3dd13de30/src/main/memory.c#L4146 > >>>> [3] https://github.com/r-lib/vctrs/pull/1968 > >>>> [4] https://github.com/r-lib/rlang/pull/1797 > >>> ______________________________________________ > >>> R-devel@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel