Re: [R-pkg-devel] Replacement for SETLENGTH

Tomas Kalibera Wed, 15 Jan 2025 00:58:40 -0800


On 1/15/25 05:26, Merlise Clyde, Ph.D. wrote:

I am trying to determine the best way to stop using SETLENGTH to truncate
over-allocated vectors in my package BAS, in order to eliminate the NOTEs about
non-API calls in anticipation of R 4.5.0.

From WRE:  "At times it can be useful to allocate a larger initial result vector and 
resize it to a shorter length if that is sufficient. The functions Rf_lengthgets and 
Rf_xlengthgets accomplish this; they are analogous to using length(x) <- n in R. 
Typically these functions return a freshly allocated object, but in some cases they may 
re-use the supplied object."

it looks like using

     x = Rf_lengthgets(x, newsize);
     SET_VECTOR_ELT(result, 0, x);

before returning works to resize without the performance hit incurred by a copy. (Will this always re-use the supplied object if newsize < old size?)


There is no mention in section 5.9.2 about the need for re-protection of the
object, but it seems to be mentioned in some packages as well as a really old
thread about SET_LENGTH, which looks like a non-API macro for lengthgets.

Indeed, if I call gc() and then rerun my test I have had some non-reproducible
aborts in RStudio on my M3 Mac (caught once in R -d lldb).

The important part for protection is that Rf_lengthgets _may_ return a freshly
allocated object. This means that the object needs protection from garbage
collection, implicit or explicit - and that is covered in section "Handling the
effects of garbage collection". There are many functions in the R API that
return freshly allocated objects, so don't expect that documentation of every
such function would give advice on how to protect; that is covered in that
special section.

So, you are right, some protection is needed _if_ the return value of
Rf_lengthgets may be exposed to gc().


Do I need to do something more like

PROTECT_INDEX ipx0;
PROTECT_WITH_INDEX(x0 = allocVector(REALSXP, old_size), &ipx0);

PROTECT_INDEX ipx1;
PROTECT_WITH_INDEX(x1 = allocVector(REALSXP, old_size), &ipx1);

/* fill in values in x0 and x1 up to new_size (random) < old_size */
...
REPROTECT(x0 = Rf_lengthgets(x0, new_size), ipx0);
REPROTECT(x1 = Rf_lengthgets(x1, new_size), ipx1);

SET_VECTOR_ELT(result, 0, x0);
SET_VECTOR_ELT(result, 1, x1);
...
UNPROTECT(2);   /* or is this 4? */

You have protected two objects here, one was in x0 and one in x1 (REPROTECT
doesn't change the depth of the protection stack). Some people put that into a
comment:


UNPROTECT(2); /* x1, x0 */
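
To make the stack-depth point concrete, here is a toy model in plain C. It is
only a sketch: every toy_* name is hypothetical, and this is not how R
implements its protection stack. It just replays the sequence from the question
(two PROTECT_WITH_INDEX calls, then two REPROTECT calls) and reports the depth
that a matching UNPROTECT would have to pop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT R's implementation. */
#define TOY_STACK_MAX 16

typedef struct {
    const void *slots[TOY_STACK_MAX]; /* pointers currently protected */
    int top;                          /* current stack depth */
} toy_pstack;

/* Models PROTECT: push a pointer, depth grows by one. */
static void toy_protect(toy_pstack *s, const void *obj) {
    assert(s->top < TOY_STACK_MAX);
    s->slots[s->top++] = obj;
}

/* Models PROTECT_WITH_INDEX: push and return the slot that was used. */
static int toy_protect_with_index(toy_pstack *s, const void *obj) {
    int idx = s->top;
    toy_protect(s, obj);
    return idx;
}

/* Models REPROTECT: overwrite an existing slot, depth is unchanged. */
static void toy_reprotect(toy_pstack *s, const void *obj, int idx) {
    assert(idx >= 0 && idx < s->top);
    s->slots[idx] = obj;
}

/* Models UNPROTECT(n): pop the n most recent slots. */
static void toy_unprotect(toy_pstack *s, int n) {
    assert(n >= 0 && n <= s->top);
    s->top -= n;
}

/* Replays the sequence from the question and returns the depth
 * that is on the stack just before the final unprotect. */
static int toy_replay_thread_example(void) {
    toy_pstack s = { {NULL}, 0 };
    double x0_old[8], x1_old[8];   /* stand-ins for the over-allocated vectors */
    double x0_new[4], x1_new[4];   /* stand-ins for the resized results */

    int ipx0 = toy_protect_with_index(&s, x0_old);
    int ipx1 = toy_protect_with_index(&s, x1_old);

    toy_reprotect(&s, x0_new, ipx0);   /* replaces slot ipx0 in place */
    toy_reprotect(&s, x1_new, ipx1);   /* replaces slot ipx1 in place */

    int depth = s.top;
    toy_unprotect(&s, depth);          /* balances the stack again */
    assert(s.top == 0);
    return depth;
}
```

In this model toy_replay_thread_example() returns 2, matching UNPROTECT(2) in
the code from the question: the two reprotect calls reuse the existing slots
and never push new ones.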

Your code above is OK. In some cases, you can shuffle it around a bit or rely
on implicit protection if you want to reduce the need for explicit protection.
But performance-wise it doesn't matter, given that code that is allocating,
etc., takes much more time - it is more about readability.


For instance,

result = PROTECT(allocVector(...));
x0 = allocVector();
SET_VECTOR_ELT(result, 0, x0);
// now x0 is implicitly protected via result
...

x0 = Rf_lengthgets(..);
SET_VECTOR_ELT(result, 0, x0);

// now the new value of x0 is implicitly protected via result (the old value may not be)


UNPROTECT(1);  // result
return result;

return(result);


There is also a mention in WRE of R_PreserveObject and R_ReleaseObject -

I am looking for advice on whether this is needed, or which approach is
better/more stable to replace SETLENGTH. (I have many, many instances that need
to be updated, so I am trying to get some clarity here before updating and
running code through valgrind or other sanitizers to catch any memory issues
before submitting an update to CRAN.)

PreserveObject/ReleaseObject is good e.g. for global structures, probably not
in this case. The difficulty there is making sure ReleaseObject() does execute
in case of an error, a non-local return. On the other hand, protection via
PROTECT/UNPROTECT is automatically robust to non-local returns (automatic
unprotection).
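
The difference under a non-local return can be sketched with a toy model in
plain C using setjmp/longjmp. Again, all toy_* names are hypothetical and this
is not R's implementation; in particular, restoring a saved stack depth in
toy_run is only an assumed stand-in for the automatic unprotection mentioned
above.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy model only: hypothetical names, NOT R's implementation. */
static int toy_pp_top = 0;      /* depth of the toy protection stack */
static int toy_preserved = 0;   /* number of toy "preserved" objects */
static jmp_buf toy_toplevel;    /* where a toy error jumps to */

/* Models an R error: a non-local return that skips the caller's cleanup. */
static void toy_error(void) {
    longjmp(toy_toplevel, 1);
}

/* Models C code that uses both mechanisms and may fail in the middle. */
static void toy_worker(int fail) {
    toy_pp_top += 1;       /* stands in for PROTECT(x) */
    toy_preserved += 1;    /* stands in for R_PreserveObject(y) */

    if (fail)
        toy_error();       /* the two cleanup lines below never run */

    toy_preserved -= 1;    /* stands in for R_ReleaseObject(y) */
    toy_pp_top -= 1;       /* stands in for UNPROTECT(1) */
}

/* Models the caller's context: it remembers the stack depth and
 * restores it when an error unwinds back to it. */
static void toy_run(int fail) {
    const int saved_top = toy_pp_top;
    if (setjmp(toy_toplevel) == 0) {
        toy_worker(fail);
    } else {
        toy_pp_top = saved_top;   /* assumed model of automatic unprotection */
        /* nothing here knows about the preserved object, so it stays preserved */
    }
}
```

After toy_run(0) both counters are back at 0. After toy_run(1) the stack depth
is back at 0, but toy_preserved stays at 1: the skipped release is the kind of
leak that makes the preserve/release pair harder to use safely in code that can
raise errors.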

There is nothing specific about Rf_lengthgets wrt protection here - the same
rules apply to any other R API function that returns an SEXP.

For finding protection bugs in code, one can use an R build with barrier
checking enabled and gctorture, or the rchk tool. Some bugs may lead to crashes
or incorrect outputs even in normal builds. Some bugs may be found by UBSAN.
But none of this is a verification tool; one can only find some bugs in some
cases, and correctness remains the responsibility of the programmer.


Best
Tomas


best,
Merlise







______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

