On 1/15/25 05:26, Merlise Clyde, Ph.D. wrote:
I am trying to determine the best way to stop using SETLENGTH to truncate 
over-allocated vectors in my package BAS, so as to eliminate the NOTEs about 
non-API calls in anticipation of R 4.5.0.

From WRE: "At times it can be useful to allocate a larger initial result vector and 
resize it to a shorter length if that is sufficient. The functions Rf_lengthgets and 
Rf_xlengthgets accomplish this; they are analogous to using length(x) <- n in R. 
Typically these functions return a freshly allocated object, but in some cases they may 
re-use the supplied object."

It looks like using

     x = Rf_lengthgets(x, newsize);
     SET_VECTOR_ELT(result, 0, x);

before returning works to resize the vector without the performance hit of a copy. (Will this always re-use the supplied object if newsize < old size?)

There is no mention in section 5.9.2 of the need to re-protect the 
object, but re-protection does appear in some packages, as well as in a very old 
thread about SET_LENGTH, which looks like a non-API macro for lengthgets.

Indeed, if I call gc() and then rerun my test, I have had some non-reproducible 
aborts in RStudio on my M3 Mac (caught once under R -d lldb).

The important point for protection is that Rf_lengthgets _may_ return a freshly allocated object. This means that the object needs protection from garbage collection, implicit or explicit, and that is covered in the section "Handling the effects of garbage collection". There are many functions in the R API that return freshly allocated objects, so don't expect the documentation of every such function to give advice on how to protect the result; that is covered in that dedicated section.

So, you are right, some protection is needed _if_ the return value of Rf_lengthgets may be exposed to gc().


Do I need to do something more like

PROTECT_INDEX ipx0;
PROTECT_WITH_INDEX(x0 = allocVector(REALSXP, old_size), &ipx0);

PROTECT_INDEX ipx1;
PROTECT_WITH_INDEX(x1 = allocVector(REALSXP, old_size), &ipx1);

/* fill in values in x0 and x1 up to new_size (random) < old_size */
...
REPROTECT(x0 = Rf_lengthgets(x0, new_size), ipx0);
REPROTECT(x1 = Rf_lengthgets(x1, new_size), ipx1);

SET_VECTOR_ELT(result, 0, x0);
SET_VECTOR_ELT(result, 1, x1);
...
UNPROTECT(2);   /* or is this 4? */

You have protected two objects here, one was in x0 and one in x1 (REPROTECT doesn't change the depth of the protection stack). Some people put that into a comment:

UNPROTECT(2); /* x1, x0 */

The code above is ok. In some cases, you can shuffle it around a bit or rely on implicit protection if you want to reduce the need for explicit protection. Performance-wise it doesn't matter, since allocation and the other work the code does take much more time; it is more about readability.

For instance,

result = PROTECT(allocVector(...));
x0 = allocVector(...);
SET_VECTOR_ELT(result, 0, x0);
// now x0 is implicitly protected via result
...

x0 = Rf_lengthgets(...);
SET_VECTOR_ELT(result, 0, x0);
// now the new value of x0 is implicitly protected via result (the old value may not be)

UNPROTECT(1); // result
return result;


There is also a mention in WRE of R_PreserveObject and R_ReleaseObject.

I am looking for advice on whether this is needed, or which approach is better/more stable 
to replace SETLENGTH. (I have many, many instances that need to be updated, so I am 
trying to get some clarity here before updating and running the code through 
valgrind or other sanitizers to catch any memory issues before submitting an 
update to CRAN.)

PreserveObject/ReleaseObject is good, e.g., for global structures, but probably not in this case. The difficulty there is making sure ReleaseObject() is executed in case of an error or other non-local return. Protection via PROTECT/UNPROTECT, on the other hand, is automatically robust to non-local returns (automatic unprotection).

There is nothing specific about Rf_lengthgets with respect to protection here; the same rules apply to any other R API function that returns a SEXP.

For finding protection bugs in code, one can use an R build with barrier checking enabled together with gctorture, or the rchk tool. Some bugs may lead to crashes or incorrect outputs even in normal builds. Some bugs may be found by UBSAN. But none of this is a verification tool: one can only find some bugs in some cases, and correctness remains the responsibility of the programmer.

Best
Tomas


best,
Merlise

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel