Re: [Rd] Check for protection

Tomas Kalibera Fri, 11 Apr 2025 09:38:52 -0700

On 4/11/25 17:39, Duncan Murdoch wrote:

On a tangent from the main topic of this thread: sometimes (especially to non-experts) it's not obvious whether a variable is protected or not.
I don't think there's any easy way to determine that, but perhaps there should be. Would it be possible to add a run-time test you could call in C code (e.g. is_protected(x)) that would do the same search the garbage collector does in order to determine if a particular pointer is protected?
This would be an expensive operation, similar in cost to actually doing a garbage collection. You wouldn't want to do it routinely, but it would be really helpful in debugging.
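
To make the proposal concrete, here is a minimal toy sketch of what such a check would do: walk from every root, as a collector's mark phase does, and report whether the pointer is reached. Everything in it (Obj, toy_protect, toy_unprotect, toy_is_protected) is hypothetical and self-contained; it is not R's API or R's collector, and in R the roots include much more than the protection stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object with a single outgoing reference (think: one container
   element). Hypothetical type, not R's SEXP. */
typedef struct Obj {
    struct Obj *child;
} Obj;

#define STACK_MAX 16
static const Obj *protect_stack[STACK_MAX]; /* toy analogue of a protection stack */
static size_t protect_top = 0;

static void toy_protect(const Obj *o)
{
    assert(protect_top < STACK_MAX);
    protect_stack[protect_top++] = o;
}

static void toy_unprotect(size_t n)
{
    assert(n <= protect_top);
    protect_top -= n;
}

/* Follow every chain that starts at a root and report whether x is on it.
   Assumes the chains are acyclic, which keeps the toy short. */
static bool toy_is_protected(const Obj *x)
{
    for (size_t i = 0; i < protect_top; i++)
        for (const Obj *o = protect_stack[i]; o != NULL; o = o->child)
            if (o == x)
                return true;
    return false;
}
```

The search visits everything reachable from the roots, which is why such a check would cost about as much as a collection.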

I've experimented with some things like that in the past and concluded they were not that useful.

Learning that a value is not protected at a certain point in the program doesn't necessarily mean this is a bug - it depends on whether that value will be exposed to a possible garbage collection. It is perfectly fine that an unprotected value is returned from a C function (and this is how it should be). It is fine when an unprotected value exists before it is passed to, say, SET_VECTOR_ELT().

So, right, one might ask if a specific value would later be exposed to a garbage collection while unprotected (leaving it to the tool to decide when such a collection would happen). But then, it may be ok, because when such a garbage collection happens, it would be clear the value cannot be used anymore. It only matters if such a value is then being used.

And then: a value may be protected by coincidence, by something that is not safe to rely on. Such as the example of the caching of a value in a global variable: when we ask whether it is protected, it may be that it happens to be protected by some inconsequential call on the stack, but we should not rely on that.
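
The coincidental case can be illustrated with a small toy model. This is only a sketch under stated assumptions: the heap, toy_alloc, toy_protect, toy_unprotect, toy_gc and get_cached are all hypothetical names, and the model ignores everything specific to R's collector. A value cached in a static variable stays alive only while some protected container happens to reference it; once that container is unprotected, the next collection reclaims the value and the cache is left holding a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object; hypothetical, not R's SEXP. */
typedef struct Obj {
    struct Obj *child; /* single reference, e.g. an element of a container */
    bool alive;        /* false once the toy collector has reclaimed it */
    bool marked;
} Obj;

#define HEAP_MAX 8
#define STACK_MAX 8
static Obj heap[HEAP_MAX];
static size_t heap_used = 0;
static Obj *protect_stack[STACK_MAX];
static size_t protect_top = 0;

/* A static cache. It is NOT a root: the toy collector never looks here. */
static Obj *cached = NULL;

static Obj *toy_alloc(Obj *child)
{
    assert(heap_used < HEAP_MAX);
    Obj *o = &heap[heap_used++];
    o->child = child;
    o->alive = true;
    o->marked = false;
    return o;
}

static void toy_protect(Obj *o)
{
    assert(protect_top < STACK_MAX);
    protect_stack[protect_top++] = o;
}

static void toy_unprotect(size_t n)
{
    assert(n <= protect_top);
    protect_top -= n;
}

/* Mark everything reachable from the protection stack, then reclaim the rest. */
static void toy_gc(void)
{
    for (size_t i = 0; i < heap_used; i++)
        heap[i].marked = false;
    for (size_t i = 0; i < protect_top; i++)
        for (Obj *o = protect_stack[i]; o != NULL && !o->marked; o = o->child)
            o->marked = true;
    for (size_t i = 0; i < heap_used; i++)
        if (!heap[i].marked)
            heap[i].alive = false;
}

/* Create the value on first use and hand out the cached pointer afterwards. */
static Obj *get_cached(void)
{
    if (cached == NULL)
        cached = toy_alloc(NULL);
    return cached;
}
```

In this model, asking "is the cached value protected?" while the container is still on the protection stack would answer yes, although nothing the cache itself does guarantees that.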

We have gc torture with the strict barrier checking, which allows detecting use of a value that has in fact been garbage collected. Also, one can use the strict barrier checking and manually place calls to gc at certain points of interest (though the danger is that one places such a call where a collection actually cannot happen). These runtime solutions can't find all possible problems, nor would they tell one what should actually be protected where.

And we have rchk, a static analysis tool, which can direct one close to where the problems occur, and works from the rules for how protection should be done. It is faster, but it will have false alarms.

The rules for how to protect objects in Writing R Extensions should be quite clear and easy to follow, and certainly it is fine and appropriate to ask for help on this list given a small C example. I think the bigger problem is when one knows the rules, tries to follow them, but simply forgets/makes a mistake at some point. And for that, we have the checking tools mentioned. UBSAN also sometimes can spot some of these problems.


Best
Tomas

Duncan Murdoch
On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel wrote:
On second thought, I wonder if the caching in my changed 'StringFromLogical' in my previous message is safe. While 'ans' in the C function 'coerceToString' is protected, its element is also protected. If the object corresponding to 'ans' is then no longer protected, is it possible for the cached object 'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, I am thinking of clearing the cache at each first fill. For example, by abusing the 'warn' argument, the following is added to my changed 'StringFromLogical'.
  if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

  warn = i == 0;

is inserted before

  SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

---------------------
On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler <[email protected]> wrote:
Suharto Anggono Suharto Anggono via R-devel
     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
     > Chain of calls of C functions in coerce.c for as.character(<logical>) in R:
     > do_asatomic
     > ascommon
     > coerceVector
     > coerceToString
     > StringFromLogical (for each element)

     > The definition of 'StringFromLogical' in coerce.c :
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    int w;
     >    formatLogical(&x, 1, &w);
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else return mkChar(EncodeLogical(x, w));
     > }
     >
     > The definition of 'EncodeLogical' in printutils.c :
     >
     > const char *EncodeLogical(int x, int w)
     > {
     >    static char buff[NB];
     >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
     >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
     >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
     >    buff[NB-1] = '\0';
     >    return buff;
     > }
     >
     > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    2.69    0.02    2.73
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.15    0.04    0.20
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.08    0.05    0.13
     > > L <- rep(NA, 10^7)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    0.11    0.00    0.11
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.16    0.06    0.22
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.09    0.03    0.12
     >
     > `as.character` of a logical vector that is all NA is fast enough.
     > It appears that the call to 'formatLogical' inside the C function
     > 'StringFromLogical' does not introduce much slowdown.
     > I found that using a string literal inside the C function 'StringFromLogical', by replacing
     > EncodeLogical(x, w)
     > with
     > x ? "TRUE" : "FALSE"
     > (and the call to 'formatLogical' is not needed anymore), makes it faster.
indeed! ... and we also notice that the 'w' argument is not
needed anymore either, and that makes sense: At this point when you
know you have an R logical value there are only three
possibilities and no reason ever to warn about the conversion.

     > Alternatively,
or in addition !
     > "fast path" could be introduced in 'EncodeLogical', potentially also benefiting format() in R.
     > For example, without replacing existing code, the following fragment could be inserted.
     >
     >    if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
     >    else if(x) {if(w == 4) return "TRUE";}
     >    else {if(w == 5) return "FALSE";}
     >
     > However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L) .
     >
     > Precomputing or caching possible results of the C function 'StringFromLogical' allows as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    static SEXP TrueCh, FalseCh;
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
     >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
     > }
Indeed, and something along this line (storing the other two constant strings) was also
my thought when seeing the
   mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.

I'm looking into applying both speedups;
thank you very much, Suharto!

Martin


--
Martin Maechler
ETH Zurich  and  R Core team

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel