Re: [Rd] Custom C finalizers for .Call

Simon Urbanek Wed, 25 Nov 2015 08:01:52 -0800

Jeroen,

I agree that it may be useful to have some kind of "finally"-like 
infrastructure. However, in the use cases you list there are already ways to do 
that - the same way that R_alloc uses. First, you don't need to call UNPROTECT 
- the whole point is that the protection stack is automatically popped in case 
of an abnormal termination. That is precisely how memory leaks are prevented - 
as long as you play by the rules it will be released for you. The check for 
UNPROTECTs at the end of .Call is explicitly there to catch bugs in *normal* 
termination. So no memory leaks there.

The real case where users may create leaks is if you allocate memory that you 
don't tell R about. As long as you associate a finalizer with any non-R memory 
you allocate, there will be no memory leaks - that is how you are supposed to 
write R packages with external allocations. So the only difference to your 
example is that you don't register a finalizer with the *function* but rather 
with the *allocation* you make. That also seems IMHO less error prone.

So to take your example, the way you would typically write that code safely is 
something like

typedef struct {
  xmlNodePtr node;      // xmlNodePtr is already a pointer type
  EVP_PKEY_CTX *ctx;
} my_context_t;

// define how to dispose of all things you care about correctly
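static void context_fin(SEXP what) {
    my_context_t *c = (my_context_t*) R_ExternalPtrAddr(what);
    if (!c) return;
    if (c->ctx) EVP_PKEY_CTX_free(c->ctx);
    if (c->node) xmlFreeNode(c->node);
    R_Free(c);                 // release the struct itself (from R_Calloc)
    R_ClearExternalPtr(what);  // so a repeated call is a no-op
}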

[...]
// allocate the context and tell R to manage its protection and finalization
// (you could write a macro to make this one-liner)
my_context_t* c = (my_context_t*) R_Calloc(1, my_context_t);
SEXP res = PROTECT(R_MakeExternalPtr(c, R_NilValue, R_NilValue));
R_RegisterCFinalizer(res, context_fin);

// do all work here ... you can safely abort at any point without memory leaks
c->node = xmlNewNode(...);
c->ctx = EVP_PKEY_CTX_new(...);
[...]

The point of using a finalizer is that no matter what happens the memory is 
always released. The structure with all allocations is protected until you 
unprotect it or there is any interrupt/error. Since all regular R rules apply, 
you can also assign it someplace to make the protection dependent on any other 
object you care about. This is often useful because you don't need to PROTECT 
things left and right, but instead you can just have one object that holds 
references to random things you care about.

Of course, you could write a wrapper for the above with some syntactic sugar to 
achieve the same - essentially limiting the finalizer to be just a function 
call on the reference that you create. It may be a bit of overkill since you 
may end up creating objects for every allocation, but certainly doable. I would 
argue that in most cases you already tend to have a structure for the things 
you allocate so the "normal" approach is typically more clear and readable than 
inlining calls with side-effects, but that may be a matter of taste.
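
A minimal sketch of what such a wrapper could look like (an illustration only, 
not code from this thread). All of guard_t, guard_release, guard_fin, guard_new 
and demo_free are hypothetical names, and USE_R_API is a hypothetical macro 
that keeps the R glue out of the way so the cleanup logic can be compiled and 
tried without R headers. The external pointer is created and registered before 
anything else is allocated, so an error at any later step leaves nothing 
unowned.

```c
#include <stdlib.h>

/* Hypothetical helper: one foreign pointer plus the function that frees it. */
typedef struct {
    void *ptr;
    void (*free_fn)(void *);
} guard_t;

/* Free the guarded object at most once; safe to call twice or with NULL. */
static void guard_release(guard_t *g) {
    if (!g || !g->ptr) return;
    if (g->free_fn) g->free_fn(g->ptr);
    g->ptr = NULL;
}

/* Stand-in for a library xx_free(), only here so the sketch can be exercised. */
static int demo_free_calls = 0;
static void demo_free(void *p) { free(p); demo_free_calls++; }

#ifdef USE_R_API  /* hypothetical macro: define it when building in a package */
#include <R.h>
#include <Rinternals.h>

/* Run by R once the external pointer is unreachable (or at exit, see below). */
static void guard_fin(SEXP ref) {
    guard_t *g = (guard_t *) R_ExternalPtrAddr(ref);
    if (!g) return;
    guard_release(g);
    R_Free(g);
    R_ClearExternalPtr(ref);
}

/* Returns an unprotected external pointer; the caller must PROTECT it. */
static SEXP guard_new(void (*free_fn)(void *)) {
    SEXP ref = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
    R_RegisterCFinalizerEx(ref, guard_fin, TRUE); /* TRUE: also run at R exit */
    guard_t *g = R_Calloc(1, guard_t); /* may raise an R error; nothing leaks */
    g->free_fn = free_fn;
    R_SetExternalPtrAddr(ref, g);
    UNPROTECT(1);
    return ref;
}
#endif
```

In a .Call function one would PROTECT the result of guard_new(), fetch the 
guard_t with R_ExternalPtrAddr(), and only then store the result of xx_new() in 
its ptr field. Since xx_free() functions rarely take a void *, a one-line 
adapter per library type is needed, which is part of why the plain struct above 
usually ends up more readable.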

Cheers,
Simon

On Nov 23, 2015, at 6:10 PM, Jeroen Ooms <[email protected]> wrote:

> WRE explains that R_alloc() can be used to allocate memory which
> automatically gets released by R at the end of a .C, .Call or
> .External, even in the case of an error or interruption. This is a
> really great feature to prevent memory leaks. I was wondering if there
> is a way to extend this mechanism to allow for automatically running
> UNPROTECT and custom finalizers at the end of a .Call as well.
> 
> Currently it is all too easy for package authors to introduce a memory
> leak or stack imbalance by calling Rf_error() or
> R_CheckUserInterrupt() in a way that skips over the usual cleanup
> steps. This holds especially for packages interfacing C libraries
> (libcurl, libxml2, openssl, etc) which require xx_new() and xx_free()
> functions to allocate/free various types of objects, handles and
> contexts. Therefore we cannot use R_alloc() and we need to manually
> clean up when returning, which is tricky for irregular exits.
> 
> Moreover package authors might benefit from an alternative of
> allocVector() which automatically protects the SEXP until the .Call is
> done. Perhaps I don't fully appreciate the complexity of the garbage
> collector, but one could imagine a variant of PROTECT() which
> automatically keeps a counter 'n' for the number of allocated objects
> and makes R run UNPROTECT(n) when .Call exits, along with releasing
> the R_alloc() memory. Yes, there are cases where it is useful to have
> manual control over what can be collected earlier during the .Call
> procedure, but these are rare. A lot of C code in packages might
> become safer and cleaner if authors would have an option to let this
> be automated.
> 
> The most general feature would be a hook for adding custom C functions to
> the .Call exit, similar to on.exit() in R:
> 
>  xmlNodePtr node = xmlNewNode(...);
>  Rf_on_exit(xmlFreeNode, node);
>  EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(...);
>  Rf_on_exit(EVP_PKEY_CTX_free, ctx);
>  SEXP out = PROTECT(allocVector(...));
>  Rf_on_exit(UNPROTECT, 1);
> 
> I don't know R's internals well enough to estimate if something like
> this would be possible. I did put together a simple C example of a
> linked list with object pointers and their corresponding free
> functions, which can easily be free'd with a single call:
> http://git.io/vBqRA . So basically what is mostly missing at this
> point is a way to trigger this at the end of the .Call in a way that
> works for regular returns, errors and interruptions...
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
