Laurent,

> On Dec 14, 2019, at 5:29 PM, Laurent Gautier <lgaut...@gmail.com> wrote:
>
> Hi Simon,
>
> Widespread errors would have caught my attention earlier, as code that uses only one initialization of the embedded R in this way is used quite a bit and is covered by quite a few unit tests. This is the only situation I am aware of in which an error occurs.
>

It may or may not be "widespread" - almost all R API functions can raise errors (e.g., unable to allocate). You'll only find out once they do, and that's too late ;).

> What is a "correct context", or initial context, the code should run from?
> Searching for "context" in the R-exts manual does not return much.
>

It depends on which embedded API you use - see R-exts 8.1. The two options are run_Rmainloop() and R_ReplDLLinit(), which both set up the top-level context with SETJMP. If you don't use either, then you have to use one of the advanced R APIs that do it, such as R_ToplevelExec() or R_UnwindProtect(); otherwise your point to abort to on error doesn't exist. Embedding R is much more complex than many think ...

Cheers,
Simon

> Best,
>
> Laurent
>
>
> On Sat, Dec 14, 2019 at 12:20, Simon Urbanek <simon.urba...@r-project.org> wrote:
>
>> Laurent,
>>
>> the main point here is that ParseVector(), just like any other R API, has to be called in a correct context since it can raise errors, so the issue was that your C code has a bug of not setting up R correctly (my guess would be you're not creating the initial context necessary in embedded R). There are many different errors; yours is just one of many that can occur - any R API call that does allocation (and parsing obviously does) can cause errors. Note that this is true for pretty much all R API functions.
>>
>> Cheers,
>> Simon
>>
>>
>>
>>> On Dec 14, 2019, at 11:25 AM, Laurent Gautier <lgaut...@gmail.com> wrote:
>>>
>>> On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>>>
>>>> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>>>>
>>>>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>>>>
>>>>> Thanks for the quick response, Tomas.
>>>>>
>>>>> The same error is indeed happening when trying to have a zero-length variable name in an environment. The surprising bit is then "why is this happening during parsing" (that is, why are variables assigned to an environment)?
>>>>>
>>>>> The emitted R error (in the R console) is not a parse (syntax) error, but an error emitted during parsing when the parser tries to intern a name - look it up in a symbol table. An empty string is not allowed as a symbol name, hence the error. In the call "list(''=1)", the empty name is what could eventually become the name of a local variable inside list(), even though not yet during parsing.
>>>>>
>>>>
>>>> Thanks Tomas.
>>>>
>>>> I guess this has to do with R expressions being lazily evaluated, and names of arguments in a call are also part of the expression. Now the puzzling part is why that is part of parsing at all: I would have expected R_ParseVector() to be restricted to parsing... Now it feels like R_ParseVector() is performing parsing plus a first level of evaluation for expressions that "should never work" (the empty name).
>>>>
>>>> Think of it as an exception in, say, Python. Some failures during parsing result in an exception (called an error in R and implemented using a long jump). Any time you are calling into R you can get an error; out of memory is also signalled as an R error.
>>>>
>>>
>>>
>>> The surprising bit for me was that I had expected the function to solely perform parsing. I did not expect an exception (and a jmp smashing the stack) when the function concerned is in the C API, is parsing a string, and is using a parameter (pointer) to store whether parsing was a failure or a success.
>>>
>>> Since you are making a comparison with Python, the distinction I am making between parsing and evaluation seems to apply there. For example:
>>>
>>> ```
>>> >>> import parser
>>> >>> parser.expr('1+')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "<string>", line 1
>>>     1+
>>>      ^
>>> SyntaxError: unexpected EOF while parsing
>>> >>> p = parser.expr('list(""=1)')
>>> >>> p
>>> <parser.st at 0x7f360e5329f0>
>>> >>> eval(p)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> TypeError: eval() arg 1 must be a string, bytes or code object
>>>
>>> >>> list(""=1)
>>>   File "<stdin>", line 1
>>> SyntaxError: keyword can't be an expression
>>> ```
>>>
>>>
>>>>> There is probably some error in how the external code is handling R errors (Fatal error: unable to initialize the JIT, stack smashing, etc.) and possibly also in how R is initialized before calling ParseVector. Probably you would get the same problem when running, say, "stop('myerror')". Please note R errors are implemented as long-jumps, so care has to be taken when calling into R; Writing R Extensions has more details (and section 8 specifically about embedding R). This is unlike parse (syntax) errors, which are signaled via the status returned by ParseVector().
>>>>>
>>>>
>>>> The issue is that the segfault (caused by stack smashing, and therefore by what I also suspected to be an uncontrolled jump) is happening within the execution of R_ParseVector(). I would think that an issue with the initialization of R is less likely because the project is otherwise used a fair bit and is well covered by automated continuous tests.
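Editorial sketch: the two failure channels discussed in this thread can be modeled in a few lines of self-contained C. Syntax problems are reported through a status out-parameter, while other errors are raised as long-jumps that only work if a context has been established first. Everything here is a hypothetical toy and neither R's implementation nor its API. `toy_parse()` stands in for `R_ParseVector()`, and `toy_toplevel_exec()` plays the role that `R_ToplevelExec()` or `R_tryCatchError()` play for embedded R. The rule deciding which input takes which channel is invented purely for illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model: none of these names exist in R. */
typedef enum { TOY_PARSE_OK, TOY_PARSE_INCOMPLETE } toy_status;

/* Innermost active jump target (the "context"); NULL means there is none. */
static jmp_buf *toy_context = NULL;
static char toy_errmsg[128];

/* Raising an error: record the message and long-jump to the innermost
   context. With no context there is nowhere to jump, so abort. */
static void toy_error(const char *msg) {
    snprintf(toy_errmsg, sizeof toy_errmsg, "%s", msg);
    if (toy_context == NULL) {
        fprintf(stderr, "Fatal: no context to jump to (%s)\n", msg);
        abort();
    }
    longjmp(*toy_context, 1);
}

/* Toy parser with two failure channels (invented rules):
   - input ending in '+' is reported as incomplete through *status;
   - an empty argument name ''= raises toy_error() and does not return. */
static void toy_parse(const char *src, toy_status *status) {
    size_t n = strlen(src);
    if (n > 0 && src[n - 1] == '+') {
        *status = TOY_PARSE_INCOMPLETE;
        return;
    }
    if (strstr(src, "''=") != NULL)
        toy_error("attempt to use zero-length variable name");
    *status = TOY_PARSE_OK;
}

/* Guard: establish a context, run fn(data), and turn a long-jump out of
   fn into a return value. Returns 1 if fn returned normally, 0 on error. */
static int toy_toplevel_exec(void (*fn)(void *), void *data) {
    assert(fn != NULL);
    jmp_buf here;
    jmp_buf *outer = toy_context;
    toy_context = &here;
    if (setjmp(here) == 0) {
        fn(data);
        toy_context = outer;
        return 1;
    }
    toy_context = outer; /* reached via longjmp: restore the outer context */
    return 0;
}

typedef struct {
    const char *src;
    toy_status status;
} toy_parse_call;

static void toy_parse_thunk(void *data) {
    toy_parse_call *call = data;
    toy_parse(call->src, &call->status);
}

/* Guarded entry point: returns 1 with *status set, or 0 if an error was raised. */
static int toy_safe_parse(const char *src, toy_status *status) {
    toy_parse_call call = { src, TOY_PARSE_OK };
    if (!toy_toplevel_exec(toy_parse_thunk, &call))
        return 0;
    *status = call.status;
    return 1;
}
```

With the guard in place, `toy_safe_parse("list(''=123", &st)` returns 0 and leaves the message in `toy_errmsg`. Calling `toy_parse()` on the same input with no guard reaches the `abort()` branch instead, the toy counterpart of having no point to abort to. The sketch uses setjmp()/longjmp() only because the thread describes R errors as long-jumps. R's real context stack is not modeled.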
>>>>
>>>> After looking more into R's gram.c, I suspect that an execution context is required for R_ParseVector() to work properly (to know where to jump in case of error) when the parsing code decides to fail outside what it considers a syntax error. If that is the case, this would make R_ParseVector() work well when called from, say, a C extension to an R package, but fail the way I am seeing it fail when called from an embedded R.
>>>>
>>>> Yes, contexts are used internally to handle errors. For external use please see Writing R Extensions, section 6.12.
>>>>
>>>
>>> I have wrapped my call to R_ParseVector() in an R_tryCatchError(), and this seems to help me overcome the issue. Thanks for the pointer.
>>>
>>> Best,
>>>
>>>
>>> Laurent
>>>
>>>
>>>> Best
>>>> Tomas
>>>>
>>>>
>>>> Best,
>>>>
>>>> Laurent
>>>>
>>>>> Best,
>>>>> Tomas
>>>>>
>>>>>
>>>>> We are otherwise aware that the error is not occurring in the R console, but can be traced to a call to R_ParseVector() in R's C API (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>>>>>
>>>>> Our specific setup is calling an embedded R from Python, using the cffi library. An error on our end was the first possibility we considered, but the puzzling specificity of the error (as shown below, other parsing errors are handled properly) and the difficulty of tracing what is happening in R_ParseVector() made me ask whether someone on this list had a suggestion about the possible issue.
>>>>>
>>>>> ```
>>>>> >>> import rpy2.rinterface as ri
>>>>> >>> ri.initr()
>>>>> >>> e = ri.parse("list(''=1+")
>>>>> ---------------------------------------------------------------------------
>>>>> RParsingError                             Traceback (most recent call last)
>>>>> >>> e = ri.parse("list(''=123")
>>>>> R[write to console]: Error: attempt to use zero-length variable name
>>>>> R[write to console]: Fatal error: unable to initialize the JIT
>>>>>
>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>> ```
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>>>>>
>>>>>> Dear Laurent,
>>>>>>
>>>>>> could you please provide a complete reproducible example where parsing results in a crash of R? Calling parse(text="list(''=123") from R works fine for me (gives Error: attempt to use zero-length variable name).
>>>>>>
>>>>>> I don't think the problem you observed could be related to the memory leak. The leak is on the heap, not the stack.
>>>>>>
>>>>>> Zero-length names of elements in a list are allowed. They are not the same thing as zero-length variables in an environment. If you try to convert "lst" from your example to an environment, you would get the error (attempt to use zero-length variable name).
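Editorial sketch: Tomas's distinction between list element names and variable (symbol) names can be pictured with a toy C model. The names and data structures are hypothetical and far simpler than R's. `toy_intern()` stands in for the symbol-table lookup the parser performs when it meets a name, and it rejects an empty name. A toy list's element names, by contrast, are stored as plain strings and never interned, so `""` is just another value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model: not R's symbol table or list representation. */
#define TOY_MAX_SYMBOLS 64
#define TOY_MAX_ELEMENTS 8

static const char *toy_symtab[TOY_MAX_SYMBOLS];
static size_t toy_nsyms = 0;

/* Interning: return the unique stored copy of name, adding it on first use.
   An empty name is rejected (NULL), mirroring the rule that "" is not a
   valid symbol name; NULL is also returned if the toy table is full or
   allocation fails. */
static const char *toy_intern(const char *name) {
    if (name[0] == '\0')
        return NULL;
    for (size_t i = 0; i < toy_nsyms; i++)
        if (strcmp(toy_symtab[i], name) == 0)
            return toy_symtab[i];
    if (toy_nsyms == TOY_MAX_SYMBOLS)
        return NULL;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, name);
    toy_symtab[toy_nsyms++] = copy;
    return copy;
}

/* A toy list: element names are plain strings and are never interned,
   so "" is stored like any other value. */
typedef struct {
    const char *names[TOY_MAX_ELEMENTS];
    size_t len;
} toy_list;

static void toy_list_push_name(toy_list *lst, const char *name) {
    assert(lst->len < TOY_MAX_ELEMENTS);
    lst->names[lst->len++] = name;
}
```

In R itself the rejection surfaces as the error "attempt to use zero-length variable name" rather than as a NULL return. The toy returns NULL only to stay self-contained.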
>>>>>>
>>>>>> Best
>>>>>> Tomas
>>>>>>
>>>>>>
>>>>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>>>>>>> Hi again,
>>>>>>>
>>>>>>> Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of zero-length named elements does not seem consistent either:
>>>>>>>
>>>>>>> ```
>>>>>>> > lst <- list()
>>>>>>> > lst[[""]] <- 1
>>>>>>> > names(lst)
>>>>>>> [1] ""
>>>>>>> > list("" = 1)
>>>>>>> Error: attempt to use zero-length variable name
>>>>>>> ```
>>>>>>>
>>>>>>> Should the parser be made to accept as valid what is otherwise possible when using `[[<-`?
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Laurent
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgaut...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I found the following code comment in `src/main/gram.c`:
>>>>>>>>
>>>>>>>> ```
>>>>>>>>
>>>>>>>> /* Memory leak
>>>>>>>>
>>>>>>>> yyparse(), as generated by bison, allocates extra space for the parser
>>>>>>>> stack using malloc(). Unfortunately this means that there is a memory
>>>>>>>> leak in case of an R error (long-jump). In principle, we could define
>>>>>>>> yyoverflow() to relocate the parser stacks for bison and allocate say on
>>>>>>>> the R heap, but yyoverflow() is undocumented and somewhat complicated
>>>>>>>> (we would have to replicate some macros from the generated parser here).
>>>>>>>> The same problem exists at least in the Rd and LaTeX parsers in tools.
>>>>>>>> */
>>>>>>>>
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Could this be related to the issue?
>>>>>>>>
>>>>>>>> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgaut...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The behavior of
>>>>>>>>> ```
>>>>>>>>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>>>>>>>>> ```
>>>>>>>>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent depending on the string to be parsed.
>>>>>>>>>
>>>>>>>>> Trying to parse a string such as `"list(''=1+"` sets the `ParseStatus` to an incomplete-parse error, but trying to parse `"list(''=123"` will result in R sending a message to the console (followed by a crash):
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> R[write to console]: Error: attempt to use zero-length variable name
>>>>>>>>> R[write to console]: Fatal error: unable to initialize the JIT
>>>>>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> Is there a reason for the difference in behavior, and is there a workaround?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Laurent
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel