On 08/11/2008 7:20 AM, John Wiedenhoeft wrote:
Hi there,
I rejoiced when I realized that you can use Perl regex from within R. However,
as the FAQ states "Some functions, particularly those involving regular
expression matching, themselves use metacharacters, which may need to be
escaped by the backslash mechanism. In those cases you may need a quadruple
backslash to represent a single literal one. "
I was wondering whether that is really necessary with perl=TRUE. Wouldn't it be
possible to parse a string differently in a regex context, e.g. automatically
insert \\ for each \, so that you can use the Perl syntax directly? For
example, if you want to enter a newline as a character, you would use \n
anyway. At the moment one writes \\n to make clear to R that you mean \n, which
in turn makes clear to PCRE that you mean a newline... this is pretty annoying.
How likely is it that you want to pass a real newline character to PCRE directly?
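A short R sketch of the two levels of escaping being discussed, added for illustration and not part of the original thread. The variable names pat and path are hypothetical; only base R functions (nchar, grepl) are used.

```r
# In R source, "\\\\" is four characters; the parser turns it into the
# two-character string \\ , which PCRE then reads as one literal backslash.
pat <- "\\\\"
nchar(pat)                      # 2
path <- "C:\\temp"              # the 7-character string C:\temp
grepl(pat, path, perl = TRUE)   # TRUE

# "\n" hands PCRE a real newline character; "\\n" hands it the two
# characters \ and n, which PCRE itself interprets as a newline.
# Either way the pattern matches a newline in the subject string.
grepl("\n",  "a\nb", perl = TRUE)   # TRUE
grepl("\\n", "a\nb", perl = TRUE)   # TRUE
```

The last two calls show why John's newline case is harmless in practice: both spellings end up matching the same character, whereas a literal backslash really does need four backslashes in the R source.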
No, that's not possible. At the level where the parsing takes place R
has no idea of its eventual use, so it can't tell that some strings are
going to be interpreted as Perl, and others not.
As Gabor mentioned, there have been various discussions of adding a new
syntax for strings that are parsed literally, without processing any
escapes, but no consensus on the right syntax to use.
There are currently some fragile tricks that let you avoid escapes, e.g.
using scan() to read a line:
> re <- scan(what="", n=1)
1: [^\\]
Read 1 item
> re
[1] "[^\\\\]"
(I call this fragile because it works in scripts processed at console
level, but not if you type the same thing into a function.)
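For comparison, the string that scan() returned above can also be written as an ordinary R literal and used directly as a pattern. This sketch is illustrative only and does not call scan(); it shows the escaping that the scan() trick lets you avoid.

```r
# The value scan() produced, typed as an R string literal
re <- "[^\\\\]"
nchar(re)   # 5: the characters [ ^ \ \ ]
# As a PCRE character class, [^\\] means "any character except a backslash",
# so a string consisting of a single backslash does not match.
grepl(re, c("abc", "\\"), perl = TRUE)   # TRUE FALSE
```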
So I agree, it would be nice to have new syntax to allow this. Last
time this came up, I argued for something like \verb in LaTeX where the
delimiter could be specified differently in each use. Duncan TL
suggested triple quotes, as in Python. I think now that triple quotes
would be better than the particular form I suggested.
Duncan Murdoch
If it's at all possible to pass everything between " and " directly to PCRE
without expanding it internally in R, please add this to a future version (as
an option like noescape=TRUE perhaps?)! I would love to use R instead of Perl
for working with regex, without having to escape at two levels all the
time.
Thanks,
John
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.