Re: The meaning of \n and \N in rules

Patrick R. Michaud Fri, 04 Nov 2005 13:01:57 -0800

On Fri, Nov 04, 2005 at 11:31:59AM -0800, Larry Wall wrote:
> On Fri, Nov 04, 2005 at 09:53:07AM -0600, Patrick R. Michaud wrote:
> : Quick summary:  I'm thinking that \n should be defined as 
> : the equivalent of
> : 
> :     rule nl { [ \015\012 | <[\015\012\f\x85\x{2028}\x{2029}]> ]: }
> 
> That seems like a reasonable first approximation to me.  The main thing
> will be to make sure we keep things consistent between rules and
> filehandles that do autochomping.  One approach would be to make
> autochomping always use rules to recognize newlines, but that might
> well be something that a filehandle would want to optimize.


Well, even PGE doesn't really use a rule for this; it just does
a direct check for any of the above.  Since there's only one
out-of-the-ordinary case (the \015\012), it's not too hard to
implement.
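For illustration only, here is a minimal Python sketch of that kind of
direct check.  It is not PGE's code; `match_nl` and `NEWLINE_CHARS` are
hypothetical names.  The one two-character case, \015\012, is tested
first so it is always consumed as a single newline.

```python
# Hypothetical sketch: the newline set from the proposed rule nl:
# \015 \012 \f \x85 \x{2028} \x{2029}
NEWLINE_CHARS = frozenset("\x0d\x0a\x0c\x85\u2028\u2029")

def match_nl(target: str, pos: int) -> int:
    """Return the position just past a newline at pos, or -1 if none."""
    # The only out-of-the-ordinary case: CR LF counts as ONE newline.
    if target.startswith("\r\n", pos):
        return pos + 2
    # Otherwise, any single character from the newline set.
    if pos < len(target) and target[pos] in NEWLINE_CHARS:
        return pos + 1
    return -1
```

So match_nl("a\r\nb", 1) returns 3 (both characters consumed), while
match_nl("a\nb", 1) returns 2.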

> : I'm of the opinion that the sequence "\015\012" should always
> : be treated as a single newline ...
> 
> Seems fine to me, unless it makes lots of programs run twice as slow,
> which I tend to doubt.

In the current implementation of PGE, it's only slower for programs
that directly quantify \n, such as \n+, \n*, or \n**{1..5},
and I suspect those are rare.
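One plausible reason quantifying is the costly case: with \015\012 as a
single unit, each repetition of \n can be one or two positions wide, and
the `:` in the rule means a backtracking engine may only give back whole
newlines, never half of a \015\012.  A hedged Python sketch (hypothetical
names, not PGE's implementation) of the candidate end positions a greedy
\n+ would try, in backtracking order:

```python
# Hypothetical sketch: same newline set as the proposed rule nl.
NEWLINE_CHARS = frozenset("\x0d\x0a\x0c\x85\u2028\u2029")

def nl_width(target: str, pos: int) -> int:
    """Width of the newline at pos: 2 for CR LF, 1 for another newline, 0 for none."""
    if target.startswith("\r\n", pos):
        return 2
    if pos < len(target) and target[pos] in NEWLINE_CHARS:
        return 1
    return 0

def nl_plus_ends(target: str, pos: int) -> list[int]:
    r"""Candidate end positions for a greedy \n+ at pos, longest first."""
    ends = []
    # Each step advances by a variable width; because of the cut (:),
    # a CR LF is never split, so only whole-newline boundaries are kept.
    while (w := nl_width(target, pos)) > 0:
        pos += w
        ends.append(pos)
    return ends[::-1]
```

For "\r\n\r\nx" the candidates are [4, 2]; positions 1 and 3, which would
split a CR LF, are never offered.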

> : With this, the definition of \N is simply any character that
> : is not in the set [\012\015\x0c\x85\x{2028}\x{2029}].
> 
> Er, yes.  Whatever a "character" is...

Sorry, I lapsed into "character"-speak there at the end of my
message.  But yes, PGE just looks at match positions in an 
underlying target without regard to whether the things in the 
target are bytes, graphemes, characters, or some other unit of 
abstraction. 
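To make the "positions, not characters" point concrete, here is a small
Python sketch of \N over a sequence of integer units, so the same
function works on bytes or on a list of code points.  The names
`match_not_nl`, `count_non_newlines`, and `NEWLINE_UNITS` are
hypothetical; this is an illustration, not PGE's code.

```python
from typing import Sequence

# Newline units from the \N definition: \012 \015 \x0c \x85 \x{2028} \x{2029}
NEWLINE_UNITS = frozenset({0x0A, 0x0D, 0x0C, 0x85, 0x2028, 0x2029})

def match_not_nl(target: Sequence[int], pos: int) -> int:
    r"""\N: if the unit at pos is not a newline, return pos + 1; else -1."""
    # The matcher only inspects the unit at a position; whether that unit
    # is a byte or a code point is up to whoever built the target.
    if pos < len(target) and target[pos] not in NEWLINE_UNITS:
        return pos + 1
    return -1

def count_non_newlines(target: Sequence[int]) -> int:
    r"""Count the positions where \N would match."""
    return sum(1 for pos in range(len(target)) if match_not_nl(target, pos) >= 0)
```

For example, count_non_newlines(b"ab\r\ncd") counts the four non-newline
bytes, and the same call works on [ord(c) for c in some_str].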

Thanks for the quick confirmation; \n and \N have now been
implemented in r9774.

Pm
