On Fri, Nov 04, 2005 at 11:31:59AM -0800, Larry Wall wrote:
> On Fri, Nov 04, 2005 at 09:53:07AM -0600, Patrick R. Michaud wrote:
> : Quick summary:  I'm thinking that \n should be defined as
> : the equivalent of
> :
> :     rule nl { [ \015\012 | <[\015\012\f\x85\x{2028}\x{2029}]> ]: }
>
> That seems like a reasonable first approximation to me.  The main thing
> will be to make sure we keep things consistent between rules and
> filehandles that do autochomping.  One approach would be to make
> autochomping always use rules to recognize newlines, but that might
> well be something that a filehandle would want to optimize.
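For illustration only, here is a minimal Python sketch of what the proposed nl rule matches at a given position, written as a direct check rather than as a rule. The names NL_CHARS and match_nl are hypothetical and this is not PGE's code. The sketch assumes the trailing ":" in the rule means "do not backtrack into this group", so a matched \015\012 is never split back into a lone \015. Python strings are indexed by code point, so one "unit" here is one code point.

```python
from typing import Optional

# Hypothetical name: the single-unit newline characters from the proposed
# rule's character class: \015 (CR), \012 (LF), \f, \x85, \x{2028}, \x{2029}.
NL_CHARS = frozenset("\r\n\f\x85\u2028\u2029")


def match_nl(text: str, pos: int) -> Optional[int]:
    """Return the position just after a newline starting at pos, or None.

    CR LF is tried first and, mirroring the ':' in the rule, is never
    split: CR LF always counts as exactly one newline.
    """
    if text.startswith("\r\n", pos):
        return pos + 2  # the one two-unit case
    if pos < len(text) and text[pos] in NL_CHARS:
        return pos + 1  # any other newline is a single unit
    return None  # no newline at this position
```

For example, match_nl("a\r\nb", 1) returns 3 (CR LF consumed as one newline), while match_nl("a\rb", 1) returns 2.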
Well, even PGE doesn't really use a rule for this; it just does a
direct check for any of the above.  Since there's only one
out-of-the-ordinary case (the \015\012), it's not too hard to
implement.

> : I'm of the opinion that the sequence "\015\012" should always
> : be treated as a single newline ...
>
> Seems fine to me, unless it makes lots of programs run twice as slow,
> which I tend to doubt.

In the current implementation of PGE, it's only slower for programs
that are directly quantifying \n somehow, such as \n+, \n*, or
\n**{1..5}, and I suspect that doesn't happen often.

> : With this, the definition of \N is simply any character that
> : is not in the set [\012\015\x0c\x85\x{2028}\x{2029}].
>
> Er, yes.  Whatever a "character" is...

Sorry, I lapsed into "character"-speak there at the end of my message.
But yes, PGE just looks at match positions in an underlying target
without regard to whether the things in the target are bytes,
graphemes, characters, or some other unit of abstraction.

Thanks for the quick confirmation; \n and \N have now been
implemented in r9774.

Pm
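A minimal Python sketch of the other two pieces discussed above: \N as "one unit not in the newline set", and a greedy \n+ loop in which CR LF still counts as a single newline. The names NL_CHARS, match_not_nl and match_nl_run are hypothetical, and this is not PGE's implementation. One plausible reading of why only quantified \n is slower (an assumption, not something stated in the thread) is that each iteration must test for the two-unit CR LF sequence instead of doing a plain single-character class test.

```python
from typing import Optional, Tuple

# Hypothetical name: the newline set [\012\015\x0c\x85\x{2028}\x{2029}].
NL_CHARS = frozenset("\r\n\f\x85\u2028\u2029")


def match_not_nl(text: str, pos: int) -> Optional[int]:
    r"""\N: match exactly one unit (here, a code point) not in NL_CHARS."""
    if pos < len(text) and text[pos] not in NL_CHARS:
        return pos + 1
    return None


def match_nl_run(text: str, pos: int) -> Tuple[int, int]:
    r"""\n+ style: greedily consume consecutive newlines starting at pos.

    Returns (end_position, newline_count); CR LF is counted as one newline,
    so the extra two-unit check happens on every iteration.
    """
    count = 0
    while pos < len(text):
        if text.startswith("\r\n", pos):
            pos += 2
        elif text[pos] in NL_CHARS:
            pos += 1
        else:
            break  # first non-newline unit ends the run
        count += 1
    return pos, count
```

With this, "\r\n\r\n" is two newlines, but "\n\r" is also two, because LF followed by CR is not the CR LF pair.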