date:20020422

String mortality

2002-04-22 Thread Peter Gibbs


Two more problems found in string.c; these relate to the creation of
temporary strings to hold results of transcoding, in string_concat and
string_compare.

As per the latest (I think) decision from Dan ("Avoiding the deadlands", 9th
April: http://www.mail-archive.com/perl6-internals@perl.org/msg09072.html),
this patch does the following:

1) Add BUFFER_neonate_FLAG (actually renamed BUFFER_needs_GC_FLAG, since
this is not used at present, and can always be added again if it is needed
in the future) - feel free to change the name to anything you fancy
2) Add neonate counters to interpreter structure (I added three separate
counters; as it is really only required as a flag to indicate that cleanup
is needed, one would probably suffice)
3) Change GC routines to treat 'neonate' string/buffer headers in the same
way as constants
4) Change string_concat and string_compare to set and clear the 'neonate'
flag as required

Still required as per the above-referenced decision:
1) Implement equivalent flag for PMCs (unless the 'immune' flag serves the
same purpose?)
2) Procedure to clear neonate flag on all headers from time to time
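A minimal C sketch of the scheme above, for illustration only. `ToyHeader`, `ToyInterp` and the helper functions are hypothetical stand-ins, not Parrot code. The sketch assumes a single arena and a single counter, which (as noted in item 2) is all the "cleanup needed" check really requires. Only the live and neonate bit positions come from the patch; the constant bit position is a guess.

```c
#include <stddef.h>

/* Illustrative flag bits. LIVE_FLAG and NEONATE_FLAG reuse the bit
   positions from the patch; CONSTANT_FLAG's position is an assumption. */
enum {
    LIVE_FLAG     = 1 << 12,
    NEONATE_FLAG  = 1 << 13,
    CONSTANT_FLAG = 1 << 15
};

/* Hypothetical stand-in for a STRING or Buffer header. */
typedef struct {
    unsigned flags;
} ToyHeader;

/* Hypothetical stand-in for the interpreter: one counter that only
   records whether any neonate cleanup is pending. */
typedef struct {
    size_t neonates;
} ToyInterp;

/* Protect a freshly created temporary (e.g. a transcoding result). */
static void neonate_set(ToyInterp *interp, ToyHeader *h) {
    if (!(h->flags & NEONATE_FLAG)) {
        h->flags |= NEONATE_FLAG;
        interp->neonates++;
    }
}

/* Drop the protection once the temporary is anchored or no longer needed. */
static void neonate_clear(ToyInterp *interp, ToyHeader *h) {
    if (h->flags & NEONATE_FLAG) {
        h->flags &= ~(unsigned)NEONATE_FLAG;
        interp->neonates--;
    }
}

/* Start of a mark pass, as in the resources.c hunk: every header is
   tentatively dead unless it is a constant or a neonate. */
static void reset_live_flags(ToyHeader *arena, size_t used) {
    for (size_t i = 0; i < used; i++)
        if (!(arena[i].flags & (CONSTANT_FLAG | NEONATE_FLAG)))
            arena[i].flags &= ~(unsigned)LIVE_FLAG;
}

/* The still-required periodic step: clear the neonate flag everywhere,
   skipping the scan when the counter says nothing is protected. */
static void clear_all_neonates(ToyInterp *interp, ToyHeader *arena, size_t used) {
    if (interp->neonates == 0)
        return;
    for (size_t i = 0; i < used; i++)
        arena[i].flags &= ~(unsigned)NEONATE_FLAG;
    interp->neonates = 0;
}
```

In this sketch a caller such as string_concat would call `neonate_set` on its transcoded temporary before anything that can trigger a collection, and `neonate_clear` once the temporary is safely reachable or discarded.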

Note that this patch gives compiler warnings in string.c because of the
'const' attribute on the parameters, and therefore should not be applied in
its current form; I'm sure somebody can figure out how best to resolve the
warnings.

--
Peter Gibbs
EmKel Systems

Index: include/parrot/interpreter.h
===
RCS file: /home/perlcvs/parrot/include/parrot/interpreter.h,v
retrieving revision 1.40
diff -u -r1.40 interpreter.h
--- include/parrot/interpreter.h 3 Apr 2002 04:01:41 - 1.40
+++ include/parrot/interpreter.h 22 Apr 2002 12:58:47 -
@@ -142,6 +142,9 @@
requests are there? */
 UINTVAL GC_block_level; /* How many outstanding GC block
requests are there? */
+UINTVAL neonate_strings;/* How many protected newborn strings ? */
+UINTVAL neonate_buffers;/* How many protected newborn buffers ? */
+UINTVAL neonate_PMCs;   /* How many protected newborn PMCs ? */
 } Interp;

 #define PCONST(i) PF_CONST(interpreter->code, (i))
Index: interpreter.c
===
RCS file: /home/perlcvs/parrot/interpreter.c,v
retrieving revision 1.84
diff -u -r1.84 interpreter.c
--- interpreter.c 15 Apr 2002 18:05:18 -  1.84
+++ interpreter.c 22 Apr 2002 13:05:56 -
@@ -497,6 +497,9 @@
 interpreter->memory_collected = 0;
 interpreter->DOD_block_level = 1;
 interpreter->GC_block_level = 1;
+interpreter->neonate_strings = 0;
+interpreter->neonate_buffers = 0;
+interpreter->neonate_PMCs = 0;

 /* Set up the memory allocation system */
 mem_setup_allocator(interpreter);
Index: include/parrot/string.h
===
RCS file: /home/perlcvs/parrot/include/parrot/string.h,v
retrieving revision 1.35
diff -u -r1.35 string.h
--- include/parrot/string.h 24 Mar 2002 22:30:06 -  1.35
+++ include/parrot/string.h 22 Apr 2002 12:59:23 -
@@ -65,8 +65,8 @@
 /* Private flag for the GC system. Set if the buffer's in use as
  * far as the GC's concerned */
 BUFFER_live_FLAG = 1 << 12,
-/* Mark the bufffer as needing GC */
-BUFFER_needs_GC_FLAG = 1 << 13,
+/* Mark the bufffer as newborn, for protection from infant death */
+BUFFER_neonate_FLAG = 1 << 13,
 /* Mark the buffer as on the free list */
 BUFFER_on_free_list_FLAG = 1 << 14,
 /* This is a constant--don't kill it! */
Index: resources.c
===
RCS file: /home/perlcvs/parrot/resources.c,v
retrieving revision 1.45
diff -u -r1.45 resources.c
--- resources.c 19 Apr 2002 01:33:56 -  1.45
+++ resources.c 22 Apr 2002 13:00:47 -
@@ -341,7 +314,8 @@
 STRING *string_array = cur_string_arena->start_STRING;
 for (i = 0; i < cur_string_arena->used; i++) {
   /* Tentatively unused, unless it's a constant */
-  if (!(string_array[i].flags & BUFFER_constant_FLAG)) {
+  if (!(string_array[i].flags &
+(BUFFER_constant_FLAG | BUFFER_neonate_FLAG))) {
 string_array[i].flags &= ~BUFFER_live_FLAG;
   }
 }
@@ -353,7 +327,8 @@
 Buffer *buffer_array = cur_buffer_arena->start_Buffer;
 for (i = 0; i < cur_buffer_arena->used; i++) {
   /* Tentatively unused, unless it's a constant */
-  if (!(buffer_array[i].flags & BUFFER_constant_FLAG)) {
+  if (!(buffer_array[i].flags &
+(BUFFER_constant_FLAG | BUFFER_neonate_FLAG))) {
 buffer_array[i].flags &= ~BUFFER_live_FLAG;
   }
 }
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.73
diff -u -r1.73 string.c
--- string.c  15 Apr 2002 20:34:28

Re: Please rename 'but' to 'has'.

2002-04-22 Thread Aaron Sherman

On Sun, 2002-04-21 at 10:59, Trey Harris wrote:

> 0 has true
> 
> my first reaction would be, "huh?  Since when?"

Dare I say... "now"? ;-)

Sorry, someone had to say it.

Personally, even though it sucks up namespace, I think what we're seeing
here is a need for several keywords that are synonyms. "but" and
"now" seem to cover a good deal of ground.

0 now true

is misleading, IMHO, as 0 is not now true. 0, in this context, is an
expression, and we're saying that that expression is now true. "but"
conveys this much more clearly. However, as many have pointed out, there
are a number of cases where "but" is equally misleading.

Is there any problem with allowing both but and now? It might even be
elegant to use both at the same time:

$x now integer but true

which is clearer to my eye than

$x now integer now true

which seems to change the properties of $x twice without reconciling the
changes with each other.

In any other language this would be unthinkable, but I think it fits
nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used
to excuse the inexcusable, but the idea that Perl reflects the ways in
which humans use language. We want to convey shades of meaning that do
not translate directly to action.

So, have I just lost it, or would it make sense to have now and but?

Apologies to the person who started this thread. I know you thought
"has" was ideal, and I understand why. It's just that between "but" and
"now", I think you get more ground covered than you do with "has" and
either one.

RE: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman


On Sat, 2002-04-20 at 05:06, Mike Lambert wrote:
> > He then went on to describe something I didn't understand at all.
> > Sorry.
> 
> Few corrections to what you wrote:
> 
> To avoid the problem of extending {} to support new features with a
> character 'x', without breaking stuff that might have an 'x' immediately
> after the '{', my proposal is to require one space after the { before the
> real regex appears.

I hope that you mean "one or more whitespace characters", not just a
space. The following would be correct, no?

/{|
.*
 }/

Anything else would seem rather confusing to the average Perl
programmer.

no money down idea for computed goto

2002-04-22 Thread Jason Gloudon



I don't have the time right now to do this myself, so here is a simple
idea to evaluate.

Currently, the computed goto decode and dispatch is essentially:

goto *ops_addr[ *cur_opcode ];

Now a big part of the gain of the prederef runops core comes from decoding each
op once instead of each time it is executed.  The prederef core does this by
creating an array shadowing the byte code which stores pointers to the op
functions for the decoded ops.

One could modify the computed goto runops core analogously, by creating a parallel
array that stores the decoded label address of each op. Suppose the parallel
array is pointed to by decoded_ops; op dispatch would then look like:

goto *decoded_ops[ cur_opcode - start_of_bytecode ];

The C compiler might be able to optimize away the explicit subtraction. If not,
one can do the equivalent pointer math, but I won't try to write that here.

In the ideal case, where sizeof(opcode_t) == sizeof(void *), one could possibly
cheat like the jit compiler does and overwrite the original bytecode instead of
using a parallel array, but that may not be good.
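A rough C sketch of the parallel-array idea. It assumes the GCC/Clang labels-as-values extension (`&&label`, `goto *`) that computed-goto cores rely on. The three-op instruction set, `op_size` and `run_predecoded` are made up for illustration. `decoded_ops`, `cur_opcode` and `start_of_bytecode` follow the names used above.

```c
#include <stdlib.h>

/* Toy op set for the sketch; these are not Parrot's ops. */
typedef long opcode_t;
enum { OP_END, OP_INC, OP_ADD, NUM_OPS };

/* Words occupied by each op, including its inline operands. */
static const size_t op_size[NUM_OPS] = { 1, 1, 2 };

/* Runs bytecode that must end in OP_END; returns the accumulator,
   or -1 if the decode array cannot be allocated. */
long run_predecoded(const opcode_t *start_of_bytecode, size_t len) {
    /* Op number -> label address, as in the existing computed-goto core. */
    static void *const ops_addr[NUM_OPS] = { &&op_end, &&op_inc, &&op_add };

    /* Parallel array shadowing the bytecode; operand slots stay unused. */
    void **decoded_ops = malloc(len * sizeof *decoded_ops);
    if (decoded_ops == NULL)
        return -1;

    /* Decode every op exactly once, before execution starts. */
    for (size_t i = 0; i < len; i += op_size[start_of_bytecode[i]])
        decoded_ops[i] = ops_addr[start_of_bytecode[i]];

    const opcode_t *cur_opcode = start_of_bytecode;
    long acc = 0;

    /* Dispatch reads the pre-decoded address; no lookup on *cur_opcode. */
#define DISPATCH() goto *decoded_ops[cur_opcode - start_of_bytecode]
    DISPATCH();

op_inc:
    acc += 1;
    cur_opcode += 1;
    DISPATCH();
op_add:
    acc += cur_opcode[1];
    cur_opcode += 2;
    DISPATCH();
op_end:
    free(decoded_ops);
    return acc;
#undef DISPATCH
}
```

The sketch decodes on every call to keep it short. A real core would presumably build `decoded_ops` once when the bytecode is loaded, so the cost is paid once rather than per run.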

-- 
Jason

Re: [PATCH] intconst parameter type

2002-04-22 Thread Dan Sugalski


At 12:03 PM +1000 4/19/02, Andrew J Bromage wrote:
>G'day all.
>
>On Thu, Apr 18, 2002 at 09:09:59PM -0400, Dan Sugalski wrote:
>
>>  I've applied this, with the exception of the branch and bsr ops. At
>>  the moment, I agree--I can't see any case where "if" or "gte" needs
>>  to have a variable target. (I can see it for branch, bsr, jump, and
>>  jsr, as those are partially for subroutine dispatch, so no changes
>>  there)
>
>OK, this raises a question: What _is_ the difference between branch and
>jump, or bsr and jsr?  The answer I assumed was that jump/jsr were for
>variable targets and branch/bsr were for static targets.  Is that wrong?

Yup. The branches are relative to the current PC, the jumps take 
absolute addresses.
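To make the distinction concrete, here is a small C sketch. The helper names are hypothetical, and `opcode_t` is assumed to be a plain integer type. An "absolute address" is modeled here as an index from the start of the code segment.

```c
#include <stddef.h>

typedef long opcode_t;

/* branch/bsr style: the target is relative to the current PC, so the
   same bytecode works wherever the op happens to sit. */
static const opcode_t *branch_target(const opcode_t *pc, opcode_t offset) {
    return pc + offset;
}

/* jump/jsr style: the target is an absolute position. Here "absolute" is
   assumed to mean an index from the start of the code segment. */
static const opcode_t *jump_target(const opcode_t *code_start, opcode_t addr) {
    return code_start + addr;
}
```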

-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk

Re: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman

On Sat, 2002-04-20 at 14:33, Me wrote:

> [2c. What about ( data) or (ops data) normally means non-capturing,
> ($2 data) captures into $2, ($foo data) captures into $foo?]

Very nice (but, I assume you meant {$foo data})! This does add another
special case to the regexp parser's handling of "$", but it seems like
it would be worth it.

Makes me think of the even hairier:

{&foo data}

or even more hair-full:

{&{$foo} data}

for references.

Where you capture into the usual positional, and then invoke foo with
the variable as parameter.

Would be pretty nice closure-wise:

sub match_with_alert($re,$id,$ops,$fac,$pri) {
    openlog $id,$ops,$fac;
    my $alert = sub ($match) {
        syslog $pri, "Matched regexp: $match";
    }
    return study /{&{$alert} $re}/;
}
my $m = match_with_alert('ROOT login',$0,0,LOG_USER,PRI_CRIT);
for <> -> $_ { /$m/ }

That would certainly be a handy thing that would set Perl apart from the
pack of advanced regexp languages that don't support closures.

Some other things come to mind as well, but I'm not sure how evil they
are. For example:

sub decrypt($data is rw) {
    $data = rot13($data);
}

print "The secret message is: ", /^Encrypted: {&decrypt .*}/,
      "\n";

Re: Regex and Matched Delimiters

2002-04-22 Thread Me


> Very nice (but, I assume you meant {$foo data})!

I didn't mean that (even if I should have).

Aiui, Mike's final suggestion was that parens end up
doing all the (ops data) tricks, and braces are used
purely to do code insertions. (I really liked that idea.)

So:

Perl 5        Perl6
(data)        ( data)
(?opsdata)    (ops data)
({})          {}


--
ralph

Re: Regex and Matched Delimiters

2002-04-22 Thread Aaron Sherman


On Mon, 2002-04-22 at 14:18, Me wrote:
> > Very nice (but, I assume you meant {$foo data})!
> 
> I didn't mean that (even if I should have).
> 
> Aiui, Mike's final suggestion was that parens end up
> doing all the (ops data) tricks, and braces are used
> purely to do code insertions. (I really liked that idea.)
> 
> So:
> 
> Perl 5        Perl6
> (data)        ( data)
> (?opsdata)    (ops data)
> ({})          {}

I don't like that particular way of looking at things, but either way my
comments about subroutines and closures still hold.

Subroutines...

2002-04-22 Thread Dan Sugalski


Okay, I've been thinking about subroutines lately. A lot. I had 
planned on putting them off a bit until we'd gotten scratchpads and 
globals done, but I think I'd as soon get this off for discussion, so 
maybe we can have the rough edges worked out by the time we have 
hashes.

Subroutines, generally, are a pain. They carry far more than just a 
pointer to a chunk of bytecode or real code, and because of that the 
simple jsr is just not going to cut it. So it's dead.

For subs, we have to worry about plain subs, subs that capture their 
lexical & global scopes, and subs that capture their stacks.

We also need to know where to enter the sub (coroutines may change 
this), whether the sub's got a native-code component (for XS and 
JITted subs) and what the 'original' starting spot for the sub is in 
case it's been changed by coroutine yielding.

So, with all that, there's just too darned much stuff needed to *not* 
call with a context object of some sort. So we're going to. Here's 
the protocol:

1) Sub calls are made with the call opcode. P0 is the subroutine 
context object. (Which is what we'd get out of the symbol table or 
from a closure creation)

2) On entry to a sub, you always start a new set of stack chunks. 
This'll facilitate continuations.

3) We're having a new rule--you may *not* take a continuation from 
within an opcode function! This is probably one of those "Well, Duh!" 
things but better to have it up front.

4) P1 is the continuation of the caller, *if* it's created. Which it 
doesn't have to be. CallCC fills this in, call doesn't. (Yeah, we're 
turning into Scheme. I'm horrified too)

5) P2 is the current object, also potentially empty, to facilitate 
method calls. (I don't think a method should be able to be a 
continuation, but the very thought of that makes my head hurt enough 
to not be able to think about it clearly)
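One way to picture the proposed convention is the following C sketch. All type and field names are hypothetical, chosen only to mirror the numbered rules. It shows what `call` would place in P0 to P2 and the fresh stack chunk from rule 2. It does not show how Parrot would actually implement any of this.

```c
#include <stddef.h>

/* Hypothetical sub context object (what P0 would hold). */
typedef struct Sub {
    size_t entry;           /* where to enter now (a coroutine may move it) */
    size_t original_entry;  /* the sub's 'original' starting spot */
    void  *native_code;     /* non-NULL for XS or JITted subs */
} Sub;

typedef struct Continuation { size_t return_pc; } Continuation;

typedef struct StackChunk { struct StackChunk *prev; size_t used; } StackChunk;

/* Toy register state: only P0..P2 plus the current stack chunk. */
typedef struct Regs {
    void       *P[3];
    StackChunk *stack;
} Regs;

/* Rules 1, 4, 5: P0 = sub context, P1 = caller's continuation (NULL unless
   one was created, as with callcc), P2 = current object (NULL for plain
   subs). Returns the op index at which the sub should be entered. */
static size_t setup_call(Regs *r, Sub *sub, Continuation *cc, void *self,
                         StackChunk *fresh) {
    r->P[0] = sub;
    r->P[1] = cc;
    r->P[2] = self;
    /* Rule 2: start a new chunk so the caller's chunks stay untouched. */
    fresh->prev = r->stack;
    fresh->used = 0;
    r->stack = fresh;
    return sub->entry;
}
```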

I think there's more, but that should probably suffice for now. I 
*am* nervous that this is making sub calls more expensive than I'd 
like 'em to be.
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk

RE: Subroutines...

2002-04-22 Thread Brent Dax


Dan Sugalski:
# Okay, I've been thinking about subroutines lately. A lot. I had 
# planned on putting them off a bit until we'd gotten scratchpads and 
# globals done, but I thin I'd as soon get this off for discussion, so 
# maybe we can have the rough edges worked out by the time we have 
# hashes.
# 
# Subroutines, generally, are a pain. They carry far more than just a 
# pointer to a chunk of bytecode or real code, and because of that the 
# simple jsr is just not going to cut it. So it's dead.
# 
# For subs, we have to worry about plain subs, subs that capture their 
# lexical & global scopes, and subs that capture their stacks.
# 
# We also need to know where to enter the sub (coroutines may change 
# this), whether the sub's got a native-code component (for XS and 
# JITted subs) and what the 'original' starting spot for the sub is in 
# case it's been changed by coroutine yielding.

How about we instead declare that all subs have One True Entry Point,
and the sub does whatever is needed there?  Normal subs can just set up
scoping and jump to the beginning of the sub's body; coroutines retrieve
their context object and use it; XS and JIT call enternative; etc.  That
way we only pay for the overhead on subs that need it.
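Here is a minimal C sketch of the single-entry-point idea, with hypothetical names throughout. Each sub carries its own entry function, so the caller never checks what kind of sub it is calling. The XS/JIT case (the enternative path) is left out.

```c
#include <stddef.h>

typedef struct Sub Sub;
typedef size_t (*EntryFn)(Sub *self);

/* Hypothetical sub object: the entry function hides what kind of sub it is. */
struct Sub {
    EntryFn entry;       /* the One True Entry Point */
    size_t  body_start;  /* first op of the sub's body */
    size_t  resume_at;   /* coroutines only: where the last yield left off */
};

/* Plain sub: scope setup (omitted) then the start of the body. */
static size_t plain_entry(Sub *s) { return s->body_start; }

/* Coroutine: resume from its saved position. */
static size_t coroutine_entry(Sub *s) { return s->resume_at; }

/* The caller just calls the entry; no per-kind dispatch at the call site. */
static size_t call_sub(Sub *s) { return s->entry(s); }
```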

# So, with all that, there's just too darned much stuff needed to *not* 
# call with a context object of some sort. So we're going to. Here's 
# the protocol:
# 
# 1) Sub calls are made with the call opcode. P0 is the subroutine 
# context object. (Which is what we'd get out of the symbol table or 
# from a closure creation)
# 
# 2) On entry to a sub, you always start a new set of stack chunks. 
# This'll facilitate continuations.
# 
# 3) We're having a new rule--you may *not* take a continuation from 
# within an opcode function! This is probably one of those "Well, Duh!" 
# things but better to have it up front.
# 
# 4) P1 is the continuation of the caller, *if* it's created. Which it 
# doesn't have to be. CallCC fills this in, call doesn't. (Yeah, we're 
# turning into Scheme. I'm horrified too)
# 
# 5) P2 is the current object, also potentially empty, to facilitate 
# method calls. (I don't think a method should be able to be a 
# continuation, but the very thought of that makes my head hurt enough 
# to not be able to think about it clearly)

If you need a continuation, you can just use a closure to generate a
normal but anonymous sub with the object as a lexical, can't you?  That
way a continuation is just an object.  (Of course, I could just be
screwed up; I don't understand continuations well enough to be sure.)

# I think there's more, but that should probably suffice for now. I 
# *am* nervous that this is making sub calls more expensive than I'd 
# like 'em to be.

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

#define private public
--Spotted in a C++ program just before a #include

Re: [PATCH] intconst parameter type

2002-04-22 Thread Andrew J Bromage


G'day all.

On Thu, Apr 18, 2002 at 09:09:59PM -0400, Dan Sugalski wrote:

> >>  I've applied this, with the exception of the branch and bsr ops.
[...]

On Mon, Apr 22, 2002 at 11:01:35AM -0400, Dan Sugalski wrote:

> The branches are relative to the current PC, the jumps take 
> absolute addresses.

So why do branch/bsr need register targets, as opposed to jump/jsr
which certainly do?

Cheers,
Andrew Bromage

Re: Subroutines...

2002-04-22 Thread Steve Fink

On Tue, Apr 23, 2002 at 09:28:29AM +1000, Andrew J Bromage wrote:
> G'day all.
> 
> On Mon, Apr 22, 2002 at 04:31:32PM -0400, Dan Sugalski wrote:
> 
> > 3) We're having a new rule--you may *not* take a continuation from 
> > within an opcode function! This is probably one of those "Well, Duh!" 
> > things but better to have it up front.
> 
> I see why you say this, but I'm not sure it's necessarily a good idea.
> There are a few languages which rely on continuations within functions
> (Prolog is the one that springs to mind, but there are others), and
> without it, the generated code might get unnecessarily bloated.

That wasn't my understanding. "opcode function" to me means the
internal implementation of a single opcode.

Assuming I am correct, I still don't quite get what the restriction
is. Is this so that the interpreter is holding the whip at the moment
stacks need to be juggled? What does this allow and disallow for
extensions (e.g. an extension that defines its own opcode)? And what
does this gain over an API entry to tell the interpreter that you're
taking a continuation?

I guess I just don't have enough of a mental model of how parrot will
implement continuations to understand this. Is this analogous to the
current situation, where opcodes can't muck with the program
counter without using one of the funky 'goto POP()' family of macros?

Re: Subroutines...

2002-04-22 Thread Andrew J Bromage

G'day all.

On Mon, Apr 22, 2002 at 04:31:32PM -0400, Dan Sugalski wrote:

> 3) We're having a new rule--you may *not* take a continuation from 
> within an opcode function! This is probably one of those "Well, Duh!" 
> things but better to have it up front.

I see why you say this, but I'm not sure it's necessarily a good idea.
There are a few languages which rely on continuations within functions
(Prolog is the one that springs to mind, but there are others), and
without it, the generated code might get unnecessarily bloated.

Cheers,
Andrew Bromage

Re: Please rename 'but' to 'has'.

2002-04-22 Thread Larry Wall


Aaron Sherman writes:
: On Sun, 2002-04-21 at 10:59, Trey Harris wrote:
: 
: > 0 has true
: > 
: > my first reaction would be, "huh?  Since when?"
: 
: Dare I say... "now"? ;-)
: 
: Sorry, someone had to say it.
: 
: Personally, even though it sucks up namespace, I think what we're seeing
: here is a need for more than one keyword that are synonyms. "but" and
: "now" seem to cover a good deal of ground.
: 
: 0 now true
: 
: Is misleading, IMHO, as 0 is not now true. 0, in this context is an
: expression, and we're saying that that expression is now true. "but"
: conveys this much more clearly. However, as many have pointed out, there
: are a number of cases where but is equally misleading.
: 
: Is there any problem with allowing both but and now? It might even be
: elegant to use both at the same time:
: 
: $x now integer but true
: 
: which is clearer to my eye than
: 
: $x now integer now true
: 
: which seems to change the properties of $x twice without reconciling the
: changes with each other.
: 
: In any other language this would be unthinkable, but I think it fits
: nicely with Perl's philosophy. Not TMTOWTDI, which I think is often used
: to excuse the inexcusable, but the idea that Perl reflects the ways in
: which humans use language. We want to convey shades of meaning that do
: not translate directly to action.
: 
: So, have I just lost it, or would it make sense to have now and but?
: 
: Apologies to the person who started this thread. I know you thought
: "has" was ideal, and I understand why. It's just that between "but" and
: "now", I think you get more ground covered than you do with "has" and
: either one.

Perl 6 will try to avoid synonyms but make it easy to declare them.  At
worst it would be something like:

my sub operator:now ($a,$b) is inline { $a but $b }

Larry

Re: Regex and Matched Delimiters

2002-04-22 Thread Larry Wall


Me writes:
: > Very nice (but, I assume you meant {$foo data})!
: 
: I didn't mean that (even if I should have).
: 
: Aiui, Mike's final suggestion was that parens end up
: doing all the (ops data) tricks, and braces are used
: purely to do code insertions. (I really liked that idea.)
: 
: So:
: 
: Perl 5        Perl6
: (data)        ( data)
: (?opsdata)    (ops data)
: ({})          {}

Hmm.  Let me spill a few beans about where I'm going with A5.  I've
been thinking similar thoughts about the problem of overloading parens
so heavily in Perl 5, but I'm going in a slightly different direction
with it.  The basic principles for the new regexen are:

* Parens always capture.
* Braces are always closures.
* Square brackets are always character classes.
* Angle brackets are always metasyntax (along with backslash).

So a first whack at the differences might be:

Old             New
---             ---
//              //  ???
?pat?           // or even m ???
/pat/x          /pat/
/^pat$/m        /^^pat$$/
/./s            // or /<.>/ ???

\p{prop}        <+prop>  ???
\P{prop}        <-prop>  ???
space           (or \h for "horizontal"?)
{n,m}

\t              also
\n              also  or  (latter matching logical newline)
\r              also
\f              also
\a              also
\e              also
\033            same
\x1B            same
\x{263a}        \x<263a> ???
\c[             same
\N{name}
\l              same
\u              same
\Lstring\E      \L
\Ustring\E      \U
\E              gone
[\040\t]        \h  plus any Unicode horizontal whitespace
[\r\n\ck]       \v  plus any Unicode vertical whitespace

\b              same
\B              same
\A              ^
\Z              same?
\z              $
\G              , but assumed in nested patterns?

\1              $1

\Q$var\E        $var    always assumed literal, so $1 is literal backref
$var            <$var>  assumed to be regex
=~ $re          =~ /<$re>/   ouch?

(??{$rule})
(?{ code })     { code } with failure semantics
(?#...)         {"..."} :-)
(?:...)         <:...>
(?=...)
(?!...)
(?<=...)
(?
(?>...)
(?(cond)t|f)    Not sure.  Could just use { if ... }

Obviously the  and  syntaxes will be user extensible.
We have to be able to support full grammars.  I consider it a feature
that  looks like a non-terminal in standard BNF notation.  I do
not consider it a misfeature that  resembles an HTML or XML tag,
since most of those languages need to be matched with a fancy rule
named  anyway.

An interesting idea would be that if you say

m

or

m{code}

it's as if you said

m//

or

m/{code}/

The latter is particularly interesting to me in that I can see uses for
patterns that are Perl code at the top level rather than regex
literal.  Any closure within a regular expression has full access to
the current state object for the match.  So most of the RFCs proposing
ad hoc mechanisms for saving submatches in various kinds of variables
can be handled with closures.

/(...)(...)(...) { @array = .all } /

or

/(...) { $first  = $+ }
 (...) { $second = $+ }
 (...) { $third  = $+ }/

or

/ () () { .node = ["if",$1,$2] } /  # shades of yacc

or whatever.  Could have a <$foo=...> as syntactic sugar, perhaps.
But we need the general mechanism for building up parse trees of
arrays of hashes of arrays of arrays of hashes of arrays of hashes of...

I haven't decided yet whether matches embedded in the closure should
automatically pick up where the outer match is, or whether there should
be some explicit match op to mean that, much like \G only better.  I'm
thinking when the current topic is a match state, we automatically
continue where we left off, and require explicit =~ to start an unrelated
match.

I also haven't committed to any particular mechanism for defining a
set of related rules in a grammar.  Obviously it needs to be a good
enough mechanism to parse Perl and its variants, which means it
probably needs to be OO based, and you make new grammars by derivation
from the base grammar and overriding the rules you want to change.

Sorry if this is a bit delirious--I'm fighting off some kind of
infection, and my nights have been shortchanged lately by the
neighborhood panhandler who doesn't seem to understand either
complicated concepts like "bedtime" or simple concepts like "no".

Larry

Re: Regex and Matched Delimiters

2002-04-22 Thread Luke Palmer


> (?=...)   
> (?!...)   
> (?<=...)  
> (?
> (?>...)   

Yummy :)
I'd say this is about perfect. The look(ahead|behind)s, er, 
look<:ahead|behind>s are used seldom enough that this is practical. And 
it's much clea[nr]er than that (?=...) crap. (Think I'm going 
overboard with this tregext?)

And are you going to reveal the method by which you define your own 
s, so we can overload it with personal ungrounded opinions? (On the 
other hand, it'd probably just stick and not move, because you said it.)

> Sorry if this is a bit delirious--I'm fighting off some kind of
> infection, and my nights have been shortchanged lately by the
> neighborhood panhandler who doesn't seem to understand either
> complicated concepts like "bedtime" or simple concepts like "no".

bed...what?


Luke

RE: Regex and Matched Delimiters

2002-04-22 Thread Brent Dax


Larry Wall:
# Me writes:
# : > Very nice (but, I assume you meant {$foo data})!
# : 
# : I didn't mean that (even if I should have).
# : 
# : Aiui, Mike's final suggestion was that parens end up
# : doing all the (ops data) tricks, and braces are used
# : purely to do code insertions. (I really liked that idea.)
# : 
# : So:
# : 
# : Perl 5        Perl6
# : (data)        ( data)
# : (?opsdata)    (ops data)
# : ({})          {}
# 
# Hmm.  Let me spill a few beans about where I'm going with A5. 
#  I've been thinking similar thoughts about the problem of 
# overloading parens so heavily in Perl 5, but I'm going in a 
# slightly different direction with it.  The basic principles 
# for the new regexen are:
# 
# * Parens always capture.
# * Braces are always closures.
# * Square brackets are always character classes.
# * Angle brackets are always metasyntax (along with backslash).
# 
# So a first whack at the differences might be:
# 
# Old   New
# ---   ---
# ////  ???
# ?pat? // or even m ???

Whoa, those are moving to the front?!?

# /pat/x/pat/
# /^pat$/m  /^^pat$$/

That's...odd.  Is $$ (the variable) going away?

# /./s  // or /<.>/ ???

I think that . is too common a metacharacter to be relegated to this.

# \p{prop}  <+prop>  ???
# \P{prop}  <-prop>  ???

Intriguing.

# space  (or \h for "horizontal"?)

Same thinking as '.'.

# {n,m} 

Ah, OK.

# \talso 
# \nalso  or  (latter matching logical newline)
# \ralso 
# \falso 
# \aalso 
# \ealso 

I can tell you right now that these are going to screw people up.
They'll try to use these in normal strings and be confused when it
doesn't work.  And you probably won't be able to emit a warning,
considering how much CGI Perl munches.

# \033  same
# \x1B  same
# \x{263a}  \x<263a> ???

Why?  Wouldn't we want the same thing to work in quoted strings?  (Or
are those changing syntaxes too?)

# \c[   same
# \N{name}  
# \lsame
# \usame
# \Lstring\E\L
# \Ustring\E\U

So that's changed from whenever you talked about \q{}?

# \Egone
# [\040\t]  \hplus any Unicode horizontal whitespace
# [\r\n\ck] \v  plus any Unicode vertical whitespace
# 
# \bsame
# \Bsame

# \A^
# \Zsame?
# \z$

Are you sure that optimizes for the common case?

# \G, but assumed in nested patterns?
#  
# \1$1
# 
# \Q$var\E  $var  always assumed literal, so $1 is literal backref

So these are reinterpolated every time you backtrack?  Are you *trying*
to destroy regex performance?  :^)

# $var  <$var>  assumed to be regex

What if $var is a qr//ed object?

# =~ $re=~ /<$re>/   ouch?

I don't see the win.

# (??{$rule})   
# (?{ code })   { code } with failure semantics
# (?#...)   {"..."} :-)
# (?:...)   <:...>
# (?=...)   
# (?!...)   
# (?<=...)  
# (?

Cute.  (Wait a minute, aren't those reversed?)

# (?>...)   
# (?(cond)t|f)  Not sure.  Could just use { if ... }

?

# Obviously the  and  syntaxes will be user 
# extensible. We have to be able to support full grammars.  I 
# consider it a feature that  looks like a non-terminal in 
# standard BNF notation.  I do not consider it a misfeature 
# that  resembles an HTML or XML tag, since most of those 
# languages need to be matched with a fancy rule named  anyway.

But that *does* make it harder to define the fancy rules.  I could see
someone defining rules like:

'gt' => qr/\ qr/\>/

just to get around backslashing everything in sight.

# An interesting idea would be that if you say
# 
# m
# 
# or
# 
# m{code}
# 
# it's as if you said
# 
# m//
# 
# or
# 
# m/{code}/

I don't know about that one.  I often use {} as delimiters on regexen
because it's a character that doesn't occur in data very often.  I think
the gain of two characters isn't as critical as the loss of options.
 
Understand, I'm not a regex Luddite.  I've been working with yacc and
lex a lot lately, so I have at least a hint of how powerful formal
parsing is--and I love all of these features.  However, I think that
syntactically a l
