Perl 6 summary for week ending 2002-09-29
The Perl 6 Summary for the Week Ending 20020929

Okay, this is my last summary before I take a couple of weeks' holiday away from any form of connectivity. Will I cope? Can my system stand going cold turkey? Can you live without my summaries? Luckily, Leon Brocard has been volunteered to step into the breach and produce summaries for the next couple of weeks.

Oh yes, because I'm a lazy swine who doesn't read release notes, and because a new version of SpamAssassin no longer delivers mail by default (it now silently drops mail on the floor in cases where it previously just delivered it), I may be missing some messages from this week. Sorry.

We'll kick off, as usual, with happenings on the internals list:

Of Variables, Values and Vtables

Dan stopped travelling (for a while at least) and listed the current short-term goals for Parrot. They are:

* Finish up the calling convention changes
* Spec the PMC changes
* Spec the vtable changes
* Get exceptions fully defined and a preliminary implementation

He promised the variable/vtable stuff in the `next day or so', with the calling convention stuff a little earlier or later. Leo Toetsch offered some of his thoughts on vtable methods for _keyed opcodes.

http://makeashorterlink.com/?Z31D146F1
http://makeashorterlink.com/?L22D226F1

IMCC 0.0.9.2

Leopold Toetsch provided a patch which `fixes all currently known problems [with respect to] IMCC/Perl6'. Andy Dougherty had some problems with the patch dumping core, possibly because of platform-specific issues, and Steve Fink realised that there was an overlap between this patch and one he'd been working on. The patch has not yet been applied, but work continued.

http://makeashorterlink.com/?L23D326F1

Fun with intlists

Leopold Toetsch showed some benchmarks of intlist against PerlArray, and the difference is stunning: the intlist-based test is some ten times faster than PerlArray, with most of PerlArray's time being spent allocating memory.
Leo suggests using intlist as the PerlArray base class. Having got bragging rights for one speed-up, Leo sent in a second patch which gave *another* tenfold performance boost. Sean O'Rourke had a few questions about performance in typical usage and wondered if we shouldn't look at borrowing from SGI's STL implementation of a deque (double-ended queue). Leo was ahead of him there; his second patch already used the trick Sean had suggested.

http://makeashorterlink.com/?E34D126F1
http://makeashorterlink.com/?L25D516F1

Functions in Scheme

Jürgen Bömmels sent a pre-patch which gets Scheme functions working. It's built on top of an early version of Sean O'Rourke's scratchpad.pmc, so be careful applying the initial patch. Sean hoped that it would be easy to reconcile Jürgen's changes to the scratchpad PMC with the changes he'd made since he sent Jürgen his early code.

Jonathan Sillito asked why the Scheme interpreter maintained its own environment stack rather than using the "pad_stack". Apparently the current pad_stack is very closely tied to Sub.pmc, which doesn't quite offer the semantics needed for Scheme functions. Also, the pad_stack makes it tricky to implement "set!" and "define" correctly. Dan chimed in asking everyone to hash out what they needed from scratchpads and lexical variables; once that is nailed down it should be easy to get everything designed and implemented reasonably quickly, so Jürgen and Sean came up with a list between them.

http://makeashorterlink.com/?O16D116F1 -- The patch
http://makeashorterlink.com/?B27D126F1 -- Its description

Perl6 on HP-UX 11.00

H Merijn Brand was having trouble getting Perl 6 to work on HP-UX. It was initially thought that this was a problem with the version of perl he was using, but it was eventually tracked down to a problem with "make test"; the tests passed when Merijn did "perl6 --test".
However, the thread also covered making sure that the Perl6 build process rebuilt the Grammar when appropriate. There's also a theory that there's a problem with IMCC generating .pasm files. Leopold Toetsch put his hand up for causing the problem and submitted a patch to fix things. Applied.

http://makeashorterlink.com/?S18D256F1
http://makeashorterlink.com/?O39D216F1

The status of Leopold Toetsch's patches

Leo wondered what was happening with the pile of patches he'd submitted this week. At the time he made the post he had 15 patches outstanding (or is that `outstanding patches'?) and, as a result, several of the patches were applied. Steve Fink voted that Leo should be given commit access to CVS, and Leo was grateful for the vote of confidence. Leo later sent in yet another patch for intlist, whic
Re: Interfaces
On Monday, September 30, 2002, at 05:23 PM, Michael G Schwern wrote:
> OTOH, Java interfaces have a loophole which is considered a design mistake.
> An interface can declare some parts of the interface optional and then
> implementors can decide if they want to implement it or not. The upshot
> being that if you use a subclass in Java you can't rely on the optional
> parts being there.
>
> This comes down to an OO philosophy issue. If Perl 6 wants a strict OO
> style, don't put in a loophole. If they want to leave some room to play,
> put in the ability to turn some of the strictness off.

I guess what bothers me is the loophole issue, sort of... specifically, who gets to decide whether a given interface method is optional. I'm hoping that the optional-ness of an interface is itself optional. And that the optional-ness of a non-optional interface is not optional. :-)

The problems arise anywhere you're referring to interface methods from outside the class. It may be perfectly OK for a subclass of a given class to reuse an interface method for a different purpose, etc., so long as nothing outside the inheritance chain is using it... but I'd very much like a way for an interface to say "don't muck with me", so that sensitive interfaces can be guaranteed as forever invariant, if you really, really mean it.

My hope is that you could define the "non-overridability" of a given interface method in the parent class, to guarantee enforcement among subclasses. Possible pseudocode choices include:

  # (1) explicitly not an interface method
  method foo (...) is private { ... }

  # (2) explicitly an interface method
  method foo (...) is interface { ... }

  # (3) explicitly a "strict" interface, can't change it in subclasses
  method foo (...) is strict interface { ... }

  # (4) explicitly "optional", subclasses can muck with it
  method foo (...) is optional interface { ... }

  # (5) So what's the unattributed case? An implied interface,
  #     implied optional interface, or just explicitly not private?
  method foo (...) { ... }

I'd say if we had (1), and if (2) enforced "strict", then (5) could mean "optional, not private, but not strictly an interface either...", and we wouldn't need the icky (3) or (4) at all. That would go nicely with expectations, I think. (But are there other possibilities besides "private" and "interface", e.g. "protected", etc.?)

On Monday, September 30, 2002, at 06:04 PM, David Whipp wrote:
>>> What if a subclass adds extra, optional arguments to a
>>> method, is that ok?
>>
>> ... In theory, yes...
>
> I don't think that the addition of an optional parameter
> violates any substitution principle: users of the base-class
> interface couldn't use the extra params (because they're not in
> the interface); but a user of the derived-class's interface
> can use the extra power (because they are in that interface).
> A derived class is always allowed to add things (thus, you can
> weaken preconditions, strengthen postconditions, add extra
> methods, return a more specific result, ...; but you can't
> strengthen a precondition, nor weaken a postcondition, etc.)

Agreed. If we had some concept like "strict" vs. "overridable" interfaces, should "strict" prevent this, too, or are extra, optional parameters always allowed as a special case (under the assumption that they can't hurt anything that doesn't know about them)?

>> if our interface "a" is returning an object, of a class that flattens itself
>> differently in different contexts, then do we say the interface can
>> only return object classes derived from that first object class?
>> And do we restrict the possible "flattenings" of the object class itself,
>> using an interface, so subclasses of the returned obj can't muck with
>> it and unintentionally violate our first interface ("a")?...
My musing is that the behavior of a class in different contexts is itself an interface, in the sense of being a contract between a class/subclass and its users, and therefore could be expressible in the same syntax as other interfaces, with the same issues of (non)strictness. (And for that matter, the contextual return values of any arbitrary function, foo(), can be expressed as an interface to a one-off class named "foo", which implies large commonalities of syntax/implementation between classes, methods, operators, and ordinary functions [1]... hee hee...)

Mike Lazzaro
Cognitivity (http://www.cognitivity.com/)

[1] ...which implies that all Perl6 operators and functions can (must?) be implemented as flyweight classes with interfaces that define their properties/attributes/arguments/contexts, but that's a Perl6 <--> Parrot thing.
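Perl 5 has no `is strict interface` trait, but option (3), a method that subclasses may not change, can be roughly approximated at run time. What follows is a minimal sketch, assuming a hypothetical helper `assert_not_overridden` (not an existing module). For each "strict" method, it compares the code reference the subclass resolves with the one the base class defines, and dies on a mismatch.

```perl
use strict;
use warnings;

package Base;
sub id   { return 42 }           # meant to be a "strict" interface method
sub name { return 'base' }       # subclasses may override this freely

package Child;
our @ISA = ('Base');
sub name { return 'child' }      # fine: name is not on the strict list

package Rogue;
our @ISA = ('Base');
sub id { return 7 }              # breaks the "strict" contract on id

package main;

# Hypothetical helper (not a CPAN module): die if $sub resolves any of
# @methods to a different code ref than $base does, i.e. it overrode it.
sub assert_not_overridden {
    my ($base, $sub, @methods) = @_;
    for my $m (@methods) {
        my $want = $base->can($m) or die "$base has no method $m\n";
        my $got  = $sub->can($m);
        die "$sub overrides strict method $m\n"
            unless $got && $got == $want;
    }
    return 1;
}

print "Child ok\n" if assert_not_overridden('Base', 'Child', 'id');
print "Rogue: $@" unless eval { assert_not_overridden('Base', 'Rogue', 'id') };
```

The check only runs when something calls it, for example from a test suite. It cannot stop a subclass from being compiled, which is the part a real `is strict` trait would add.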
exegesis 5 question: matching negative, multi-byte strings
I was wondering what the favored syntax in perl6 would be to match negative multi-byte strings. In perl 5:

  my $sql = "select * from a where b union select * from c where d";
  my $nonunion = "[^u]|u[^n]|un[^i]|uni[^o]|unio[^n]";
  my (@subsqls) = ($sql =~ m"((?:$nonunion)*)");

guaranteeing that the subsqls have all text up to, but not including, the string "union".

I suppose I could say:

  rule nonunion { (.*) :: { fail if ($1 =~ m"union$"); } }

although that seems awfully slow. I suppose I could also do the same thing in perl6 as I did in perl5, although that gets ugly if you need to combine matching strings without "union" in them with, say, parens:

  rule parens { \( [ <-[()]> + : | ]* \) }
  rule non_union_non_parens { [< -[()u] > | u< -[()n] > | un < -[()i] > | uni < -[()o] > | unio < -[()n] > ] }
  my (@subsqls) = ($sql =~ m" ([ | ]*) ");

And finally, I suppose I could write a sql grammar (which, for this application and most others, is definitely overkill).

So I guess I'd like something shorter, something where you could say:

  < -["union"] >

or

  < -["union"\(\)] >

or

  < -["union""select"\(\)] >

that is, a generic negative, multi-byte string matching mechanism. Any thoughts? Am I missing something already present or otherwise obvious?

Ed
Re: Interfaces
On Tue, Oct 01, 2002 at 11:51:02AM -0700, Michael Lazzaro wrote:
> >This comes down to an OO philosophy issue. If Perl 6 wants a strict OO
> >style, don't put in a loophole. If they want to leave some room to play,
> >put in the ability to turn some of the strictness off.
>
> I guess what bothers me is the loophole issue, sort of... specifically,
> who gets to decide whether a given interface method is optional.

If we do it loosely, the subclasser decides if they want to follow the interface, since most violations of an interface are done because it's being used in an unforeseen manner. But they have to explicitly say they're violating it. I can't see any good reason why an interface author would want to make their interface optional.

If we do it strictly, interfaces are not optional.

Perhaps a way to sharpen the focus on this is to expand the discussion of strictness to include not just method prototypes but Design-By-Contract features as well (pre- and postconditions and invariants). Should DBC conditions be overridable? After all, it's not terribly useful to override a signature only to be stopped by a precondition.

Taken as a whole, I'm leaning towards no. Interfaces and conditions should be strict. They can be gotten around using delegation, which should be built into Perl 6 anyway.

> >I don't think that the addition of an optional parameter
> >violates any substitution principle: users of the base-class
> >interface couldn't use the extra params (because they're not in
> >the interface); but a user of the derived-class's interface
> >can use the extra power (because they are in that interface).
> >A derived class is always allowed to add things (thus, you can
> >weaken preconditions, strengthen postconditions, add extra
> >methods, return a more specific result, ...; but you can't
> >strengthen a precondition, nor weaken a postcondition, etc.)
>
> Agreed. If we had some concept like "strict" vs. "overridable"
> interfaces, should "strict" prevent this, too, or are extra, optional
> parameters always allowed as a special case (under the assumption that
> they can't hurt anything that doesn't know about them?)

Unless someone can come up with a practical case of adding parameters which violates the interface, I'd say there's no problem, strict or no strict.

> My musing is that the behavior of a class in different contexts is
> itself an interface, in the sense of being a contract between a
> class/subclass and its users

Ah HA! Contract! Return values can be enforced via a simple DBC postcondition; no need to invent a whole new return-value signature.

-- 
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
And if you don't know Which To Do
Of all the things in front of you,
Then what you'll have when you are through
Is just a mess without a clue.
Of all the best that can come true
If you know What and Which and Who.
exegesis 5 question: matching negative, multi-byte strings
> [Negative matching]
> a generic negative, multi-byte string matching mechanism. Any thoughts?
> Am I missing something already present or otherwise obvious?

Maybe I'm misunderstanding the question, but I think you want negative lookahead:

  Perl 5: /(.*)(?!>union)/
  Perl 6: /(.*) /

Luke
Re: Interfaces
In a message dated Mon, 30 Sep 2002, Michael G Schwern writes:
> On Mon, Sep 30, 2002 at 06:04:28PM -0700, David Whipp wrote:
> > On a slightly different note, if we have interfaces then I'd really
> > like to follow the Eiffel model: features such as renaming methods
> > in the derived class may seem a bit strange; but they can be useful
> > if you have name-conflicts with multiple inheritance.
>
> I'm not familiar with the Eiffel beyond "it's the DBC language and it's
> French", but wouldn't this simply be covered by aliasing?

No, because aliasing only gives you a second name for the method; it does not obliterate the meaning of the first.

The example Damian uses in his OO class (when he discusses Class::Delegation) is a Car which inherits from Vehicle (which has an action C, which causes the car to move) and also inherits from MP3_Player (which has an accessor C which sets which media spindle to use). It's incorrect to use just the C method, it's incorrect to use just the C method, and it's incorrect to call both (vehicular acceleration would cause the MP3 player to change disks). You need both capabilities, but you need them separately.

You want something like

  class Car is Vehicle renames(drive => accel)
            is MP3_Player renames(drive => mp3_drive);

Either of those renamings is, of course, optional, in which case drive() refers to the non-renamed one when referring to a Car object.

But later on, if some code does

  Vehicle $mover = getNext();   # returns a Car
  $mover.drive(5);

it should call C on C<$mover>, that is, C<$mover.accel()>.

See why aliasing doesn't solve this?

It can get more complicated, too. Say you want to do this (I don't know if this will be possible, but it could be):

  class DoublyLinkedList is LinkedList
                         is LinkedList renames($.head => $.tail,
                                               nextNode => prevNode,
                                               Node::$.next => Node::$.prev);

What's going on here? We're inheriting from C twice.
The first time, we just accept it wholesale, including its inner Node class, which contains a $.data and a $.next reference. The second time, we rename the $.head reference to $.tail (along with its associated method head() to tail()), we rename its nextNode() method to prevNode(), and we rename *its* version of the inner Node class to make $.next into $.prev. The Node::$.data attribute is still shared. Redefine insert() and delete() so that they deal with both the next node and the previous node, and you're done.

If some $variable of type LinkedList was assigned a DoublyLinkedList, a call to C<$variable.nextNode> would call the un-redefined nextNode, which is correct. If you redefined *both* Cs, you'd probably call the first one in that situation. But who knows?

Supporting repeated inheritance and multiple inheritance with partial redefinition opens a huge ball of wax of complicated inheritance rules, and I don't know whether Damian or Larry have any interest in fleshing them out, but it could be made to work. Try *that* with aliasing.

Trey
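Trey's argument against aliasing can be seen in Perl 5's own multiple inheritance. In this sketch (the class bodies are invented for illustration), the default depth-first lookup makes `Car->drive` find `Vehicle::drive`. A glob alias gives the MP3 player's method a second name, `mp3_drive`. However, code written for MP3 players that calls `drive` on a Car still accelerates it.

```perl
use strict;
use warnings;

package Vehicle;
sub new   { my $class = shift; return bless({ speed => 0, spindle => 0 }, $class) }
sub drive { my ($self, $n) = @_; $self->{speed} += $n; return 'accelerate' }

package MP3_Player;
sub drive { my ($self, $n) = @_; $self->{spindle} = $n; return 'change spindle' }

package Car;
our @ISA = ('Vehicle', 'MP3_Player');
{
    # Aliasing gives MP3_Player::drive a second name inside Car, nothing more.
    no warnings 'once';
    *Car::mp3_drive = \&MP3_Player::drive;
}

package main;

# Code written against the MP3_Player interface:
sub change_disc { my ($player, $n) = @_; return $player->drive($n) }

my $car = Car->new;
print $car->drive(5), "\n";          # accelerate (Vehicle::drive is found first)
print $car->mp3_drive(2), "\n";      # change spindle
print change_disc($car, 3), "\n";    # accelerate: a plain drive() still hits Vehicle
```

Getting the right `drive` depending on whether the caller sees a Vehicle or an MP3_Player is what Eiffel-style renaming would add, and this sketch cannot express it.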
Re: exegesis 5 question: matching negative, multi-byte strings
On Tue, Oct 01, 2002 at 01:24:45PM -0600, Luke Palmer wrote:
> > [Negative matching]
> > a generic negative, multi-byte string matching mechanism. Any thoughts?
> > Am I missing something already present or otherwise obvious?
>
> Maybe I'm misunderstanding the question, but I think you want negative
> lookahead:
>
>   Perl 5: /(.*)(?!>union)/
>   Perl 6: /(.*) /
>
> Luke

No, that doesn't work, because of the way regexes operate. The '.*' captures everything, and since the string after everything (i.e. the end of the string) doesn't match 'union', the regex succeeds without backtracking. Try it:

  perl -e '
    $a = "this has the string union in it";
    my ($b) = ($a =~ m"(.*)(?!>union)");
    print $b;'

prints:

  this has the string union in it

not 'this has the string'.

Ed
Re: exegesis 5 question: matching negative, multi-byte strings
On Tue, Oct 01, 2002 at 12:47:24PM -0700, [EMAIL PROTECTED] wrote:
> On Tue, Oct 01, 2002 at 01:24:45PM -0600, Luke Palmer wrote:
> > > [Negative matching]
> > > a generic negative, multi-byte string matching mechanism. Any thoughts?
> > > Am I missing something already present or otherwise obvious?
> >
> > Maybe I'm misunderstanding the question, but I think you want negative
> > lookahead:
> >
> >   Perl 5: /(.*)(?!>union)/
> >   Perl 6: /(.*) /
> >
> > Luke
>
> no, that doesn't work, because of the way regexes operate. The '.*' captures
> everything, and since the string after everything (ie: the end of the string)
> doesn't match 'union', the regex succeeds without backtracking. Try it:

I think what you want is just a negated assertion:

  /+/

Although I don't know what that means exactly. Does it match 5 characters at a time that aren't "union", or does it match one character at a time as long as the string "union" isn't matched at that point?

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]
Re: Interfaces
On Tue, Oct 01, 2002 at 03:43:22PM -0400, Trey Harris wrote:
> You want something like
>
>   class Car is Vehicle renames(drive => accel)
>             is MP3_Player renames(drive => mp3_drive);
>
> Either of those renamings is, of course, optional, in which case drive()
> refers to the non-renamed one when referring to a Car object.
>
> But later on, if some code does
>
>   Vehicle $mover = getNext();   # returns a Car
>   $mover.drive(5);
>
> It should call C on C<$mover>, that is,
> C<$mover.accel()>.
>
> See why aliasing doesn't solve this?

Ahh, because Perl has to know that when $mover is used as a Vehicle it uses Car.accel, but when used as an MP3_Player it calls Car.mp3_drive. Clever!

-- 
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
If I got something to say, I'll say it with lead.
        -- Jon Wayne
Re: exegesis 5 question: matching negative, multi-byte strings
[EMAIL PROTECTED] (Jonathan Scott Duff) writes:
> I think what you want is just a negated assertion:
>
>   /+/
>
> Although I don't know what that means exactly.

That matches more than one thing that is not the string "union". "u" is not the string "union"; "n" is not the string "union"...

I think

  /(.*) /

may do it.

-- 
There is no distinction between any AI program and some existent game.
Re: Interfaces
On Tuesday, October 1, 2002, at 12:33 PM, Michael G Schwern wrote:
> Perhaps a way to sharpen the focus on this is to expand the discussion of
> strictness to include not just method prototypes but Design-By-Contract
> features as well (pre and post conditions and invariants). Should DBC
> conditions be overridable? Since it's not terribly useful to override a
> signature only to be stopped by a pre-condition.
>
> Taken as a whole, I'm leaning towards no. Interfaces and conditions should
> be strict. They can be gotten around using delegation, which should be
> built into Perl 6 anyway.

I'd think no, too... if someone doesn't want or need interfaces, they can just not use them. Which implies, I assume, that "interface" is not the default state of a class method, i.e. we do need something like "method foo() is interface { ... }" to declare any given method specifically as an interface method, if no one has a problem with that. Just to be clear, I'm not thinking we can get away with saying "all nonprivate methods are automatically interfaces", for example.

>> My musing is that the behavior of a class in different contexts is
>> itself an interface, in the sense of being a contract between a
>> class/subclass and its users
>
> Ah HA! Contract! Return values can be enforced via a simple DBC post
> condition, no need to invent a whole new return value signature.

I think I get it, but can you give some pseudocode? If you want a method to return a list of Zoo animals in "list" context, and a Zoo object in "Zoo object" context, what would that look like?

(I'm assuming that DBC postconditions on a method would be treated, internally, as part of the overall signature/prototype of the method: i.e. if you override the method in a subclass, all original postconditions would still remain attached to it, though the new method might itself add additional postconditions.)

MikeL
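Perl 5 can only approximate Mike's Zoo question, because it distinguishes list and scalar context rather than a "Zoo object" context. Here is a minimal sketch with class and method names invented for illustration. The public `animals` method checks the postconditions and, depending on `wantarray`, returns either a list or a new Zoo. Subclasses override only the `_animals` hook, so the original checks stay attached, as Mike assumes they would.

```perl
use strict;
use warnings;

package Zoo;
sub new { my ($class, @animals) = @_; return bless({ animals => [@animals] }, $class) }

# Subclasses override this hook; it just produces the raw list.
sub _animals { my $self = shift; return @{ $self->{animals} } }

# Public interface: list context -> animal names, scalar context -> a Zoo.
# The postconditions live here, so overriding _animals cannot skip them.
sub animals {
    my $self = shift;
    my @list = $self->_animals;
    die "postcondition failed: every animal must be a non-empty name\n"
        if grep { !defined($_) || $_ eq '' } @list;
    return @list if wantarray;
    my $zoo = ref($self)->new(@list);
    die "postcondition failed: scalar result must be a Zoo\n"
        unless $zoo->isa('Zoo');
    return $zoo;
}

package PettingZoo;
our @ISA = ('Zoo');
sub _animals { my $self = shift; return ('goat', $self->SUPER::_animals) }

package BrokenZoo;
our @ISA = ('Zoo');
sub _animals { return ('lion', '') }     # violates the inherited postcondition

package main;

my $zoo   = PettingZoo->new('llama');
my @names = $zoo->animals;    # list context
my $copy  = $zoo->animals;    # scalar context
print "@names\n";             # goat llama
print ref($copy), "\n";       # PettingZoo
print eval { my @x = BrokenZoo->new->animals; 1 } ? '' : $@;
```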
Re: exegesis 5 question: matching negative, multi-byte strings
> guaranteeing that the subsqls have all text up to, but not including the string
> "union".
>
> I suppose I could say:
>
>   rule nonunion { (.*) :: { fail if ($1 =~ m"union$"); } }

What's wrong with: ?

  rule getstuffbeforeunion { (.*?) union | (.*) }

  "a union" => "a "
  "b"       => "b"

Am I missing something here?

Mike Lambert
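Mike's rule translates almost directly into a Perl 5 regex. Here is a minimal sketch; the helper name `before_union` is invented here. When the string contains no "union", the lazy first branch still walks to the end of the string before the second branch takes over, which is the cost Ed objects to in his reply.

```perl
use strict;
use warnings;

# Perl 5 rendering of the rule: lazily take text up to the first
# "union"; if there is none, the second branch takes everything.
sub before_union {
    my ($text) = @_;
    return unless $text =~ /(.*?)union|(.*)/s;
    return defined $1 ? $1 : $2;
}

print "'", before_union('a union'), "'\n";    # 'a '
print "'", before_union('b'), "'\n";          # 'b'
```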
Re: Interfaces
> On Tue, Oct 01, 2002 at 03:43:22PM -0400, Trey Harris wrote:
>> You want something like
>>
>>   class Car is Vehicle renames(drive => accel)
>>             is MP3_Player renames(drive => mp3_drive);

I *really* like this, but would the above be better coded as:

  class Car is Vehicle renames(drive => accel)
            has MP3_Player renames(drive => mp3_drive);

implying a "container" relationship with automatic delegation? Among the other considerations is that if you simply said

  class Car is Vehicle has MP3_Player;

the inheritance chain could assume that Car.drive === Vehicle.drive, because is-a (inheritance) beats has-a (containment or delegation). If you needed to, you should still be able to call

  $mycar.MP3_Player.drive

to DWYM, too.

Along these lines, I'd love the ability to do something like:

  class Bird is Animal
      has (LeftWing is Wing)    # a "named" Wing
      has (RightWing is Wing)
      has (LeftLeg is Leg)
      has (RightLeg is Leg);

  $bird.LeftWing.flap;    # makes sense
  $bird.flap;             # but what's this do? left, right, or _both_?
  $bird^.Wing.flap;       # perhaps too evil? :-)

MikeL
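Mike's `has MP3_Player` reading amounts to containment plus forwarding. Perl 5 has no built-in delegation, so this sketch forwards by hand; the names are illustrative, not a proposed API. `drive` stays the inherited Vehicle method. The contained player is reachable through a renamed `mp3_drive` method, and a `player` accessor plays the role of `$mycar.MP3_Player`.

```perl
use strict;
use warnings;

package Vehicle;
sub new   { my $class = shift; return bless({ speed => 0 }, $class) }
sub drive { my ($self, $n) = @_; $self->{speed} += $n; return $self->{speed} }

package MP3_Player;
sub new   { my $class = shift; return bless({ spindle => 0 }, $class) }
sub drive { my ($self, $n) = @_; $self->{spindle} = $n; return $self->{spindle} }

package Car;
our @ISA = ('Vehicle');                    # Car is a Vehicle ...

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new();
    $self->{player} = MP3_Player->new;     # ... and has an MP3_Player
    return $self;
}

sub player    { return $_[0]{player} }                             # like $mycar.MP3_Player
sub mp3_drive { my $self = shift; return $self->player->drive(@_) } # renamed forward

package main;

my $car = Car->new;
$car->drive(30);              # Vehicle::drive: speed becomes 30
$car->mp3_drive(2);           # forwarded to the player: spindle becomes 2
$car->player->drive(4);       # explicit path to the contained object
print "$car->{speed} $car->{player}{spindle}\n";   # 30 4
```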
Re: Interfaces
On Monday, September 30, 2002, at 11:19 PM, Michael G Schwern wrote:
> On Mon, Sep 30, 2002 at 06:04:28PM -0700, David Whipp wrote:
>> On a slightly different note, if we have interfaces then I'd really
>> like to follow the Eiffel model: features such as renaming methods
>> in the derived class may seem a bit strange; but they can be useful
>> if you have name-conflicts with multiple inheritance.
>
> I'm not familiar with the Eiffel beyond "it's the DBC language and it's
> French", but wouldn't this simply be covered by aliasing?

Eiffel can either rename a "feature" (method, attribute), which is pretty much the same as aliasing as you might see it in Ruby, or you can redefine the method entirely. Again, you would also see this in Ruby, which might be more approachable for those familiar with Perl.

  class BAR
  inherit
      FOO
          rename output as old_output
      end
  end

or...

  class BAR
  inherit
      FOO
          redefine output
      end
  end
Re: exegesis 5 question: matching negative, multi-byte strings
On Tue, 2002-10-01 at 15:24, Luke Palmer wrote:
> Maybe I'm misunderstanding the question, but I think you want negative
> lookahead:
>
>   Perl 5: /(.*)(?!>union)/

You really meant to say

  Perl 5: /((?:(?!union).)*)/   # Match characters that do not begin the word 'union'

Right?

Peter Behroozi
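Peter's tempered pattern generalises to the case Ed asked for with several forbidden strings (`< -["union""select"\(\)] >`). This is a minimal Perl 5 sketch. `none_of` is a hypothetical helper that quotes each forbidden string and joins them into a single negative lookahead.

```perl
use strict;
use warnings;

# Build a regex matching the longest run of characters, from the current
# position, that does not contain any of the given literal strings.
sub none_of {
    my @words = @_;
    my $forbidden = join '|', map { quotemeta } @words;
    return qr/(?:(?!$forbidden).)*/s;
}

my $no_union = none_of('union');
my ($head) = "this has the string union in it" =~ /^($no_union)/;
print "[$head]\n";    # [this has the string ]

my $no_union_or_parens = none_of('union', '(', ')');
my ($sql) = "select a (b) union c" =~ /^($no_union_or_parens)/;
print "[$sql]\n";     # [select a ]
```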
Re: Paren madness (was Re: Regex query)
David Whipp wrote:
> $b = 7, 6, 5
> @b = 7, 6, 5

I understand that C's *interpretation* of the comma operator will be expunged from Perl 6. But unless comma's *precedence* is also changing, neither of those statements would build a list with three elements.

It seems to me that

  $b = 7, 6, 5;

is the same as

  ($b = 7), 6, 5;

not

  $b = (7, 6, 5);

because '=' binds tighter than ','. So it will assign 7 to $b, and then effectively evaluate the statement

  7, 6, 5;

which might build a list and then discard it. I.e., it is akin to these statements:

  [7, 6, 5];
  3 + 4;
  7;

(and equally feckless).

=thom
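thom's reading is easy to check in Perl 5, where '=' binds tighter than ','. Whether Perl 6 keeps that precedence is the open question, so treat this only as an illustration of the Perl 5 rules. The sketch uses `$x` and friends instead of `$b`, which Perl 5's `sort` uses.

```perl
use strict;
use warnings;
no warnings 'void';    # the discarded constants would otherwise warn

my ($x, @y, $z, @w);
$x = 7, 6, 5;          # parsed as ($x = 7), 6, 5
@y = 7, 6, 5;          # parsed as (@y = 7), 6, 5
$z = (7, 6, 5);        # scalar comma operator: value of the last operand
@w = (7, 6, 5);        # list assignment: three elements

print "x=$x y=(@y) z=$z w=(@w)\n";   # x=7 y=(7) z=5 w=(7 6 5)
```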
Re: Interfaces
On Tuesday, October 1, 2002, at 02:49 PM, Michael Lazzaro wrote:
> Which implies, I assume, that "interface" is not the default state of
> a class method, i.e. we do need something like "method foo() is
> interface { ... }" to declare any given method

Flippin' hell, never mind. You're almost certainly talking about a style like:

  interface Vehicle {
      method foo () { ... }
      method bar () { ... }
  }

- or -

  class Vehicle is interface { ... }

in which case an "interface" is specified as a type of abstract class, not an attribute of a given method... I was thinking of something like

  class Vehicle {
      method foo () is interface { ... }
      method bar () is interface { ... }
      method zap () is private { ... }
  }

in which a specific base class could define "obligatory" method signatures for any eventual subclasses.

Never mind on that one; I've been thinking too much about a different problem.

MikeL
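The "interface as a kind of abstract class" reading can be approximated in Perl 5. This minimal sketch uses invented names. The `Vehicle` package installs stub methods that die when called. A hypothetical `missing_methods` function reports which obligatory methods a subclass still inherits as stubs.

```perl
use strict;
use warnings;

package Vehicle;                       # plays the role of "interface Vehicle"
our @REQUIRED = qw(foo bar);
for my $name (@REQUIRED) {
    no strict 'refs';
    *{"Vehicle::$name"} = sub { die ref(shift) . " must implement $name\n" };
}

# List the required methods a class has not overridden.
sub missing_methods {
    my ($class) = @_;
    return grep { $class->can($_) == Vehicle->can($_) } @REQUIRED;
}

package Car;
our @ISA = ('Vehicle');
sub foo { return 'Car::foo' }          # bar is deliberately left out

package main;

print join(',', Vehicle::missing_methods('Car')), "\n";   # bar
my $foo = Car->foo;
print "$foo\n";                                           # Car::foo
print eval { bless({}, 'Car')->bar; 1 } ? '' : $@;        # Car must implement bar
```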
Re: exegesis 5 question: matching negative, multi-byte strings
On Tue, Oct 01, 2002 at 06:32:07PM -0400, Mike Lambert wrote:
> > guaranteeing that the subsqls have all text up to, but not including the string
> > "union".
> >
> > I suppose I could say:
> >
> >   rule nonunion { (.*) :: { fail if ($1 =~ m"union$"); } }
>
> What's wrong with: ?
>
>   rule getstuffbeforeunion { (.*?) union | (.*) }
>
>   "a union" => "a "
>   "b" => "b"
>
> Am I missing something here?
>
> Mike Lambert

Hmm... well, it works, but it's not very efficient. It basically scans the whole string to the end to see if there is a "union" string, and then backtracks to take the alternative. Hence, it's not very scalable. It also doesn't 'complexify' very well. Suppose you had a long string of text, and you wanted to 'harden' your regex against the substring union appearing in double-quoted strings, single-quoted strings, etc., without writing a sql parser.

I just don't see how to do this with ? - I would do something like this (taking a page from Mr. Friedl's book):

  rule regex_matching_sql { [ <-[u()"']>+ : | : | : | : | ]* }
  rule parens { \( [ <-["'()]>+ : | : | : | ]* \) }
  rule single_string { \' [ <-[\'\\]>+ : | \.\' ]* \' }
  rule double_string { \" [ <-[\"\\]>+ : | \.\" ]* \" }
  rule non_union { [ u < - ['"()n] > | un ... | uni ... | unio ... | u$ ] * }

Of course I could also be missing something, but I just don't see how to do this with .*?.

Ed

(ps: As for:

  /(.*) /

I'm not sure how that works, or whether it's very 'complexifiable' (as per above). If it does a match against every single substring (take all characters, look for union, if it exists, roll back a character, do the same thing, etc. etc. etc.) then this isn't good enough. The non_union rule listed above is about as efficient as it can get; it does no backtracking, and it keeps the common matches up front so they match first without alternation.)
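Ed's rule set has a rough Perl 5 counterpart once recursive patterns are available. This is a sketch, not a translation of his rules. It assumes Perl 5.10 or later, for `(?(DEFINE)...)`, `(?&name)` and possessive `++`, and it treats backslash as the escape character inside quotes. It also simply stops at an unbalanced parenthesis or a stray quote.

```perl
use strict;
use warnings;
use 5.010;    # (?&name) recursion, (?(DEFINE)...) and ++ need Perl 5.10+

# Capture the text before the first bare "union". Parenthesised groups
# and quoted strings are consumed as whole units, so a "union" inside
# them does not stop the match.
my $before_union = qr{
    ^
    ( (?: (?&parens) | (?&single) | (?&double) | (?!union) [^()"'] )* )
    (?(DEFINE)
        (?<parens> \( (?: [^()"']++ | (?&parens) | (?&single) | (?&double) )* \) )
        (?<single> ' (?: [^'\\]++ | \\. )* ' )
        (?<double> " (?: [^"\\]++ | \\. )* " )
    )
}xs;

my @heads;
for my $sql (q{select a from t where x = "union" union select b},
             q{select f(union) from t union select g}) {
    $sql =~ $before_union or die "no match\n";
    push @heads, $1;
    print "[$1]\n";
}
```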
Re: exegesis 5 question: matching negative, multi-byte strings
On Tue, Oct 01, 2002 at 05:24:43PM -0400, Peter Behroozi wrote:
> On Tue, 2002-10-01 at 16:44, [EMAIL PROTECTED] wrote:
> > doesn't work (just tried it out, not sure why it doesn't) but even if it did,
> > it would be awful slow. It would try one character, look at the next for the
> > string union, come back for the next character, look for the string union,
> > etc. etc. etc.
> >
> > whereas
> >
> >   ([^u]+|u[^n])
> >
> > doesn't do any backtracking at all..
> >
> > Ed
>
>   perl -e ' $a = "this has the string union in it";
>   my ($b) = ($a =~ m"((?:(?!union).)*)"); print $b;'
>
> prints the desired result for me at least. It also should be comparably

Whoops. Must have mistyped. Works for me now.

> efficient to the alternation since the match for the string 'union'
> should fail if the first character is not 'u', etc. The alternation
> also matches a character at a time except in special cases, where I am
> reasonably sure that the extra overhead from alternation compensates for
> multi-character matching. This method also does no backtracking for the
> provided example; I am not sure what made you think that it did.
>
> Peter

Well, when I said backtracking, I meant it didn't flip between the current character and the next. I couldn't get real numbers from benchmarking because the ?! construct core dumps on both perl-5.6.1 and perl-5.8 on large strings. But when benchmarked on small (30-line) strings using:

  use Benchmark qw(timethese);

  # $line holds the ~30-line test string
  my $regex1 = qr{(?:(?!union).)*}sx;
  my $regex2 = qr{(?:[^u]+|u[^n]|un[^i]|uni[^o]|unio[^n])*}sx;

  timethese(10, {
      'questionbang' => sub { my ($b) = ($line =~ m"($regex1)"); },
      'alternation'  => sub { my ($b) = ($line =~ m"($regex2)"); },
  });

I get:

  Benchmark: timing 10 iterations of alternation, questionbang...
  alternation: 11 wallclock secs (10.66 usr + 0.00 sys = 10.66 CPU) @ 9380.86/s (n=10)
  questionbang: 18 wallclock secs (18.81 usr + 0.00 sys = 18.81 CPU) @ 5316.32/s (n=10)

so ?! is a bit slower. It could probably be made faster, though.
However, I'm still skeptical of it being a good replacement for the alternation. Look at my posted message (about making the regex handle nested parens, etc.) and see if you can come up with an easy way to handle the case I mentioned there.

Ed
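Speed aside, the two benchmarked patterns do not always agree. A quick Perl 5 check, using test strings made up for illustration, points to two differences. The hand-written alternation can run straight through "union" when a partial prefix comes first, as in "uunion". It also stops short before a trailing partial match such as "uni". The lookahead version handles both cases.

```perl
use strict;
use warnings;

# The two patterns from the benchmark above, anchored at the start.
my $alternation  = qr{(?:[^u]+|u[^n]|un[^i]|uni[^o]|unio[^n])*}s;
my $questionbang = qr{(?:(?!union).)*}s;

my %seen;
for my $text ('select a union select b', 'x uunion y', 'abc uni') {
    my ($alt) = $text =~ /^($alternation)/;
    my ($neg) = $text =~ /^($questionbang)/;
    $seen{$text} = [$alt, $neg];
    print "'$text': alternation='$alt' questionbang='$neg'\n";
}
```

On these inputs the alternation returns 'select a ', 'x uunion y' and 'abc ', while the lookahead version returns 'select a ', 'x u' and 'abc uni'.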