RE: parrot rx engine

2002-02-04 Thread Hong Zhang
> Agh, if you go and do that, you must then be sure that rx is capable of > optimizing /a/i and /[aA]/ in the same way. What I mean is that Perl's > current regex engine is able to use /abc/i as a "constant" in a string, > while it cannot do the same for /[Aa][Bb][Cc]/. Why? Because in the > fi
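
The quoted point about /abc/i versus /[Aa][Bb][Cc]/ can be illustrated with a minimal Python sketch. It is not Perl's optimizer: the function name constant_prefilter and the "letters and digits only" test are assumptions made for illustration. The idea is that a pattern recognised as a plain literal yields a cheap substring check that can rule out a string before the full match runs, while the same language spelled with character classes is not recognised by this naive analysis and gets no such shortcut.

```python
def constant_prefilter(pattern, ignore_case=False):
    """Return a cheap 'might this text match?' check, or None.

    Hypothetical analysis: only a pattern made purely of letters and
    digits is treated as a fixed string. Anything else (for example a
    character class) is reported as having no usable constant.
    """
    if not pattern.isalnum():
        return None

    # Fold the literal once, when the pattern is analysed.
    needle = pattern.casefold() if ignore_case else pattern

    def might_match(text):
        # Fold the input only when the pattern is case-insensitive.
        haystack = text.casefold() if ignore_case else text
        return needle in haystack

    return might_match


check = constant_prefilter("abc", ignore_case=True)
print(check("xxABCxx"))                       # substring check succeeds
print(constant_prefilter("[Aa][Bb][Cc]"))     # no constant found
```

A smarter analysis could expand [Aa][Bb][Cc] back into a case-insensitive literal, which is the equivalence the quoted message asks the rx engine to guarantee.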

RE: parrot rx engine

2002-02-02 Thread Jeff 'japhy' Pinyan
On Jan 31, Hong Zhang said: >> But as you say, case folding is expensive. And with this approach you >> are going to case-fold every string that is matched against an rx >> that has some part of it that is case-insensitive. > >That is correct in general. But regex compiler can be smarter than tha

Re: parrot rx engine

2002-01-31 Thread Tim Bunce
On Thu, Jan 31, 2002 at 12:50:52PM -0800, Brent Dax wrote: > > Let me know if I'm brilliant, on crack, or both with this idea. I've no idea :-) Tim.

RE: parrot rx engine

2002-01-31 Thread Ashley Winters
--- Brent Dax wrote: > Tim Bunce: > # On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: > # > # Especially as the perl6 rx engine will have to be able to > # work directly on > # non-trivial things like streams and generators and suchlike. > > I have a suggestion si

RE: parrot rx engine

2002-01-31 Thread Brent Dax
Tim Bunce: # On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: # > # > Yes, I was assuming that. However what is to be gained by case # > folding the input string ? # > # > Because parts of an rx can be case-insensitive while other parts # > are case-sensitive, we will probably need two

RE: parrot rx engine

2002-01-31 Thread Hong Zhang
> But as you say, case folding is expensive. And with this approach you > are going to case-fold every string that is matched against an rx > that has some part of it that is case-insensitive. That is correct in general. But regex compiler can be smarter than that. For example, rx should optimize

Re: parrot rx engine

2002-01-31 Thread Graham Barr
On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote: > > Because parts of an rx can be case-insensitive while other parts > > are case-sensitive, we will probably need two sorts of ops anyway > > (or a way to tell the op to be case-insensitive). And you will > > only be able to do the case

RE: parrot rx engine

2002-01-31 Thread Hong Zhang
> Because parts of an rx can be case-insensitive while other parts > are case-sensitive, we will probably need two sorts of ops anyway > (or a way to tell the op to be case-insensitive). And you will > only be able to do the case folding when the whole rx is > case-insensitive. I don't like you

Re: parrot rx engine

2002-01-31 Thread Tim Bunce
On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote: > > Yes, I was assuming that. However what is to be gained by case > folding the input string ? > > Because parts of an rx can be case-insensitive while other parts > are case-sensitive, we will probably need two sorts of ops anyway >

Re: parrot rx engine

2002-01-31 Thread Graham Barr
On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote: > Peter Haworth: > # On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote: > # > On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote: > # > > # rx_setprops P0, "i", 2 > # > > # branch $start0 > # > >

RE: parrot rx engine

2002-01-31 Thread Brent Dax
Peter Haworth: # On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote: # > On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote: # > > # rx_setprops P0, "i", 2 # > > # branch $start0 # > > # $advance: # > > # rx_advance P0, $fail # > >

Re: parrot rx engine

2002-01-31 Thread Peter Haworth
On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote: > On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote: > > # rx_setprops P0, "i", 2 > > # branch $start0 > > # $advance: > > # rx_advance P0, $fail > > # $start0: > > #

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 21:42, Dan Sugalski wrote: > I think we may want trees as a fundamental data type at some point... I wonder about the trees -- Bryan C. Warnock

Re: parrot rx engine

2002-01-30 Thread Dan Sugalski
At 6:28 PM -0800 1/30/02, Steve Fink wrote: >I'm sure in Apoc 5 Larry's going to go way beyond that and embed full >parsers, not just regularish language matchers, but the above is >easier to grasp. Odds are, yes. And don't be surprised if the RE engine's required to return data structures as we

Re: parrot rx engine

2002-01-30 Thread Steve Fink
On Wed, Jan 30, 2002 at 08:37:30PM -0500, Bryan C. Warnock wrote: > "But if you know they're going to be twenty times slower, why are you doing > it?" Because we know / think / hope / pray / have been making sacrifices to Tangential note: current benchmarking indicates that we're doing a lot b

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 11:13, Ashley Winters wrote: > First, we set the rx engine to case-insensitive. Why is that bad? It's > setting a runtime property for what should be compile-time {snip} > Now, the current CVS rx engine is/would do this at runtime. We're also currently a compiler sh

Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock
On Wednesday 30 January 2002 12:32, Brent Dax wrote: > # Mostly, I'd like to hear how either Unicode character-ranges aren't > # deterministic at compile-time (I doubt that) or how crippling to > > One word: locale. Not that locales couldn't provide pre-compiled character classes. -- Bryan C. W

Re: parrot rx engine

2002-01-30 Thread Graham Barr
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote: > # rx_setprops P0, "i", 2 > # branch $start0 > # $advance: > # rx_advance P0, $fail > # $start0: > # rx_literal P0, "a", $advance > # > # First, we set the rx
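
The control flow of the quoted op sequence can be sketched in Python. This is a reading of the snippet, not the Parrot implementation: the assumed semantics are that rx_literal branches to its label when the literal does not match at the current position, and that rx_advance moves the start position forward by one character or branches to $fail when none are left. The function name match_literal_anywhere is hypothetical, and the third argument of rx_setprops is not modelled.

```python
def match_literal_anywhere(text, literal, ignore_case=False):
    """Return the start index of the first match, or None on failure."""
    if ignore_case:
        # Stands in for: rx_setprops P0, "i", 2
        # Simplification: lower() is used for both strings, which is not
        # full Unicode case folding and can change string length.
        text, literal = text.lower(), literal.lower()

    start = 0  # branch $start0: the first attempt skips the advance step
    while True:
        # $start0: rx_literal P0, "a", $advance
        if text.startswith(literal, start):
            return start
        # $advance: rx_advance P0, $fail
        if start >= len(text):
            return None  # $fail: no start position left to try
        start += 1


print(match_literal_anywhere("xyA", "a", ignore_case=True))
print(match_literal_anywhere("xyA", "a"))
```

In this sketch the case-insensitivity is a property set at run time, before any matching happens, which is the design choice the thread goes on to debate.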

Re: parrot rx engine

2002-01-30 Thread Simon Cozens
begin quote from Ashley Winters: > I think that's exactly what you should be doing! Neither parrot nor the > rx engine should try to be a full compiler. The rx engine definitely > should have opcodes in the virtual machine, but those opcodes should > simply contain state-machine/backtracking info,

RE: parrot rx engine

2002-01-30 Thread Brent Dax
Ashley Winters: # Who the hell am I? # I've been only a weblog-lurker till now. It's been a couple # years since # I last contributed to Perl5. I just read the latest Apocalypse and it # inspired me to get a parrot snapshot and look around. Welcome back to the land of the living. :^) # What's m

RE: parrot rx engine

2002-01-30 Thread Angel Faus
Ashley Winters wrote: >First, we set the rx engine to case-insensitive. Why is that bad? It's >setting a runtime property for what should be compile-time >unicode-character-kung-fu. Assuming your "CPU" knows what the gritty >details of unicode in the first place just feels wrong, but I digress.

Re: parrot rx engine

2002-01-30 Thread Melvin Smith
>Basically, I see a black-box being built in the interests of speed. >Voodoo array formats, bitmaps, and other such things to avoid actually >spelling out what the regular expression is doing *in parrot code*. [snip] >What I see is that rx_literal is a speed hack to avoid compiling this >into par

Re: parrot rx engine

2002-01-30 Thread Jonathan Scott Duff
On Wed, Jan 30, 2002 at 08:13:55AM -0800, Ashley Winters wrote: > I think that's exactly what you should be doing! Neither parrot nor the > rx engine should try to be a full compiler. The rx engine definitely > should have opcodes in the virtual machine, but those opcodes should > simply contain s