Dan Sugalski:
# At 4:17 PM -0800 3/17/02, Steve Fink wrote:
# >On Sat, Mar 16, 2002 at 04:34:34PM -0500, Dan Sugalski wrote:
# >> Now's your time to speak up, please.
# >
# >Ok, you asked for it. I just committed the regular expression
# >compiler. It has known bugs, but I am completely out of tuits for now,
# >and have been since about the time I announced this thing's existence.
# >:-(
#
# Fair enough--that's what I wanted to know.
#
# >I do wonder what you'd replace the rx opcodes with. I don't see any
# >use for some of the existing opcodes (regex flag setting, zwa_atend,
# >etc.), but doing things like maintaining the backtracking stack using
# >generic opcodes sounds very slow to me.
#
# Maybe. I don't think so, though, and if some specialized constructs
# are needed, I think they need to be brought outside the context of
# the regex engine.
#
# >The last benchmarking I did on the regex engine (with a single regex,
# >admittedly) put it at somewhat better than half the speed of perl5's
# >engine, which isn't too bad. Do you have newer (worse) numbers?
#
# Half the speed is pretty abominable. However, code using the regex
# opcodes to match /fo+ba?r/ runs at the same speed or slower than code
# using plain positional ords, and the positional ord code sees
# significant gains when using the JIT. Neither version came within
# half the speed of perl, though. Both were slower than that.
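
For reference, here is roughly what matching /fo+ba?r/ with "plain positional ords" amounts to: comparing character codes at explicit positions instead of calling regex-specific opcodes. This is a minimal sketch in C, not Parrot code and not the benchmark that was actually run; the names match_at and find_foobar are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Parrot): try to match /fo+ba?r/ starting
 * exactly at s[pos]. Returns the length of the match, or -1 on failure. */
static long match_at(const char *s, size_t len, size_t pos)
{
    size_t i = pos;

    if (i >= len || s[i] != 'f') return -1;   /* literal f */
    i++;

    if (i >= len || s[i] != 'o') return -1;   /* o+ needs at least one o */
    while (i < len && s[i] == 'o') i++;       /* then take the rest greedily */

    if (i >= len || s[i] != 'b') return -1;   /* literal b */
    i++;

    if (i < len && s[i] == 'a') i++;          /* a? is optional */

    if (i >= len || s[i] != 'r') return -1;   /* literal r */
    i++;

    return (long)(i - pos);
}

/* Hypothetical unanchored search: offset of the first match, or -1. */
static long find_foobar(const char *s)
{
    size_t len = strlen(s);
    size_t pos;

    for (pos = 0; pos < len; pos++) {
        if (match_at(s, len, pos) >= 0)
            return (long)pos;
    }
    return -1;
}
```

Note that this particular pattern never needs to backtrack inside a single attempt: what follows o+ is not an o, and what follows a? is not an a, so the greedy choice is always the right one. A pattern that did need a backtracking stack would stress a different part of either approach.
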

On the other hand, we never did test it with stack operations, probably because most realistic test cases that need them use character classes.

Also, I'm fairly certain that by using a vtable, I can put in some evil speed hacks. (For example, "OK, this string is Unicode. We'll use the Unicode vtable, which'll transcode to utf32 and reach inside the string's guts for speed.")

# >My compiler would be relatively easy to retarget to general opcodes or
# >another set of regex opcodes, but I am very skeptical that regexen can
# >be sufficiently fast without some tailored opcodes and a regex state
# >PMC.
#
# Possibly. Regardless, at the moment I'm thinking that the current
# shot at regexes is, while good, insufficiently compelling over the
# plain opcodes we have now.

There are probably some significant idiotic moves in terms of speed in that code. If anybody can give me C profiling data I'd much appreciate it--free tools for that don't appear to exist on Windows.

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

#define private public
    --Spotted in a C++ program just before a #include