On Wed, 28 Aug 2002, Larry Wall wrote: > : (1) Can we have a ":study" modifier in Perl 6 for patterns? > : > : It could be a no-op if necessary, but it could take the place of Perl 5's > : "study" operator and indicate that the programmer WANTS the pattern > : optimized for maximum runtime speed, even at the cost of compile time or > : memory. (Hey, how about a ":cram" modifier for extreme optimization? :-) > > Well, "studied" isn't really a property of a pattern--it's a property of a > string that knows it will have multiple patterns matched against it. One > could put a :study on the first pattern, but that's somewhat deceptive.
Oh yeah. I forgot it applied to the string, not the pattern! I forgot since I never use it! :-) Still, it could be considered a parallel... Perhaps a better approach would be to allow the optimization priorities to be specified, perhaps even as numerical ranges for relative importance? The three obvious dimensions to quantify would be compile time, run-time speed, and memory usage. There's often tradeoffs between these three, and allowing the ability for a programmer to specify his/her preferences could allow for aggressive optimizations that are normally inappropriate... Of course, these would be useful not only as modifiers for compiling any regexes, but as general pragmas controlling optimizing behavior of the entire Perl 6 compiler/optimizer... I'm not sure if it's good enough to just say "optimize for run-time speed at the expense of compile time and memory" (or variations for only having one of the two sacrificed) -- or it it's better to have a scale (say, 0-9) for how important each dimension is. For the extreme case where long compile time and large memory usage is irrelevant, but extreme run-time speed is a must, the programmer might specify optimization priorities of compile=0, memory=0, speed=9. I'm not sure what sort of syntax would be appropriate for such specifications... > : (2) Would simple alternation impede DFA-style optimizations? > : > : Currently, a pattern like (Jun|June) would never match "June" because the > : "leftmost" match "Jun" would always take precedence, despite the normal > : "longest-match" behavior of regexes in general. This example could be > : implemented in a DFA; would that always be the case? > > Well, "June" can match if what follows fails to match after "Jun". True enough. Couldn't that still be implemented in a DFA? (Possibly at the cost of doubling the size of the DFA for the later part of the regex!) > : Would it be better for the matching of (Jun|June) to be "undefined" and > : implementation-dependent? Or is it best to require "leftmost" semantics? > > Well, the semantics shouldn't generally wobble around like that, but it'd > be pretty easy to let them wobble on purpose via pragma (or via :modifier, > which are really just pragmas that scope to regex groups). Yeah, it's probably safer not to have that much room for undefined behavior since people will just try it and assume that their implementation is the universal behavior... Would there be a good way to say "don't care" about the longest-vs-leftmost matching semantics? Would it be worthwhile to have longest-trumps-leftmost as an optional modifier? (This might be very expensive if implemented in a backtracking engine, since it could no longer shortcut alternations...) Dan suggested ":dfa" for DFA semantics -- is that the best answer, or would it be better to define the modifiers in terms of visible behavior rather than implementation, if possible? Deven