When we write regexes, we generally capture things in a way that makes
the subsequent semantic analysis easier. For example, we could have a
regex m/ <this>+ <that>? <this>*/ if we're only interested in the match
trees of what <this> and <that> match, not in their relative order.

But if we want to re-use the match tree for something different (say,
syntax highlighting instead of semantic analysis), it's rather hard to
reconstruct the original text and which part of it was matched by which
subrule. Currently you have to fiddle with $/.from and $/.to, sort all
subrule matches by their respective .from and .to, and then work out
which parts haven't been matched by any subrule.
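
To make the bookkeeping concrete, here is a rough sketch of that manual
procedure. It is written in Python only so that it is easy to run; it is
not Perl 6 and not a proposed API. The (name, from, to) triples are
hand-written stand-ins for what the subrule matches would report, and
the function name and the '~' label for unmatched text are made up for
this illustration. It also assumes that the captures don't overlap.

```python
from typing import List, Tuple

# (name, from, to): a hypothetical stand-in for one subrule match.
Capture = Tuple[str, int, int]


def chunks(text: str, start: int, end: int,
           captures: List[Capture]) -> List[Tuple[str, str]]:
    """Return (label, substring) pairs covering text[start:end] in order.

    Text covered by a capture is labelled with the capture's name; text
    between captures is labelled '~'. Assumes captures do not overlap.
    """
    result = []
    pos = start
    # Sort by start offset (then end offset) to recover the textual order.
    for name, frm, to in sorted(captures, key=lambda c: (c[1], c[2])):
        if frm > pos:
            # Gap that no subrule claimed.
            result.append(("~", text[pos:frm]))
        result.append((name, text[frm:to]))
        pos = to
    if pos < end:
        # Trailing text after the last capture.
        result.append(("~", text[pos:end]))
    return result


text = "abc 234 def 789 for 456"
# Deliberately out of order, as they might come out of a match tree.
captures = [("0", 20, 23), ("ident", 8, 11), ("ident", 0, 3)]

for label, piece in chunks(text, 0, len(text), captures):
    print(label, repr(piece))
```

Note that the unmatched stretches come back as single strings here
(' 234 ' rather than ' ', '234', ' '), because offsets alone don't say
where one atom of the regex ended and the next began.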

This is a rather weird and error-prone procedure, and I wonder if we
should provide some easier way to access all chunks of text in the order
that they were matched.

I guess this description isn't very clear, so I'll try with an example:

   "abc 234 def 789 for 456" ~~ mm/ [ <ident> \d+ ]**0..2 'for' (\d+) /;

After this match, $/.chunks would be this list:

   $<ident>[0],
   ' ',
   '234',
   ' ',
   $<ident>[1],
   ' ',
   '789',
   ' ',
   'for',
   ' ',
   '456'

I don't know if the syntax and exact semantics are very good, but IMHO
we should have some way of reconstructing a match that is closer to the
original string than to the structure of the matching regex.

(I also don't know if that's feasible in terms of efficiency)

Any ideas?

Moritz

-- 
Moritz Lenz
http://perlgeek.de/ |  http://perl-6.de/ | http://sudokugarden.de/
