One of the first things that's becoming obvious to me in playing with Rakudo's rules is that parsing strings isn't always what I'm going to want to do. The most common example of wanting to parse data that's not in string form is the YACC scenario, where you want a function to produce a stream of tokenized data that is then parsed into a more complex representation. In a similar fashion, there are transformers like TGE that take syntax trees and transform them into alternative representations.
To that end, I'd like to suggest (for 6.1 or whatever comes after initial stability) an extension to rules:

    [ 'orange', 'apple', 'apple', 'pear', 'banana' ] ~~ rx :data { 'apple'+ 'pear' }

Adding :data forces the match to proceed against the elements of the supplied array as a sequence, rather than as individual matches the way it behaves now. Each element of the array is matched by each atom in the expression.

To support complex data (instead of matching all elements as fixed strings), a new matching syntax is proposed. The "current object" in what follows is the object in the input data which is currently being treated as an atom (e.g. an array element). It might be any kind of data, such as a sub-array, number or string.

<^...> matches against embedded, complex data. There are several forms depending on what comes after the ^:

Forms that work on the current element of the input:

    ^{...}   smart-matches current object against return value of closure
    ^~exp    parses exp as a regex and matches as a string against the
             current object (disabling :data locally)
    ^::exp   exp is an identifier and smart-matches on type

Note that the last two forms can be implemented (though possibly not optimally) using the first.

These forms treat the current element of the input as a sub-array and attempt to traverse it, leaving :data enabled:

    ^[exp]   parses exp as a regex and matches against an array object
    ^ name   (note space) identical to <^[<name>]>

Example: this parses a binary operator tree:

    token undef { <^{undef}> }
    token op { < + - * / > }  # works because the whole object is a one-character string
    token term { <^::Num> | <^~ \d+ > | <undef> }  # number, string with digits or undef
    rule binoptree {
        <op>
        $<left> = [ <term> | <^ binoptree> ]
        $<right> = [ <term> | <^ binoptree> ]
    }

    [ '+', 5, [ '*', 6, 7 ] ] ~~ rx :data /<binoptree>/

Some notes:

Perhaps this should simply refer to iterable objects and not arrays?
Is there a better way to unify the handling of matching against the current object vs. matching against embedded structures?

What about matching nested hashes?

What I find phenomenal is that this requires so little change to the existing spec for rules. It's a really simple approach, but it gives us the ability to start applying rules in all sorts of ways we never dreamed of before. I might even tackle trying to implement this instead of the parser library I was working on, if there's some agreement that it makes sense and looks like the correct way to go about it....
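
To make the intended :data semantics a bit more concrete, here is a rough sketch in Python rather than Perl 6, since none of the proposed syntax exists yet. It is only an illustration under my own assumptions, not an implementation of the proposal: every name in it (atom, lit, seq, alt, plus, sub, search) is hypothetical, matching is unanchored like ~~, and I assume that descending into a sub-array (the <^[exp]> form) requires the inner pattern to consume the whole sub-array, which the proposal above does not actually specify.

```python
import re
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

# A matcher takes (items, pos) and yields every position at which it can
# finish, so callers get simple backtracking by iterating over the results.
Matcher = Callable[[Sequence[Any], int], Iterator[int]]


def atom(pred: Callable[[Any], bool]) -> Matcher:
    """Match exactly one element for which pred(element) is true."""
    def m(items, pos):
        if pos < len(items) and pred(items[pos]):
            yield pos + 1
    return m


def lit(value: Any) -> Matcher:
    """Match one element equal to value (like 'apple' under :data)."""
    return atom(lambda x: x == value)


def seq(*parts: Matcher) -> Matcher:
    """Match each part in order, backtracking over earlier parts."""
    def m(items, pos):
        if not parts:
            yield pos
            return
        rest = seq(*parts[1:])
        for mid in parts[0](items, pos):
            yield from rest(items, mid)
    return m


def alt(*choices: Matcher) -> Matcher:
    """Try each alternative in order."""
    def m(items, pos):
        for choice in choices:
            yield from choice(items, pos)
    return m


def plus(inner: Matcher) -> Matcher:
    """One or more repetitions, longest first."""
    def m(items, pos):
        for mid in inner(items, pos):
            if mid > pos:  # guard against zero-width loops
                yield from m(items, mid)
            yield mid
    return m


def sub(make_inner: Callable[[], Matcher]) -> Matcher:
    """Descend into a nested list (roughly <^[exp]>).

    Assumption: the inner pattern must consume the entire sub-list.
    The factory is called lazily so patterns can be recursive.
    """
    def m(items, pos):
        if pos < len(items) and isinstance(items[pos], list):
            child = items[pos]
            if any(end == len(child) for end in make_inner()(child, 0)):
                yield pos + 1
    return m


def search(matcher: Matcher, items: Sequence[Any]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the leftmost match, or None."""
    for start in range(len(items) + 1):
        for end in matcher(items, start):
            return (start, end)
    return None


# rx :data { 'apple'+ 'pear' }
fruit = seq(plus(lit('apple')), lit('pear'))

# Rough equivalents of the op / term / binoptree rules from the example.
op = atom(lambda x: isinstance(x, str) and x in ('+', '-', '*', '/'))
term = alt(
    atom(lambda x: isinstance(x, (int, float)) and not isinstance(x, bool)),
    atom(lambda x: isinstance(x, str) and re.search(r'\d+', x) is not None),
    atom(lambda x: x is None),
)


def binoptree() -> Matcher:
    operand = alt(term, sub(binoptree))
    return seq(op, operand, operand)


if __name__ == '__main__':
    print(search(fruit, ['orange', 'apple', 'apple', 'pear', 'banana']))
    print(search(binoptree(), ['+', 5, ['*', 6, 7]]))
```

With these definitions, the fruit pattern matches elements 1 through 3 of the sample array, and the tree pattern matches the whole of [ '+', 5, [ '*', 6, 7 ] ]. The point of the sketch is just that nothing about sequence, alternation or quantifiers cares whether the atoms are characters or arbitrary objects, which is why the change to the rules spec can stay so small.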