Hello everyone,

This is my first post to the actual mailing list and not to Google Groups (yeah, took me a bit to figure out they're not the same). I have a few questions about the rules in Perl 6, and hopefully I'm not repeating stuff that's already been brought up before. (I searched through the archive a bit, but didn't see anything.)

==Question 1==

    macro rxmodinternal:<x> { ... }  # define your own /:x() stuff/
    macro rxmodexternal:<x> { ... }  # define your own m:x()/stuff/

With this, I can make my own adverbs, then? Like :without, or :skip, and describe what each does? If so, then maybe the rest of my major questions have a very simple answer: make it yourself. If that is the case, I'll try to figure out how to do it with pugs, if possible.

==Question 2==

I finished reading E05 and A05, and I really like the idea of the :w modifier being able to essentially skip over certain parts of the text. Right now A05 states:

> <?ws> can't decide what to do until it sees the data. It still does the
> right thing. If not, define your own <?ws> and :w will use that.

So is :w invoking a rule that just skips whatever it matches? What I'm wondering about is how I can create a mechanism that acts like :w, but can be combined for nested rules. For instance, say I'm trying to pull out dates from (HTML) text:

    ...
    Jan had a great birthday on <B>F e b</B> 5, 2<B>00</B>3.
    Her older sister, May, turned 23 on <B>Ma r</B> 5, 19<b>98</b>
    Their younger sister, June, will be going home on <B >Apr</B> 5, 2<B>006</B>
    April is their mother, and she's buying a car on <B>Feb< / B > 7, 2<B>0</B>06
    I don't know when Roger, their father, is going to buy his guitar.
    ...

The grammar becomes messy when I have to account for things that the rules don't allow me to just easily skip:

    grammar Date {
        rule tag_B_beg:w:i { \<B\> }
        rule tag_B_end:w:i { \<\/B\> }
        rule tag_B:w:i     { <tag_B_beg>|<tag_B_end> }
        rule month_english:w:i {
              J<sp>*a<sp>*n
            | F<sp>*e<sp>*b
            | M<sp>*a<sp>*r
            | A<sp>*p<sp>*r
            | M<sp>*a<sp>*y
            | J<sp>*u<sp>*n<sp>*e
            | J<sp>*u<sp>*l<sp>*y
            | A<sp>*u<sp>*g
            | S<sp>*e<sp>*p
            | O<sp>*c<sp>*t
            | N<sp>*o<sp>*v
            | D<sp>*e<sp>*c
        }
        rule year:w:i  { (\d<tag_B>?\d<tag_B>?\d<tag_B>?\d) }
        rule month:w:i { <after <tag_B_beg> > (<month_english>) <before <tag_B> > }
        rule day {
            <after <month> >
            ( <after <[1..2]> >? <[1..9]> | 3<[0..1]> )
            <sp>+
            <before <year> >
        }
        rule date { <month> <day> <year> }
    }

I don't want to just skip <B> tags wholly, because they do serve a purpose, but only in a particular context. (Can <?ws> be changed back to a "default" if changed to include HTML tags?)

I was thinking about maybe using a closure at the beginning of the rule (to change the string about to be processed) and then a closure at the end of the rule (to change it back to its pre-processed form) to make it work:

    grammar Date {
        rule tag_B_beg:w:i { \<B\> }
        rule tag_B_end:w:i { \<\/B\> }
        rule month_english:w:i {
            { $/ ~~ s/<sp>// }
            [ Jan | Feb | Mar | Apr | May | June | July | Aug | Sep | Oct | Nov | Dec ]
            { $/ ~~ $/.pretext }
        }
        rule year:w:i {
            { $/ ~~ s/<tag_B_beg>|<tag_B_end>// }
            (\d{4})
            { $/ ~~ $/.pretext }
        }
        rule month:w:i { <after <tag_B_beg> > (<month_english>) <before <tag_B> > }
        rule day {
            <after <month> >
            ( <after <[1..2]> >? <[1..9]> | 3<[0..1]> )
            <sp>+
            <before <year> >
        }
        rule date { <month> <day> <year> }
    }

That's okay to do, right? It looks a lot cleaner to me, but I'm wondering if there's a better way to skip a rule match in another rule (another adverb like :skip, with :w being a built-in shorthand for :skip(<?ws>)). Or am I making this too complex when it really isn't? Any pointers on how to do stuff like this more simply?

==Question 3==

I'm also curious about exclusions. Right now, to do a general exclusion, I'm thinking I would probably do something like:

    rule text_no_date { {$/ !~ /<date>/ } ^ [.*] $ }

Would something like the one below be easier for a human reader to decode?

    text:without(<date>) { ^ [.*] $ }

If that adverb were available, then I could have a rule that doesn't include two other rules:

    line:without(<date>&&<name>) { ^^ [.*] $$ }

The rule above would match a line with a <date> or <name>, but not a line with both.

Like I said before, I don't know if this is the best way to do stuff like this, or if I'm thinking about these problems the wrong way, so *any* help would be great.

Thanks,
David