Attached is a Perl script, expand_regex.pl, which reads an SA rules file on standard input and by default prints the expansion of each rule, taking into account the regex factoring done with parentheses. When invoked with the -verbose option, the program prefaces each expansion with the rule itself. Several options make it expand commonly used idioms inside regex patterns, such as \d, [set of chars], {repetition count}, and ?. Note that these expansion options are off by default, and that enabling them can produce very large expansions.
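To make the default behaviour concrete, here is a minimal sketch of alternation expansion. It is written in Python and is not the attached Perl script; the function names (expand, _parse_alternation, _parse_sequence) are made up for illustration. It assumes a pattern built only from literal characters, '|', and (...) or (?:...) groups. Escapes such as \d are passed through untouched, and a backslash before punctuation is dropped, which is a guess based on the sample output below.

```python
from itertools import product


def expand(pattern):
    """List every string produced by unfolding '|' and (...) groups."""
    alternatives, pos = _parse_alternation(pattern, 0)
    if pos != len(pattern):
        raise ValueError(f"unbalanced ')' at offset {pos}")
    return alternatives


def _parse_alternation(pattern, pos):
    """Parse branch|branch|... and return (all expansions, next offset)."""
    results = []
    while True:
        branch, pos = _parse_sequence(pattern, pos)
        results.extend(branch)
        if pos < len(pattern) and pattern[pos] == "|":
            pos += 1  # step over '|' and parse the next branch
            continue
        return results, pos


def _parse_sequence(pattern, pos):
    """Parse one branch: a run of literals and groups, up to '|' or ')'."""
    parts = [[""]]  # each entry lists the alternatives for one position
    while pos < len(pattern) and pattern[pos] not in "|)":
        ch = pattern[pos]
        if ch == "(":
            pos += 1
            if pattern.startswith("?:", pos):
                pos += 2  # skip the non-capturing marker
            group, pos = _parse_alternation(pattern, pos)
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            parts.append(group)
        elif ch == "\\" and pos + 1 < len(pattern):
            nxt = pattern[pos + 1]
            # Assumption: keep \d, \b and friends as-is, unescape punctuation.
            parts.append([ch + nxt if nxt.isalnum() else nxt])
            pos += 2
        else:
            parts.append([ch])
            pos += 1
    # Concatenate one alternative from every position, in order.
    return ["".join(combo) for combo in product(*parts)], pos
```

For example, expand(r"i(?:mageshere|nfo(?:matrixz|rmatix))") returns ['imageshere', 'infomatrixz', 'informatix'], which lines up with the corresponding part of the sample output.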
Here's an example:

  % cat test.cf
  uri BigEvilList_us /\bc(?:arlz|hooz|ontrolz|raigz)|d(?:ia55|ia9|marketing|omez|ubnh|uckz)|e(?:(?:asy\-|z)herbal|netmall|zoffer)|free(?:member|raffle)|g(?:hkp|hor|oodserver|rantz|trrrez)|herbal(?:\d\d?\d?|\d{1,4}|plus|rx)|i(?:mageshere|nfo(?:matrixz|rmatix))|j(?:5remf150|onnyz)|kpth|lnk\.revclx|natural(?:growth|herbal)|nomore|(?:o(?:acklaz|nline-herbal)|p(?:luckz|ro(?:fitopportunity|pal))|sphot|spliter|tinyz|tooshortz|unone|webleader|hardtyz)\.us\b/i

  % expand_regex.pl -v < test.cf
  uri BigEvilList_us /\bc(?:arlz|hooz|ontrolz|raigz)|d(?:ia55|ia9|marketing|omez|ubnh|uckz)|e(?:(?:asy\-|z)herbal|netmall|zoffer)|free(?:member|raffle)|g(?:hkp|hor|oodserver|rantz|trrrez)|herbal(?:\d\d?\d?|\d{1,4}|plus|rx)|i(?:mageshere|nfo(?:matrixz|rmatix))|j(?:5remf150|onnyz)|kpth|lnk\.revclx|natural(?:growth|herbal)|nomore|(?:o(?:acklaz|nline-herbal)|p(?:luckz|ro(?:fitopportunity|pal))|sphot|spliter|tinyz|tooshortz|unone|webleader|hardtyz)\.us\b/i
  ---- expansion ----
  carlz
  chooz
  controlz
  craigz
  dia55
  dia9
  dmarketing
  domez
  dubnh
  duckz
  easy-herbal
  ezherbal
  enetmall
  ezoffer
  freemember
  freeraffle
  ghkp
  ghor
  goodserver
  grantz
  gtrrrez
  herbal\d\d?\d?
  herbal\d{1,4}
  herbalplus
  herbalrx
  imageshere
  infomatrixz
  informatix
  j5remf150
  jonnyz
  kpth
  lnk.revclx
  naturalgrowth
  naturalherbal
  nomore
  oacklaz.us
  online-herbal.us
  pluckz.us
  profitopportunity.us
  propal.us
  sphot.us
  spliter.us
  tinyz.us
  tooshortz.us
  unone.us
  webleader.us
  hardtyz.us
  ---------------------

When invoked as

  % expand_regex.pl -v -expand='d{?' < test.cf

1.38 million lines are generated, illustrating the combinatorial expansion that can occur. With just '-expand=d', only 1000 lines are generated.

This program is in a preliminary state and certainly won't handle the wide variety of things that can appear in regexes, but it can make it easier to see what a complicated rule is doing. If you have comments, suggestions, or patches, please send them my way.
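On the combinatorial growth mentioned above: a rough way to see where it comes from is to count how many lines a naive enumerator would emit for each idiom in isolation. The sketch below is Python, not part of expand_regex.pl, and the helper names are hypothetical. It makes no attempt to reproduce the line counts quoted above, which depend on how the script itself treats each option.

```python
def count_digit_run(min_len, max_len):
    """Lines needed to spell out \\d{min_len,max_len} digit by digit."""
    return sum(10 ** n for n in range(min_len, max_len + 1))


def count_chain(items):
    """Lines for a sequence of atoms, given (choices, optional) per atom.

    An optional atom contributes one extra choice (the empty string).
    Duplicate strings are counted, as a naive enumerator would emit them.
    """
    total = 1
    for choices, optional in items:
        total *= (choices + 1) if optional else choices
    return total


# \d{1,4}: 10 + 100 + 1000 + 10000 lines
print(count_digit_run(1, 4))  # 11110

# \d\d?\d? with both \d and ? expanded: 10 * 11 * 11 lines
print(count_chain([(10, False), (10, True), (10, True)]))  # 1210
```

Counts like these multiply whenever several expanded idioms sit in the same branch of a rule, which is why the expansion options are best enabled one at a time.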