Attached is a Perl script, expand_regex.pl, which accepts an SA rules file
on standard input and by default outputs the expansions of those rules,
taking into account regex factoring due to parentheses. When invoked with
the -verbose option, the program prefaces each expansion with the rule
itself. Several options cause it to expand commonly used idioms inside
regex patterns, such as \d, [set of chars], {repetition count}, and ?.
Note that these expansion options are off by default, and when enabled
they can create some very large expansion sequences.

Here's an example:

% cat test.cf
uri BigEvilList_us
/\bc(?:arlz|hooz|ontrolz|raigz)|d(?:ia55|ia9|marketing|omez|ubnh|uckz)|e(?:(
?:asy\-|z)herbal|netmall|zoffer)|free(?:member|raffle)|g(?:hkp|hor|oodserver
|rantz|trrrez)|herbal(?:\d\d?\d?|\d{1,4}|plus|rx)|i(?:mageshere|nfo(?:matrix
z|rmatix))|j(?:5remf150|onnyz)|kpth|lnk\.revclx|natural(?:growth|herbal)|nom
ore|(?:o(?:acklaz|nline-herbal)|p(?:luckz|ro(?:fitopportunity|pal))|sphot|sp
liter|tinyz|tooshortz|unone|webleader|hardtyz)\.us\b/i

% expand_regex.pl -v < test.cf


uri BigEvilList_us
/\bc(?:arlz|hooz|ontrolz|raigz)|d(?:ia55|ia9|marketing|omez|ubnh|uckz)|e(?:(
?:asy\-|z)herbal|netmall|zoffer)|free(?:member|raffle)|g(?:hkp|hor|oodserver
|rantz|trrrez)|herbal(?:\d\d?\d?|\d{1,4}|plus|rx)|i(?:mageshere|nfo(?:matrix
z|rmatix))|j(?:5remf150|onnyz)|kpth|lnk\.revclx|natural(?:growth|herbal)|nom
ore|(?:o(?:acklaz|nline-herbal)|p(?:luckz|ro(?:fitopportunity|pal))|sphot|sp
liter|tinyz|tooshortz|unone|webleader|hardtyz)\.us\b/i
---- expansion ----
carlz
chooz
controlz
craigz
dia55
dia9
dmarketing
domez
dubnh
duckz
easy-herbal
ezherbal
enetmall
ezoffer
freemember
freeraffle
ghkp
ghor
goodserver
grantz
gtrrrez
herbal\d\d?\d?
herbal\d{1,4}
herbalplus
herbalrx
imageshere
infomatrixz
informatix
j5remf150
jonnyz
kpth
lnk.revclx
naturalgrowth
naturalherbal
nomore
oacklaz.us
online-herbal.us
pluckz.us
profitopportunity.us
propal.us
sphot.us
spliter.us
tinyz.us
tooshortz.us
unone.us
webleader.us
hardtyz.us
---------------------
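
For readers who want to see how this kind of factoring can be undone, here
is a minimal sketch in Python. It is not the attached Perl script and makes
no claim about how that script works. The function name expand() and the
escape handling are assumptions modeled on the sample output above: \b is
dropped, \- and \. become - and ., and \d, ? and {n,m} are left unexpanded.

```python
def expand(pattern):
    """List the alternatives of a pattern built only from literals,
    escapes, alternation (|) and groups, (...) or (?:...)."""
    pos = 0

    def parse_alternation():
        # alternation := sequence ('|' sequence)*
        nonlocal pos
        results = parse_sequence()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results += parse_sequence()
        return results

    def parse_sequence():
        # sequence := (group | escape | literal)*
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)":
            ch = pattern[pos]
            if ch == "(":
                pos += 1
                if pattern.startswith("?:", pos):
                    pos += 2                  # skip the non-capturing marker
                choices = parse_alternation()
                if pos >= len(pattern) or pattern[pos] != ")":
                    raise ValueError("unbalanced parenthesis")
                pos += 1
            elif ch == "\\" and pos + 1 < len(pattern):
                nxt = pattern[pos + 1]
                pos += 2
                if nxt == "b":
                    choices = [""]            # assumption: drop \b anchors
                elif nxt.isalnum():
                    choices = ["\\" + nxt]    # keep \d and friends unexpanded
                else:
                    choices = [nxt]           # \- and \. become - and .
            else:
                pos += 1
                choices = [ch]                # ?, { and } stay literal here
            # Cross every prefix so far with every choice for this item.
            results = [r + c for r in results for c in choices]
        return results

    out = parse_alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced parenthesis")
    return out


if __name__ == "__main__":
    for line in expand(r"\bc(?:arlz|hooz)|e(?:(?:asy\-|z)herbal|netmall)"):
        print(line)
```

Each group contributes a list of alternatives, and a sequence is the cross
product of the lists of its items, which is why nested groups multiply the
number of output lines.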

When invoked as

% expand_regex.pl -v -expand='d{?' < test.cf

1.38 million lines are generated, illustrating the combinatorial expansion
that can occur. With just '-expand=d', only 1000 lines are generated.
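
To get a feel for how quickly such expansions grow, the following Python
sketch enumerates two digit idioms from the rule above. It is an
illustration only: it does not use the attached script and does not try to
reproduce the line counts quoted here. The helper names and the
(choices, min, max) representation of an atom are assumptions made for the
example.

```python
from itertools import product

DIGITS = "0123456789"


def expand_atom(choices, lo, hi):
    """All strings made of lo..hi repetitions drawn from choices."""
    out = []
    for n in range(lo, hi + 1):
        out.extend("".join(p) for p in product(choices, repeat=n))
    return out


def expand_sequence(atoms):
    """Cross product of the expansions of each (choices, lo, hi) atom."""
    parts = [expand_atom(*atom) for atom in atoms]
    return ["".join(p) for p in product(*parts)]


# \d\d?\d? : one required digit followed by two optional digits
optional_digits = expand_sequence([(DIGITS, 1, 1), (DIGITS, 0, 1), (DIGITS, 0, 1)])

# \d{1,4} : between one and four digits
counted_digits = expand_sequence([(DIGITS, 1, 4)])

if __name__ == "__main__":
    print(len(optional_digits))       # 10 * 11 * 11 = 1210 combinations
    print(len(set(optional_digits)))  # 1110 distinct strings
    print(len(counted_digits))        # 10 + 100 + 1000 + 10000 = 11110
```

A single \d{1,4} already yields more than eleven thousand strings, and
every further idiom in the same alternative multiplies that count.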

This program is in a preliminary state and certainly won't handle the
wide variety of things that can appear in regexes, but it can make it easier
to see what a complicated rule is doing.

If you have comments, suggestions, or patches, please send them my way.

Attachment: expand_regex.pl
Description: Binary data
