On Tue, 30 Jun 2009, John Wilcock wrote:
On 30/06/2009 17:16, John Hardin wrote:
> ... looking at the www peter got an impression of ...
> (-> www.peter.got?)
TLDs are limited and prevent FPs of that particular nature.
Sure, but there are lots of ccTLDs that could be confused with English words,
never mind other languages.
Do you really want SpamAssassin to do URIBL lookups for invented.by
(Belarus) for a sentence like "The www, invented by Tim Berners-Lee,
...", or billy.jo (Jordan) for "On the www, Billy-Jo can be heard..."?
The processing overhead would be enormous.
I agree that a very general URI deobfuscation rule will be both expensive
and FP-prone. I was commenting on the particular case of
www.something.somethingelse: FPs can occur, but the limited set of possible
values for somethingelse makes them less likely than that example suggested.
Looking for obfuscated URIs with two-letter TLDs, on the other hand, does
make FPs a lot more likely.
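
To make the difference concrete, here is a rough Python sketch. It is not the actual SpamAssassin rule (which is not quoted in this message); the pattern, the TLD lists, and the function names are assumptions made up for illustration. It runs the example sentences from this thread through a "www something tld" matcher, first with a few generic TLDs only, then with some two-letter ccTLDs added.

```python
import re

# Hypothetical TLD lists, for illustration only (not taken from any real rule).
GENERIC_TLDS = ["com", "net", "org", "info", "biz"]
CC_TLDS = ["by", "jo", "in", "is", "it", "me", "to"]

def obfuscated_uri_pattern(tlds):
    """Build a regex for 'www', separator(s), a label, separator(s), one of tlds."""
    alternation = "|".join(tlds)
    return re.compile(
        r"\bwww\W{1,3}(\w{1,30})\W{1,3}(" + alternation + r")\b",
        re.IGNORECASE,
    )

# Sentences quoted earlier in this thread, plus one genuinely obfuscated URI.
samples = [
    "The www, invented by Tim Berners-Lee, ...",
    "On the www, Billy-Jo can be heard...",
    "looking at the www peter got an impression of ...",
    "visit www eshopping123 com today",
]

def hits(pattern):
    """Return the domains the pattern would extract from the sample sentences."""
    found = []
    for text in samples:
        match = pattern.search(text)
        if match:
            found.append((match.group(1) + "." + match.group(2)).lower())
    return found

generic_only = hits(obfuscated_uri_pattern(GENERIC_TLDS))
with_cctlds = hits(obfuscated_uri_pattern(GENERIC_TLDS + CC_TLDS))

print("generic TLDs only:", generic_only)
print("with ccTLDs added:", with_cctlds)
```

With the generic list, only the deliberately obfuscated sample produces a domain; once the two-letter ccTLDs are added, the two ordinary English sentences produce invented.by and billy.jo as well. "www peter got" matches neither list, because "got" is not a TLD.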
I think the existing rule is good; it could perhaps be improved by extending
the \w repetition a bit so that it would match longer obfuscated domains like
"eshopping123.com" or "yourdrugstore999.net".
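
As a rough illustration of what the repetition bound does, the Python sketch below compares two upper limits on \w. The rule's real bounds are not shown in this message, so the values 10 and 20, the separator class, and the TLD list are all assumptions.

```python
import re

def rule_with_max_label(max_len):
    """Hypothetical stand-in for the rule, with a configurable \\w upper bound."""
    return re.compile(
        r"\bwww\W{1,3}\w{1,%d}\W{1,3}(?:com|net|org)\b" % max_len,
        re.IGNORECASE,
    )

obfuscated = [
    "www shop com",              # 4-character label
    "www eshopping123 com",      # 12-character label
    "www yourdrugstore999 net",  # 16-character label
]

short_rule = rule_with_max_label(10)  # assumed "current" bound
long_rule = rule_with_max_label(20)   # assumed "extended" bound

short_matches = [s for s in obfuscated if short_rule.search(s)]
long_matches = [s for s in obfuscated if long_rule.search(s)]

print("bound 10:", short_matches)
print("bound 20:", long_matches)
```

With the shorter bound only "www shop com" matches, because the label must be followed directly by a separator; the longer bound picks up all three. A larger bound also means more backtracking per candidate, so it is worth keeping it only as large as the domains actually seen in spam.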
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
USMC Rules of Gunfighting #9: Accuracy is relative: most combat
shooting standards will be more dependent on "pucker factor" than
the inherent accuracy of the gun.
-----------------------------------------------------------------------
4 days until the 233rd anniversary of the Declaration of Independence