On Tue, 30 Jun 2009, John Wilcock wrote:

On 30/06/2009 17:16, John Hardin wrote:
>     ... looking at the www peter got an impression of ...
>     (-> www.peter.got?)

 TLDs are limited and prevent FPs of that particular nature.

Sure, but there are lots of ccTLDs that could be confused with English words, never mind other languages.

Do you really want SpamAssassin to do URIBL lookups for invented.by (Belarus) for a sentence like "The www, invented by Tim Berners-Lee, ...", or billy.jo (Jordan) for "On the www, Billy-Jo can be heard..."? The processing overhead would be enormous.

I agree that a very general URI deobfuscation rule would be both expensive and FP-prone. I was commenting on the particular case of www.something.somethingelse: FPs can occur, but the limited set of possible values for somethingelse makes them less likely than that example suggested. Looking for obfuscated URIs with two-letter TLDs, though, makes FPs a lot more likely.
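To make the TLD point concrete, here is a rough Python sketch (not the actual SpamAssassin rule, which is written as a Perl regex). The pattern, the function name candidate_domains, and both TLD sets are made up for illustration; real TLD lists are far longer. It shows that the example sentences from this thread only turn into lookup candidates once two-letter ccTLDs are accepted.

```python
import re

# Hypothetical TLD sets for illustration only; real lists are much longer.
GENERIC_TLDS = {"com", "net", "org", "info", "biz"}
CC_TLDS = {"by", "jo"}

# "www", a separator, a label, a separator, then a candidate TLD.
# This pattern is an assumption, not the rule discussed in the thread.
CANDIDATE = re.compile(r"\bwww[\s.,]+(\w+)[\s.,-]+(\w+)\b", re.IGNORECASE)


def candidate_domains(text, tlds):
    """Return the domains a naive 'www word word' deobfuscator would look up."""
    hits = []
    for label, tld in CANDIDATE.findall(text):
        # Only keep the pair if the second word is an accepted TLD.
        if tld.lower() in tlds:
            hits.append(f"{label}.{tld}".lower())
    return hits


SAMPLES = [
    "... looking at the www peter got an impression of ...",
    "The www, invented by Tim Berners-Lee, ...",
    "On the www, Billy-Jo can be heard...",
    "visit www eshopping123 com today",
]

if __name__ == "__main__":
    for text in SAMPLES:
        print(candidate_domains(text, GENERIC_TLDS),
              candidate_domains(text, GENERIC_TLDS | CC_TLDS))
```

With only the generic set, the three ordinary sentences produce no candidates; adding the ccTLD set turns two of them into lookups.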

I think the existing rule is good; perhaps the \w repetition could be extended a bit so that it would match longer obfuscated domains like "eshopping123.com" or "yourdrugstore999.net".
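As a sketch of what extending the repetition means, assuming the rule uses a bounded \w{min,max} quantifier: the bounds 3-10 and 3-20 below are invented, since the real rule's values are not quoted in this thread, and the patterns are Python stand-ins rather than the rule itself.

```python
import re

# Hypothetical patterns; the real rule's bounds are not shown in this thread.
SHORT = re.compile(r"\bwww\s+(\w{3,10})\s+(com|net|org)\b", re.IGNORECASE)
LONGER = re.compile(r"\bwww\s+(\w{3,20})\s+(com|net|org)\b", re.IGNORECASE)


def matched_domain(pattern, text):
    """Return the reassembled domain if the pattern matches, else None."""
    m = pattern.search(text)
    return f"{m.group(1)}.{m.group(2)}".lower() if m else None


if __name__ == "__main__":
    for text in ("www eshopping123 com", "www yourdrugstore999 net"):
        print(matched_domain(SHORT, text), matched_domain(LONGER, text))
```

The 12- and 16-character labels are missed by the shorter bound and caught by the longer one, at the cost of somewhat more backtracking per candidate.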

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #9: Accuracy is relative: most combat
  shooting standards will be more dependent on "pucker factor" than
  the inherent accuracy of the gun.
-----------------------------------------------------------------------
 4 days until the 233rd anniversary of the Declaration of Independence
