On Fri, Jul 23, 2010 at 06:57:33PM -0500, Jared Johnson wrote:
> > It seems like
> > all you have to do to get around the etc. problem is to wait a
> > little longer before applying the fixup -- allow the semicolon to match in
> > the hostname search and then strip it out.
>
> My bad.. I guess the plugin currently only fixes up '&#\d\d\d' encoding,
> not etc. maybe i'll work on that...

Yeah, I didn't include full HTML entity decoding support. Decoding any
entities at all is a heuristic, since the plugin isn't MIME aware and
really doesn't know that it's looking at HTML. Encoded entities are fairly
distinctive, though, and the characters used in the format are generally
invalid within the hostname component of URLs, so it's a fair guess.

As IDNs become more common we can expect to see munging attacks based on
UTF-* decoder variants too, come to think of it. :P

-- 
Devin  \ aqua(at)devin.com, IRC:Requiem; http://www.devin.com
Carraway \ 1024D/E9ABFCD2: 13E7 199E DD1E 65F0 8905 2E43 5395 CA0D E9AB FCD2
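
For illustration, here is a minimal sketch of the heuristic discussed above: let the host search match the characters of a numeric entity ('&', '#', ';'), decode afterwards, and only keep the result if it then looks like a plain hostname. It is written in Python rather than the plugin's own language, it is not the plugin's code, and every name in it (LOOSE_HOST, STRICT_HOST, decode_numeric_entities, extract_hosts) is hypothetical. Named entities are deliberately left undecoded, matching the numeric-only behaviour described in the thread.

```python
import re

# Hypothetical names throughout; an illustration, not the plugin's code.

# Decimal numeric character references such as "&#101;".
NUMERIC_ENTITY = re.compile(r"&#(\d{1,7});")

# Loose host search: ordinary hostname characters plus the characters
# that make up a numeric entity ('&', '#', ';'), so an encoded host
# is captured whole instead of being cut off at the first '&'.
LOOSE_HOST = re.compile(r"https?://([A-Za-z0-9.\-&#;]+)", re.IGNORECASE)

# Strict check applied after decoding: letters, digits, dots, hyphens.
STRICT_HOST = re.compile(r"^[a-z0-9](?:[a-z0-9.-]*[a-z0-9])?$")


def decode_numeric_entities(text):
    """Replace decimal numeric entities with the characters they encode."""
    def repl(match):
        code = int(match.group(1))
        if code > 0x10FFFF:
            # Not a valid code point; leave the text untouched.
            return match.group(0)
        return chr(code)
    return NUMERIC_ENTITY.sub(repl, text)


def extract_hosts(body):
    """Return decoded, lowercased hosts from http(s) URLs found in body."""
    hosts = []
    for match in LOOSE_HOST.finditer(body):
        host = decode_numeric_entities(match.group(1)).lower().rstrip(".")
        # Anything still containing '&', '#' or ';' is rejected here,
        # which includes undecoded named entities.
        if STRICT_HOST.match(host):
            hosts.append(host)
    return hosts
```

Deferring the strict check until after decoding is the "wait a little longer before applying the fixup" idea: the semicolon is allowed to match during the search and disappears when the entity is decoded. A host that still contains entity characters afterwards is dropped rather than guessed at.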