-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sunday, March 31 at 11:16 PM, quoth Luis Mochan:
>> I'm a perl guy, yet that's non-trivial here.  Thx.  :-)
>>
> You're welcome. I don't know if there are other characters that appear 
> in a URL and need to be escaped for the shell ([;><]?); they could 
> easily be accommodated by modifying 'wlmsanitize'. The page for the 
> extract_url project (http://www.memoryhole.net/~kyle/extract_url/) 
> mentions that the program already transforms characters dangerous to 
> the shell, but then it only mentions explicitly single quotes and 
> dollar signs.

Hello,

I'm the author of extract_url.pl, so perhaps I can shed some light 
here.

The *correct* place to "fix" the issue of escaping (or otherwise 
sanitizing) ampersands is in the sanitizeuri function (line 208). The 
current version of extract_url.pl uses this:

     sub sanitizeuri {
         my($uri) = @_;
         $uri =~ s/([^a-zA-Z0-9_.!*()\@&:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
         return $uri;
     }

Essentially, what that does is explicitly whitelist the characters 
a-z, A-Z, 0-9, _, ., !, *, (, ), @, &, :, =, ?, /, %, ~, +, and - and 
turns *anything* else into the percent-encoded equivalent (e.g. %26), 
which should be correctly decoded by any standards-compliant 
URL-decoder (see RFC 3986). If you want to eliminate ampersands from 
the characters allowed in a URL, simply remove the ampersand from that 
list. It's as simple as that. I think Luis's patch is a little overly 
complicated, and I think the policy of using backslashes to escape 
such characters (instead of percent-encoding) is dangerous, given that 
it's more likely to be stripped off by intervening scripts. I don't 
want future bug reports that say "my setup strips backslashes, so can 
you create an option that will triple-backslash the $ character?". :) 
(Followed, the next week, by a request for quadruple-backslashing, of 
course!)
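
As a minimal sketch of that one-line change (not a patch against the real 
file): the sub below is the function quoted above with only the ampersand 
dropped from the character class. The name sanitizeuri_noamp is 
hypothetical, chosen so it can sit next to the original for comparison.

```perl
use strict;
use warnings;

# Same substitution as sanitizeuri, minus '&' in the allowed set, so an
# ampersand is percent-encoded like any other character not on the list.
# (sanitizeuri_noamp is a made-up name for illustration only.)
sub sanitizeuri_noamp {
    my ($uri) = @_;
    $uri =~ s/([^a-zA-Z0-9_.!*()\@:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
    return $uri;
}

print sanitizeuri_noamp('http://url.with/an&ampersand'), "\n";
# prints http://url.with/an%26ampersand
```

Everything else in the URL passes through untouched, since ':', '/' and 
'.' are still on the list.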

I've personally never had a problem with ampersands, and I'm not sure 
why some people do. Extract_url.pl constructs system commands like so:

   /path/to/handler 'http://url.with/an&ampersand'

... which should be perfectly safe and work just fine (and does for 
me). I suspect the problem stems from using another wrapper script (e.g. 
/etc/urlhandler/urlhandler.sh). I bet that wrapper script is not 
properly quoting its first argument.
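
To illustrate the difference quoting makes, here is a rough sketch that 
runs the same URL through /bin/sh twice, once single-quoted as in the 
command above and once bare. It assumes a POSIX shell, and uses printf 
as a stand-in for /path/to/handler. The bare form is only a guess at 
what a wrapper might end up doing if it rebuilds a command line from its 
argument; I have not looked at any particular wrapper.

```perl
use strict;
use warnings;

my $url = 'http://url.with/an&ampersand';

# Single-quoted: the shell hands the whole URL to the handler as one
# argument, ampersand included. (printf stands in for the handler.)
my $cmd_quoted = "printf '%s\\n' '${url}'";
my $quoted = `$cmd_quoted`;
chomp $quoted;

# Bare: the shell parses '&' as a command separator, so the handler
# never sees the complete URL. stderr is discarded because the shell
# will usually complain about the text after the '&'.
my $cmd_bare = "printf '%s\\n' ${url} 2>/dev/null";
my $bare = `$cmd_bare`;
chomp $bare;

print "quoted: $quoted\n";
print "bare:   $bare\n";
```

Note that a percent-encoded ampersand (%26) would survive both forms, 
which is part of why I prefer it to backslashes.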

In any event, percent-encoding, by modifying that one line, is 
probably the right way to go.

~Kyle
- -- 
The purpose of computing is insight, not numbers.
                                                  -- Richard W. Hamming
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iQIcBAEBCAAGBQJRWfZoAAoJECuveozR/AWeH4oQAKRu3Jg1n7KVXT0q0DogCoE+
Ms/gH8EKUwN8KtWhg3wNDgCIh0GXaNykywQPshbM59qP6U8uFofavngGfQv1YCEV
vM94vsNLY8AOfdv/6tRkQFKDi5RadKRfjcJYqHzr11LSJ2e+Ns+i4gx+0jkSCe9/
2FIWjZjsmH5WUHNktAzC0dCGxqBb6vO4Oc7JRuLpaof6jLWLMvJBgM9HVCf67RrX
aEALusVBqSZKBlr+UBk1lF0obEbijGX+hJuHg8udaOVgCsljpzDcOku5my2V13Pu
LZ1ltKv4/y+Z2tofyjDpXNnsomENYfWb6LGfQgystY8xvSv94TJLOlM7oaSsJmJq
hPdP0T5rJ3lryaadc3I5p7GUI5zqUk0T6e8FM8vM1ZUXS8NyN0ZN7NeSSX/5mAMS
OCCkxxXSaLnbr2HUetjYknnVB4W6WKR2eEjgP+VHMtemRb9W6UVgjO1nnoqm4WOM
zRPDIk6VvJgTPUuIso5oq2JoYC0wowmXJBz31UL6y98p1zcPcZVPFDxtf/9p6pUV
/VTDD4bPZCSaQiwhr2abUd4OxOd5bpYx994Z7L5oCQezGDXhEt6XgeEdGBdT21bt
z8FKnqGNOp0EO9C2kX9fPGbRITXK32urUEqeuuB0AHDp3D7VyZ3KRiXIeFFRWvMj
kQzyzKnbnm1uHloyk89l
=n1YG
-----END PGP SIGNATURE-----
