-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sunday, March 31 at 11:16 PM, quoth Luis Mochan:
>> I'm a perl guy, yet that's non-trivial here. Thx. :-)
>>
> You're welcome. I don't know if there are other characters that appear
> in an url and need to be escaped for the shell ([;><]?); they could
> easily be accommodated by modifying 'wlmsanitize'. The page for the
> extract_url project (http://www.memoryhole.net/~kyle/extract_url/)
> mentions that the program already transforms characters dangerous to
> the shell, but then it only mentions explicitly single quotes and
> dollar signs.

Hello, I'm the author of extract_url.pl, so perhaps I can shed some light
here.

The *correct* place to "fix" the issue of escaping (or otherwise
sanitizing) ampersands is in the sanitizeuri function (line 208). The
current version of extract_url.pl uses this:

    sub sanitizeuri {
        my($uri) = @_;
        $uri =~ s/([^a-zA-Z0-9_.!*()\@&:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
        return $uri;
    }

Essentially, what that does is explicitly whitelist the characters a-z,
A-Z, 0-9, _, ., !, *, (, ), @, &, :, =, ?, /, %, ~, +, and -, and turn
*anything* else into its percent-encoded equivalent (e.g. %26, the
encoding of an ampersand), which should be correctly decoded by any
standards-compliant URL-decoder (see RFC 3986).

If you want to eliminate ampersands from the characters allowed in a URL,
simply remove the ampersand from that list. It's as simple as that.

I think Luis's patch is a little overly complicated, and I think the
policy of using backslashes to escape such characters (instead of
percent-encoding) is dangerous, given that backslashes are more likely to
be stripped off by intervening scripts. I don't want future bug reports
that say "my setup strips backslashes, so can you create an option that
will triple-backslash the $ character?". :) (Followed, the next week, by a
request for quadruple-backslashing, of course!)

I've personally never had a problem with ampersands, and I'm not sure why
some people do. Extract_url.pl constructs system commands like so:

    /path/to/handler 'http://url.with/an&ersand'

... which should be perfectly safe and work just fine (and does for me). I
suspect the problem stems from using some other wrapper script (e.g.
/etc/urlhandler/urlhandler.sh). I bet that wrapper script is not properly
quoting its first argument.

In any event, percent-encoding, by modifying that one line, is probably
the right way to go.

~Kyle
- -- 
The purpose of computing is insight, not numbers.
                                                -- Richard W. Hamming
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iQIcBAEBCAAGBQJRWfZoAAoJECuveozR/AWeH4oQAKRu3Jg1n7KVXT0q0DogCoE+ Ms/gH8EKUwN8KtWhg3wNDgCIh0GXaNykywQPshbM59qP6U8uFofavngGfQv1YCEV vM94vsNLY8AOfdv/6tRkQFKDi5RadKRfjcJYqHzr11LSJ2e+Ns+i4gx+0jkSCe9/ 2FIWjZjsmH5WUHNktAzC0dCGxqBb6vO4Oc7JRuLpaof6jLWLMvJBgM9HVCf67RrX aEALusVBqSZKBlr+UBk1lF0obEbijGX+hJuHg8udaOVgCsljpzDcOku5my2V13Pu LZ1ltKv4/y+Z2tofyjDpXNnsomENYfWb6LGfQgystY8xvSv94TJLOlM7oaSsJmJq hPdP0T5rJ3lryaadc3I5p7GUI5zqUk0T6e8FM8vM1ZUXS8NyN0ZN7NeSSX/5mAMS OCCkxxXSaLnbr2HUetjYknnVB4W6WKR2eEjgP+VHMtemRb9W6UVgjO1nnoqm4WOM zRPDIk6VvJgTPUuIso5oq2JoYC0wowmXJBz31UL6y98p1zcPcZVPFDxtf/9p6pUV /VTDD4bPZCSaQiwhr2abUd4OxOd5bpYx994Z7L5oCQezGDXhEt6XgeEdGBdT21bt z8FKnqGNOp0EO9C2kX9fPGbRITXK32urUEqeuuB0AHDp3D7VyZ3KRiXIeFFRWvMj kQzyzKnbnm1uHloyk89l =n1YG -----END PGP SIGNATURE-----
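
To illustrate the wrapper-quoting suspicion in the message above, here is a minimal sketch. The function names (show_arg, fragile_wrapper, safe_wrapper) are made up for illustration; they are not taken from extract_url.pl or from any real urlhandler.sh. It assumes a wrapper that re-parses its argument as shell code (here via eval), which is only one plausible way an ampersand could go wrong, and compares it with a wrapper that passes "$1" through as a single quoted word.

```shell
#!/bin/sh
# Hypothetical stand-in for a URL handler: reports the argument it received.
show_arg() {
    printf 'handler got: %s\n' "$1"
}

# Fragile wrapper (assumed failure mode): the URL is pasted into a string
# that the shell parses again, so '&' ends the command (running it in the
# background) and the rest of the URL is treated as a separate command.
fragile_wrapper() {
    eval "show_arg $1" 2>/dev/null || true
    wait    # let the backgrounded handler finish before returning
}

# Safe wrapper: the first argument is handed on verbatim as one word.
safe_wrapper() {
    show_arg "$1"
}

url='http://url.with/an&ersand'
safe_wrapper "$url"      # the handler sees the whole URL
fragile_wrapper "$url"   # the handler sees the URL cut off at the '&'
```

If a wrapper behaves like fragile_wrapper, percent-encoding the ampersand in sanitizeuri sidesteps the problem, but quoting "$1" in the wrapper addresses it at the source.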