Hi Kyle,

> I'm the author of extract_url.pl, so perhaps I can shed some light
> here.

Thanks.

> The *correct* place to "fix" the issue of escaping (or otherwise
> sanitizing) ampersands is in the sanitizeuri function (line 208). The
> current version of extract_url.pl uses this:
>
>     sub sanitizeuri {
>         my($uri) = @_;
>         $uri =~
>             s/([^a-zA-Z0-9_.!*()\@&:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
>         return $uri;
>     }

I have now tried your fix, and it didn't work for me: my browser doesn't
find the resulting pages when the URL has ampersands that are converted
to %26 (probably because the % itself is further encoded as %25 by the
browser before being sent to the server?).

> ...
> I've personally never had a problem with ampersands, and I'm not sure
> why some people do. Extract_url.pl constructs system commands like so:
>
>     /path/to/handler 'http://url.with/an&ersand'

I changed my handler to '/bin/echo %s >>tmp.txt' and it wrote the correct
result, so I guess you're right here.

> ... which should be perfectly safe and work just fine (and does for
> me). I suspect the problem stems from using other wrapper script (e.g.
> /etc/urlhandler/urlhandler.sh). I bet that wrapper script is not
> properly quoting its first argument.

I don't know much about shell programming, but I found that
/etc/urlhandler/url_handler.sh is a shell script that obtains its URL
by doing 'url=$1'. I replaced the whole handler with the following
program:

    #! /bin/bash
    url=$1; shift
    echo $url >>tmp.txt;

and found that the URL is cut short at the first ampersand. I don't
understand why echo by itself yields the correct result (above) while
echo through a bash script yields the truncated result.

Thanks and best regards,
Luis

-- 
                                                                  o
W. Luis Mochán,                      | tel:(52)(777)329-1734     /<(*)
Instituto de Ciencias Físicas, UNAM  | fax:(52)(777)317-5388     `>/   /\
Apdo. Postal 48-3, 62251             |                           (*)/\/  \
Cuernavaca, Morelos, México          | moc...@fis.unam.mx   /\_/\__/
GPG: DD344B85, 2ADC B65A 5499 C2D3 4A3B  93F3 AE20 0F5E DD34 4B85
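
P.S. One possible way to narrow down where the URL gets cut, offered as a sketch rather than a diagnosis. Inside a script, `echo $url` does not treat an ampersand coming out of a variable as a shell operator, so a truncated `$1` suggests the ampersand was already interpreted when the command line that starts the script was parsed. The example below compares a quoted and an unquoted URL in a command string. The names `workdir` and `handler.sh` are hypothetical, `sh -c` is only a stand-in for whatever actually parses the command line in the real setup, and the URL is the placeholder from the quoted example.

```shell
#!/bin/sh
# Hypothetical stand-in for the URL handler: it prints its first argument.
workdir=$(mktemp -d)
handler="$workdir/handler.sh"
cat > "$handler" <<'EOF'
#!/bin/sh
url=$1
printf '%s\n' "$url"
EOF

# Placeholder URL taken from the quoted example.
url='http://url.with/an&ersand'

# Case 1: the URL is single-quoted inside the command string, so the shell
# hands it to the handler as one argument, ampersand included.
quoted_result=$(sh -c "sh '$handler' '$url'" 2>/dev/null) || true

# Case 2: the URL is unquoted inside the command string. The shell parses
# "&" as a control operator: the handler runs in the background with only
# the text before the ampersand, and "ersand" is tried as a second command
# (its "not found" error is discarded here).
unquoted_result=$(sh -c "sh '$handler' $url" 2>/dev/null) || true

printf 'quoted:   %s\n' "$quoted_result"
printf 'unquoted: %s\n' "$unquoted_result"

rm -rf "$workdir"
```

If the second case reproduces the truncation seen in tmp.txt, the place to look would be whichever layer builds or re-parses the command line before the handler script starts, not the `url=$1` assignment itself.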