New submission from Glenn Linderman <v+pyt...@g.nevcal.com>:

A URL potentially consists of four parts: path, PATH_INFO, anchor, 
QUERY_STRING.  The syntax is roughly:

/path/parts/cgi-script/path/info/parts#anchor?query-string

where # and ? characters play key roles.

is_cgi not-so-cleverly passes the whole request to _url_collapse_path for 
normalization, resulting in inappropriately normalizing the anchor and 
query-string as well as appropriately normalizing the path and path info parts. 
 Consider that there is no syntax restrictions preventing a query string or 
anchor from containing / and characters. I contrived such strings, and observed 
that they were passed to _url_collapse_path and were inappropriately 
"normalized".

Now for non-CGI usage, both the anchor and query-string are ignored by 
server.py (see translate_path which has code to chop them off). For non-CGI 
usage, translate_path is called just once, so this isn't particularly 
inefficient nor is it incorrect. However, for CGI usage it can be both.

It is not clear to me why browsers even send the anchor to the server, but they 
do. Perhaps there are servers that process it, but this one doesn't, in any 
case I can find, and I'm unaware of semantics applied by any server.  
Therefore, parsing and ignoring it seems appropriate.

Presently, inappropriate normalization of the query-string causes no 
correctness problem for CGI requests because of issue 14566 which reverts back 
to the original path.  Should issue 14566 be fixed as recommended, certain 
query-strings will be mangled by the inappropriate normalization. However, even 
though there is not presently a correctness problem, there is an efficiency 
problem for CGI requests: the anchor and query-string are both needlessly 
processed by _url_collapse_path, and potentially repeatedly processed by 
translate_path in the loop at the top of run_cgi. In fact, it is not exactly 
clear whether anchors and query-strings containing cleverly crafted / and . and 
.. combinations might be able to confuse that loop... because translate_path 
chops them off, but the loop does not!

It seems to me that the appropriate place to separate the anchor and query 
string would be in parse_request.  Instead of creating self.command, self.path, 
self.request_version there, I think it would be better to divide the current 
content of self.path into three parts: self.path, self.anchor, and 
self.query_string.

self.path, then, would unambiguously contain only path parts, and would be 
appropriately passed through _url_collapse_path... perhaps even right there in 
parse_request ... it seems appropriate to normalize the path for reqular files 
as well as CGI files for all the same reasons.

Reasons:
1) no ability to access files outside the configured realm due to malicious use 
of .. as a path component -- although it seems translate_path prevents this 
anyway, in a complex manner.
2) proper application of unquote, to allow # and ? characters to exist in path 
parts, and may also legally be applied to other characters by browser code

Neither of these actions are presently performed for regular files, only for 
CGI files.

If the processing happens in parse_request, then other code would be affected:

is_cgi would no longer need to call _url_collapse_path, as it would already be 
done

translate_path would no longer need to parse off query strings or anchors, as 
it would already be done

translate_path would no longer need to call urllib.parse.unquote, as it would 
already be done.

run_cgi would no longer need to parse anchor or query, but instead could 
reference query as self.query_string

----------
components: Library (Lib)
messages: 158166
nosy: orsenthil, v+python
priority: normal
severity: normal
status: open
title: http/server.py query string handling incorrect, inefficient
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14567>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to