To tell the truth, I spent half the day just understanding what you were
talking about, and I had to run a number of tests before I did. I now believe
that the issue you raised is a real problem. In the meantime, I suggest an
alternative solution at the end of this post.
My conclusions are:
1. mod_rewrite actually un-escapes the URL path.
I think it should not, but there may be a good reason for it.
2. The built-in RewriteMap function "escape" does not escape reserved
characters such as & and $.
I think there is a need for a built-in function that does encode these
characters. That could be the existing escape function, or a new function
could be added.
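To make point 2 concrete, here is a minimal Python sketch of the kind of function meant. It percent-encodes every character outside the RFC 2396 "unreserved" set, so & and $ come out as %26 and %24. The name `strict_escape` is hypothetical; it is not an Apache or mod_rewrite function.

```python
# RFC 2396: unreserved = alphanum | mark; everything else gets %XX-encoded.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)


def strict_escape(value: str) -> str:
    """Hypothetical escape that also encodes reserved characters like & and $."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Reserved, unsafe and non-ASCII bytes are all percent-encoded.
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(strict_escape("apples&oranges"))  # apples%26oranges
print(strict_escape("rock$roll"))       # rock%24roll
```

Encoding everything outside the unreserved set is deliberately conservative: over-escaping a character in a query value is harmless, while under-escaping a delimiter such as & changes the meaning.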
Given the following configuration:
RewriteLogLevel 9
RewriteMap encode int:escape
RewriteRule ^/folder/([^/]*)/([^/]*) /cgi-bin/printenv?vara=${encode:$1}&varb=$2 [PT,QSA]
I would expect the request "GET /folder/apples&oranges/more?varc=rock%26roll"
to be handled in the following way:
1: (2) init rewrite engine with requested uri /folder/apples&oranges/more
2: (3) applying pattern '^/folder/([^/]*)/([^/]*)' to uri '/folder/apples&oranges/more'
3: (5) map lookup OK: map=encode key=apples&oranges -> val=apples%26oranges
4: (2) rewrite /folder/apples&oranges/more -> /cgi-bin/printenv?vara=apples%26oranges&varb=more
5: (3) split uri=/cgi-bin/printenv?vara=apples%26oranges&varb=more -> uri=/cgi-bin/printenv, args=vara=apples%26oranges&varb=more&varc=rock%26roll
6: (2) forcing '/cgi-bin/printenv' to get passed through to next API URI-to-filename handler
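The expected steps 3 to 5 (map lookup, substitution, split at "?", then QSA appending the original query string) can be modelled in a few lines of Python. This is only a sketch of the behaviour I expect, not mod_rewrite's actual code; `encode` here is a hypothetical stand-in that escapes only % and &.

```python
import re
from typing import Tuple

# Same pattern as the RewriteRule in the configuration above.
PATTERN = re.compile(r"^/folder/([^/]*)/([^/]*)")


def encode(value: str) -> str:
    """Hypothetical stand-in for the map: escapes only % and &."""
    return value.replace("%", "%25").replace("&", "%26")


def rewrite(uri: str, query: str) -> Tuple[str, str]:
    """Substitute, split the result at the first '?', then append the query (QSA)."""
    match = PATTERN.match(uri)
    if match is None:
        return uri, query  # rule does not apply; leave the request alone
    target = "/cgi-bin/printenv?vara={}&varb={}".format(
        encode(match.group(1)), match.group(2)
    )
    path, _, args = target.partition("?")
    if query:
        # QSA: the original query string is appended after the new arguments.
        args = args + "&" + query if args else query
    return path, args


print(rewrite("/folder/apples&oranges/more", "varc=rock%26roll"))
# ('/cgi-bin/printenv', 'vara=apples%26oranges&varb=more&varc=rock%26roll')
```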
Instead of the expected line 3, what actually happens is:
(5) map lookup OK: map=encode key=apples&oranges -> val=apples&oranges
which tells me that the escape function, for whatever reason, does not escape
reserved characters except ";" (I did not test "/" and "?").
Furthermore, I would expect "GET /folder/apples%26oranges/more" to be handled
in the following way:
1: (2) init rewrite engine with requested uri /folder/apples%26oranges/more
2: (3) applying pattern '^/folder/([^/]*)/([^/]*)' to uri '/folder/apples%26oranges/more'
3: (5) map lookup OK: map=encode key=apples%26oranges -> val=apples%2526oranges
4: (2) rewrite /folder/apples%26oranges/more -> /cgi-bin/printenv?vara=apples%2526oranges&varb=more
5: (3) split uri=/cgi-bin/printenv?vara=apples%2526oranges&varb=more -> uri=/cgi-bin/printenv, args=vara=apples%2526oranges&varb=more
6: (2) forcing '/cgi-bin/printenv' to get passed through to next API URI-to-filename handler
Instead of the expected line 1, I get
(2) init rewrite engine with requested uri /folder/apples&oranges/more
i.e. mod_rewrite un-escapes the URL path, which is carried over to the
remainder of the processing.
I _know_ it is mod_rewrite that un-escapes the URL path, and only the URL path:
when I request "GET /cgi-bin/printenv?vara=apples%26oranges&varb=more", which
is not processed by mod_rewrite, it comes out as expected at the other end, and
an escaped character in the query string of a request that is processed by
mod_rewrite makes it through unchanged.
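The consequence can be shown with Python's `urllib.parse`: once the path has been decoded, /folder/apples&oranges/more and /folder/apples%26oranges/more are the same string, and a segment copied unescaped into a query string splits at the "&". This only illustrates the effect; it does not reproduce Apache's internals.

```python
from urllib.parse import parse_qs, unquote

plain = "/folder/apples&oranges/more"
escaped = "/folder/apples%26oranges/more"

# After decoding, the two request paths are the same string.
print(unquote(plain) == unquote(escaped))  # True

# Copying the decoded segment unescaped into a query string splits vara at '&'.
segment = unquote(escaped).split("/")[2]  # 'apples&oranges'
query = "vara={}&varb=more".format(segment)
print(parse_qs(query, keep_blank_values=True))
# {'vara': ['apples'], 'oranges': [''], 'varb': ['more']}
```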
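The problem can be circumvented by implementing your own escape function along
the lines of:
-------
encode.pl:
#!/usr/bin/perl
use strict;
use warnings;

# RewriteMap prg: programs must not buffer their output.
$| = 1;

while (my $line = <STDIN>) {
    chomp $line;
    $line =~ s/%/%25/g;   # escape % first so the %26 added below is not re-escaped
    $line =~ s/&/%26/g;
    # Add other translation rules here
    print "$line\n";
}
--------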
RewriteEngine On
RewriteLog /u01/apachetest/logs/rewrite_log
RewriteLogLevel 9
RewriteMap encode prg:/u01/apachetest/conf/encode.pl
RewriteRule ^/folder/([^/]*)/([^/]*) /cgi-bin/printenv?vara=${encode:$1}&varb=${encode:$2} [PT,QSA]
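A prg: map does not have to be written in Perl. As a hedged alternative, here is the intended encode.pl logic in Python (again only % and & are handled); the file name and location are up to you, and you would point the RewriteMap prg: path at it instead.

```python
#!/usr/bin/env python3
import sys


def encode_line(key: str) -> str:
    """Escape % first, then &, so the %26 produced here is not re-escaped."""
    return key.replace("%", "%25").replace("&", "%26")


def main() -> None:
    # A prg: map receives one key per line on stdin and must write exactly one
    # line back for each, flushing immediately so Apache is not left waiting.
    while True:
        line = sys.stdin.readline()
        if not line:
            break  # EOF: Apache closed the pipe
        sys.stdout.write(encode_line(line.rstrip("\n")) + "\n")
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

readline() is used instead of iterating over sys.stdin so that each key is answered as soon as its line arrives.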
Then the request GET /folder/apples&%25oranges/more?varc=rock%26roll will end
up as
(2) rewrite /folder/apples&%oranges/more -> /cgi-bin/printenv?vara=apples%26%25oranges&varb=more
(3) split uri=/cgi-bin/printenv?vara=apples%26%25oranges&varb=more -> uri=/cgi-bin/printenv, args=vara=apples%26%25oranges&varb=more&varc=rock%26roll
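To check that the values survive the round trip, the final query string can be decoded the way a typical CGI library would. Python's `parse_qs` is used here only as a convenient stand-in for the printenv script.

```python
from urllib.parse import parse_qs

# Final query string from the (3) split line above.
args = "vara=apples%26%25oranges&varb=more&varc=rock%26roll"
params = parse_qs(args)

print(params["vara"])  # ['apples&%oranges']
print(params["varb"])  # ['more']
print(params["varc"])  # ['rock&roll']
```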
Hope this works.
If nobody else replies with an explanation as to why mod_rewrite behaves the
way it does, maybe you should file a bug report...
-ascs
From RFC 2396:
2.2. Reserved Characters
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
The "reserved" syntax class above refers to those characters that are
allowed within a URI, but which may not be allowed within a
particular component of the generic URI syntax; they are used as
delimiters of the components described in Section 3.
2.4.2. When to Escape and Unescape
Normally, the only time escape encodings can safely be made is when the URI is
being created from its component parts; each component may have its own set of
characters that are reserved, so only the mechanism responsible for generating
or interpreting that component can determine whether or not escaping a
character will change its semantics. Likewise, a URI must be separated into its
components before the escaped characters within those components can be safely
decoded.
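The rule in 2.4.2 can be seen with a small Python sketch that parses the same query string in both orders. The two helper names are hypothetical, and the parser is deliberately simplified (no "+" handling, repeated names overwrite earlier ones).

```python
from urllib.parse import unquote


def split_then_decode(query: str) -> dict:
    """Correct order: separate the name=value components, then decode each one."""
    result = {}
    for part in query.split("&"):
        if not part:
            continue
        name, _, value = part.partition("=")
        result[unquote(name)] = unquote(value)  # repeated names overwrite earlier ones
    return result


def decode_then_split(query: str) -> dict:
    """Wrong order: decoding the whole string first turns %26 into a delimiter."""
    return split_then_decode(unquote(query))


raw = "vara=apples%26oranges&varb=more"
print(split_then_decode(raw))  # {'vara': 'apples&oranges', 'varb': 'more'}
print(decode_then_split(raw))  # {'vara': 'apples', 'oranges': '', 'varb': 'more'}
```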
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.