---------- Forwarded message ----------
From: Rich Bowen <[EMAIL PROTECTED]>
To: users@httpd.apache.org
Date: Tue, 22 Apr 2008 10:02:04 -0400
Subject: Re: [EMAIL PROTECTED] mod_rewrite: PATH_INFO gets injected with each Rule



On Apr 21, 2008, at 08:54, Aleksander Budzynowski wrote:


Hi,

The behaviour I'm seeing resembles the bug described here:
http://archive.apache.org/gnats/7879
Reportedly it was fixed in 2.0.30. However, testing under both 2.2.3 and
2.0.61 I get the same sort of problem.

Essentially, PATH_INFO is appended to the end of the URI before each
RewriteRule is processed. If more than one RewriteRule matches, you can end up
with redundant garbage at the end of the URI.

Let's consider a rule designed to turn all underscores into hyphens (done in
a per-directory context, i.e. .htaccess file):

RewriteEngine On
#Convert _ to - (N flag ensures that all underscores get converted)
RewriteRule ^(.*)_(.*) $1-$2 [N]

It seems innocent enough. But issue a request for

/_f_o_o_/bar

(where _f_o_o_ does not exist, placing '/bar' in PATH_INFO), and this gets
rewritten to /-f-o-o-/bar/bar/bar/bar!

If you request /foo/_bar (assuming foo does not exist), then each new _bar
will feed an extra underscore back into the mix, creating an infinite loop -
even worse.
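
The two requests above can be reproduced with a small model of the loop. This is only a sketch in Python, not Apache code: it assumes that PATH_INFO is re-appended to the current filename before every attempt of the rule, and that [N] simply restarts the rule after each match. The function name and the max_rounds cap are made up for illustration.

```python
import re


def rewrite_with_path_info(filename, path_info, max_rounds=20):
    """Model the rule ^(.*)_(.*) -> $1-$2 [N] in per-directory context.

    Assumption: PATH_INFO is appended before every attempt, as the
    RewriteLog suggests. Returns (final_filename, finished), where
    finished is False if the loop was cut off at max_rounds.
    """
    rule = re.compile(r"^(.*)_(.*)")
    for _ in range(max_rounds):
        # The "add path info postfix" step, repeated on every round.
        uri = filename + path_info
        match = rule.match(uri)
        if match is None:
            # Rule no longer matches; the last substitution stands.
            return filename, True
        # The substitution replaces the whole URI, then [N] restarts.
        filename = match.expand(r"\1-\2")
    return filename, False


print(rewrite_with_path_info("/_f_o_o_", "/bar"))
# ('/-f-o-o-/bar/bar/bar/bar', True)
print(rewrite_with_path_info("/foo", "/_bar")[1])
# False (each round feeds a fresh underscore back in)
```

Under this model, four underscores mean four matches, and each match bakes one more copy of '/bar' into the filename.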


In the RewriteLog, one sees something like this before the application of
each RewriteRule:

add path-info postfix: /rewritebase/_f_o_o_ -> /rewritebase/_f_o_o_/bar

although each time it accumulates an extra '/bar'.


This doesn't look right to me. Is it a bug? Or have I missed something
obvious?



This does look pretty nasty. Can you try 1) testing with the latest
versions, and 2) posting your RewriteLog so that we can see what process
it's going through to do this? Given that that's an example from the
documentation, one kind of hopes that it'll work correctly.




Also, I'm trying this out myself. Is it only on PATH_INFO, or is it also on
existing file names?


--Rich


It's only PATH_INFO, and only within .htaccess. Looking at the 2.2.8 source
(mod_rewrite.c:3694), this seems to be the culprit:

        if (r->path_info && *r->path_info) {
            rewritelog((r, 3, ctx->perdir, "add path info postfix: %s -> %s%s",
                        ctx->uri, ctx->uri, r->path_info));
            ctx->uri = apr_pstrcat(r->pool, ctx->uri, r->path_info, NULL);
        }

It looks like nowhere in the rewriting process is r->path_info modified,
meaning that this happens for EVERY RewriteRule. And this becomes a problem
if more than one RewriteRule matches.

Back at line 3680, we have this:
    ctx->uri = r->filename;

Before any of the RewriteRules match, this will be the URI minus PATH_INFO.
But once a rule matches, the path is changed. PATH_INFO basically becomes
invalid!

Is PATH_INFO recalculated after a URI is run through mod_rewrite? (If so
then it would make perfect sense to empty r->path_info whenever a
RewriteRule matches.) If not, should it be? Maybe only in conjunction with
the [PT] flag?

If we can't, for whatever reason, disturb path_info, then we could add a
"matched" member to rewrite_ctx, to indicate that a substitution has already
been made, and not append PATH_INFO if this has occurred.
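
To see what the "matched" idea would do to the underscore example, here is a rough Python model of it. It is a sketch under assumptions, not a patch: the matched variable stands in for the proposed rewrite_ctx member, the rule is hard-coded as ^(.*)_(.*) -> $1-$2 [N], and the function name and max_rounds cap are invented for illustration.

```python
import re


def rewrite_with_matched_flag(filename, path_info, max_rounds=20):
    """Model the proposed fix: stop appending PATH_INFO after a match.

    Returns (final_filename, finished), where finished is False if
    the loop was cut off at max_rounds.
    """
    rule = re.compile(r"^(.*)_(.*)")
    matched = False  # stands in for the hypothetical rewrite_ctx member
    for _ in range(max_rounds):
        # Append PATH_INFO only while no substitution has been made.
        uri = filename if matched else filename + path_info
        match = rule.match(uri)
        if match is None:
            return filename, True
        filename = match.expand(r"\1-\2")
        matched = True
    return filename, False


print(rewrite_with_matched_flag("/_f_o_o_", "/bar"))
# ('/-f-o-o-/bar', True)
print(rewrite_with_matched_flag("/foo", "/_bar"))
# ('/foo/-bar', True)
```

In this model PATH_INFO is folded into the URI exactly once, so both problem requests terminate without duplicated suffixes. Whether r->path_info itself should also be cleared is a separate question that the sketch does not address.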

I have a feeling that this is a bug which went unnoticed because people
simply blamed it on the quirks of mod_rewrite.

-Aleks
