In the manual, http_build_query's documented behavior is to generate a
"URL-encoded query string."  The documentation does not specify what
is used as a separator between the array elements, but testing and/or
source perusal easily show that it is arg_separator.output.  This
makes sense when generating links to your own server, although it
would be nice to be able to specify the separator to use via an
optional third argument (one could emulate this with ini_set, but a
third argument would be preferable).
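To make the third-argument idea concrete, here is a minimal Python analogue. It is an illustration only, not PHP's implementation. The function name `build_query` and its `separator` parameter are hypothetical, and `urllib.parse.quote_plus` stands in for PHP's URL encoding. The separator is inserted verbatim, which is what a query string bound for an HTTP header needs.

```python
from urllib.parse import quote_plus


def build_query(params, separator="&"):
    """Hypothetical analogue of http_build_query with an explicit separator.

    `separator` plays the role of the proposed optional third argument.
    It is joined in verbatim and never HTML-escaped.
    """
    pairs = (quote_plus(str(k)) + "=" + quote_plus(str(v)) for k, v in params.items())
    return separator.join(pairs)


print(build_query({"bar": 1, "baz": 2}))       # bar=1&baz=2
print(build_query({"bar": 1, "baz": 2}, ";"))  # bar=1;baz=2
```

With an explicit argument, a caller targeting an off-site server that expects ";" would not need to touch any ini setting.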

The only other use that I am aware of for arg_separator.output at this
time is for the session URL rewriting engine.  However, these two uses
have quite different semantics: whereas the separators the rewriter
uses must be HTML-escaped in the output (in order to emit valid code),
those used by http_build_query must not be.

That is, the URI http://example.com/foo?bar=1&baz=2 is transmitted
without any further escaping in an HTTP header.  When represented in
an HTML document, however, it must (not considering the CDATA contents
of <script> and <style>) be escaped and may look something like
<a href="http://example.com/foo?bar=1&amp;baz=2">.
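The two representations of the same URI can be shown with a short Python sketch using the standard `html.escape` function. The variable names are illustrative only.

```python
import html

uri = "http://example.com/foo?bar=1&baz=2"

# In an HTTP header (e.g. Location:) the URI is sent exactly as-is.
location_header = "Location: " + uri

# In an HTML attribute the same URI must be HTML-escaped: "&" becomes "&amp;".
anchor = '<a href="' + html.escape(uri) + '">'

print(location_header)  # Location: http://example.com/foo?bar=1&baz=2
print(anchor)           # <a href="http://example.com/foo?bar=1&amp;baz=2">
```

The escaping belongs to the output context (HTML), not to the URI itself.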

The default setting for arg_separator.output is "&", but anyone who
wants to use URL rewriting and emit valid HTML code is currently
forced to use "&amp;".  While a bit inelegant (it would be preferable
for PHP to generate valid code by default), this workaround didn't
cause any real problems until PHP 5 and the introduction of
http_build_query.  With "&amp;" as arg_separator.output,
http_build_query will create query strings that are not suitable for
inclusion in an HTTP header.
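As a rough illustration of why such a query string breaks, here is a naive receiver written in Python that splits on "&" (like the default arg_separator.input) and then on the first "=". The function `split_query` is hypothetical and skips URL-decoding for brevity; real parsers differ in details.

```python
def split_query(query, separator="&"):
    """Hypothetical naive parser: split on the input separator, then on the first '='."""
    result = {}
    for pair in query.split(separator):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


# Built with "&amp;" as the output separator, as happens when an
# HTML-escaped arg_separator.output is reused for a header-bound URL.
broken = "bar=1&amp;baz=2"
print(split_query(broken))         # {'bar': '1', 'amp;baz': '2'}

# Built with a plain "&", the parameters come through intact.
print(split_query("bar=1&baz=2"))  # {'bar': '1', 'baz': '2'}
```

The receiver sees a parameter named "amp;baz" instead of "baz".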

The root of the problem is that session URL rewriting interprets
arg_separator.output as already HTML-escaped, while http_build_query
interprets it as unescaped.  Notice that arg_separator.input is clearly
not HTML-escaped, so the most consistent view would be for
arg_separator.output to be unescaped as well.  That way, any use of
the setting that requires escaping can apply it itself.  This model is
preferable to that of magic_quotes_gpc, under which input data is
escaped for one particular use, and using it for anything else means
first unescaping it and then re-escaping it as needed.

So there are two basic issues:

1. The combination of the default setting of arg_separator.output and
the way the session URL rewriter works causes PHP to emit invalid
code.

2. The way the session URL rewriter interprets arg_separator.output is
different from the way in which http_build_query interprets it.

To address these, I see three possible solutions:

1. Define the semantics of arg_separator.output to be an unescaped
string.  Have the session URL rewriter HTML-escape the value of
arg_separator.output and use that result when it adds the SID to
links.

This approach makes the default setting of PHP generate valid code and
standardizes the interpretation of arg_separator.output.  However,
there is a minor BC issue for those who installed an older version of
PHP, changed arg_separator.output to "&amp;", and then upgrade while
keeping the same configuration file.  This only affects people who
proactively modified their configuration to achieve HTML validity,
and they are the group most easily able to adapt to the change.

2. Define another configuration directive (yuck) for http_build_query to use.

This would cause a BC issue for anyone who uses ini_set on
arg_separator.output to control the separator http_build_query uses,
for example to generate off-site links to a server whose input
separator differs from the output separator currently in use.  Also,
the default configuration of session URL rewriting would still emit
invalid code.

3. Define the semantics of arg_separator.output to be an escaped
string and revise http_build_query to unescape before applying.  The
default would need to be changed to "&amp;".

This carries a possible BC issue on upgrade: anyone who had the
default arg_separator.output ("&") in their php.ini would no longer
have a valid setting.  Also, this approach would create cognitive
dissonance between arg_separator.input and arg_separator.output.

A variation on this solution doesn't revise http_build_query but
instead documents it as generating HTML-escaped strings.  However,
this would be a BC break for those using it to create Location headers
or the like, as well as those who conscientiously apply htmlentities
to its output before sticking it in an HTML document.
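Solution 1 can be sketched in Python as follows. This is an illustration of the proposed semantics under stated assumptions, not PHP's actual rewriter. `ARG_SEPARATOR_OUTPUT`, `build_query` and `append_sid_to_href` are hypothetical stand-ins for the ini value, http_build_query and the session URL rewriter. PHPSESSID (PHP's default session name) and the SID value are used only as sample data.

```python
import html
from urllib.parse import quote_plus

# Hypothetical stand-in for the ini value; under solution 1 it holds the
# raw, unescaped separator.
ARG_SEPARATOR_OUTPUT = "&"


def build_query(params, separator=ARG_SEPARATOR_OUTPUT):
    """http_build_query-like: joins with the raw separator (safe for headers)."""
    return separator.join(
        quote_plus(str(k)) + "=" + quote_plus(str(v)) for k, v in params.items()
    )


def append_sid_to_href(escaped_href, sid_name, sid, separator=ARG_SEPARATOR_OUTPUT):
    """Rewriter-like: escaped_href is attribute text that is already
    HTML-escaped, so the separator is HTML-escaped before being appended."""
    joiner = html.escape(separator) if "?" in escaped_href else "?"
    return escaped_href + joiner + quote_plus(sid_name) + "=" + quote_plus(sid)


query = build_query({"bar": 1, "baz": 2})
header = "Location: http://example.com/foo?" + query
href = append_sid_to_href(html.escape("http://example.com/foo?" + query), "PHPSESSID", "abc123")

print(header)  # Location: http://example.com/foo?bar=1&baz=2
print(href)    # http://example.com/foo?bar=1&amp;baz=2&amp;PHPSESSID=abc123
```

One unescaped setting serves both uses. Only the consumer that writes into HTML does the escaping, which is the same model arg_separator.input already follows.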

My preferred solution is 1 (revise the URL rewriting engine), but it's
quite possible that I've neglected to consider some other way to fix
it or some argument for or against those I've presented.

Bug report reference: http://bugs.php.net/bug.php?id=30049
Additional information: http://www.sitepoint.com/blog-post-view.php?id=245295

 - Sharif

-- 
PHP Internals - PHP Runtime Development Mailing List