The manual documents http_build_query as generating a "URL-encoded query string." It does not say what separator is placed between the array elements, but testing or a look at the source quickly shows that it is arg_separator.output. That makes sense when generating links to your own server. Still, it would be nice to be able to specify the separator through an optional third argument. One could emulate this with ini_set, but a third argument would be preferable.
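As a rough illustration of the two options above, here is a Python model (not PHP itself) of a query builder. It falls back to a global arg_separator.output setting unless a separator is passed explicitly. The `ini` dict, `build_query` and `build_query_with_ini_set` are hypothetical stand-ins for php.ini, http_build_query and the ini_set workaround. Only flat arrays of strings are handled.

```python
from urllib.parse import quote_plus

# Hypothetical stand-in for php.ini; only this one directive matters here.
ini = {"arg_separator.output": "&"}


def build_query(data, separator=None):
    """Rough model of http_build_query for a flat dict of strings.

    With no explicit separator it uses the global arg_separator.output,
    which is the behavior the manual leaves unstated.
    """
    sep = ini["arg_separator.output"] if separator is None else separator
    return sep.join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in data.items())


def build_query_with_ini_set(data, separator):
    """Emulate a per-call separator by temporarily changing the global setting."""
    old = ini["arg_separator.output"]
    ini["arg_separator.output"] = separator
    try:
        return build_query(data)
    finally:
        # Restore the global value so later callers are not affected.
        ini["arg_separator.output"] = old
```

Both `build_query({"bar": "1", "baz": "2"}, ";")` and `build_query_with_ini_set({"bar": "1", "baz": "2"}, ";")` produce `bar=1;baz=2`. The explicit argument keeps the choice local to the call. The ini_set route must remember to restore the global value.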
The only other use of arg_separator.output that I am aware of is the session URL rewriting engine. However, the two uses have quite different semantics. The separator the rewriter inserts must be HTML-escaped in the output (in order to emit valid code), whereas the one used by http_build_query must not be. That is, the URI http://example.com/foo?bar=1&baz=2 is transmitted without any further escaping in an HTTP header. When represented in an HTML document, however, it must be escaped (the CDATA contents of <script> and <style> aside), and may look something like <a href="http://example.com/foo?bar=1&amp;baz=2">.

The default setting for arg_separator.output is "&", but anyone who wants to use URL rewriting and emit valid HTML is currently forced to use "&amp;". This workaround is a bit inelegant; it would be preferable for PHP to generate valid code by default. Still, it didn't cause any real problems until PHP 5 introduced http_build_query. With "&amp;" as arg_separator.output, http_build_query creates query strings that are not suitable for inclusion in an HTTP header.

The root of the problem is that session URL rewriting treats arg_separator.output as already HTML-escaped, while http_build_query treats it as unescaped. Note that arg_separator.input is clearly not HTML-escaped, so the most consistent view would be for arg_separator.output to be unescaped as well. That way, any use of the setting that requires escaping can apply it. This model is preferable to that of magic_quotes_gpc, where the data is escaped for one particular use. Any other use then requires unescaping it and re-escaping it as needed.

So there are two basic issues:

1. The default setting of arg_separator.output, combined with the way the session URL rewriter works, causes PHP to emit invalid code.
2. The session URL rewriter interprets arg_separator.output differently from the way http_build_query interprets it.
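The difference between the two interpretations can be modeled in a few lines of Python (again a sketch, not PHP code). `build_query` joins pairs with a raw separator. `parse_query` is a hypothetical minimal parser that splits on "&", the default arg_separator.input. With the "&amp;" workaround, a Location header carries the literal text "&amp;", and the receiving side sees a parameter named "amp;baz". Keeping the separator unescaped and escaping only when writing HTML gives a correct result in both places.

```python
import html
from urllib.parse import quote_plus


def build_query(data, sep):
    # Join key=value pairs with a raw (unescaped) separator.
    return sep.join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in data.items())


def parse_query(qs):
    # Hypothetical minimal parser: split on "&" (default arg_separator.input).
    result = {}
    for part in qs.split("&"):
        if part:
            key, _, value = part.partition("=")
            result[key] = value
    return result


data = {"bar": "1", "baz": "2"}

# Workaround setting: separator stored pre-escaped for HTML.
escaped = build_query(data, "&amp;")
# Sent literally in an HTTP header, so "baz" arrives as "amp;baz".
print(parse_query(escaped))  # {'bar': '1', 'amp;baz': '2'}

# Unescaped setting: correct as-is in the header ...
raw = build_query(data, "&")
print("Location: http://example.com/foo?" + raw)
# ... and HTML-escaped only at the point where it enters markup.
print('<a href="http://example.com/foo?' + html.escape(raw) + '">')
```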
To address these, I see three possible solutions:

1. Define arg_separator.output as an unescaped string. Have the session URL rewriter HTML-escape the value of arg_separator.output and use the result when it adds the SID to links. This makes PHP's default configuration generate valid code and standardizes the interpretation of arg_separator.output. There is a minor BC issue for those who installed an older version of PHP, changed arg_separator.output to "&amp;", and then upgrade with the same configuration file. However, this applies only to those who proactively modified their configuration to achieve HTML validity, and they are the group most easily able to adapt to the change.

2. Define another configuration directive (yuck) for http_build_query to use. This would be a BC issue for anyone who uses ini_set on arg_separator.output to control the separator http_build_query emits. One example is generating off-site links to a server whose input separator differs from the output separator currently in use. Also, the default configuration of session URL rewriting would still emit invalid code.

3. Define arg_separator.output as an escaped string and revise http_build_query to unescape it before use. The default would need to change to "&amp;". There is a possible BC issue on upgrade for anyone whose php.ini contains the old default arg_separator.output ("&"), since they would no longer have a valid setting. This approach would also create cognitive dissonance between arg_separator.input and arg_separator.output. A variation on this solution leaves http_build_query unchanged but documents it as generating HTML-escaped strings. However, that would be a BC break for those using it to build Location headers and the like. It would also break things for those who conscientiously apply htmlentities to its output before putting it in an HTML document.
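Solution 1 might look roughly like the following Python sketch. `add_sid_to_href` is a hypothetical function and not PHP's actual rewriter. It assumes the href is taken exactly as it appears in the HTML source, so any existing separators are already written as "&amp;". The configured arg_separator.output is treated as unescaped and HTML-escaped only where the rewriter writes it into markup.

```python
import html


def add_sid_to_href(href_html, sid_name, sid_value, arg_separator_output="&"):
    """Hypothetical sketch of solution 1 for the session URL rewriter.

    href_html is the attribute value as written in the HTML source.
    arg_separator_output is an unescaped string and is escaped here.
    """
    # A "#fragment" must stay at the end, so split it off first.
    base, hash_mark, fragment = href_html.partition("#")
    pair = html.escape(f"{sid_name}={sid_value}")
    if "?" in base:
        # Existing query: append using the HTML-escaped separator.
        base += html.escape(arg_separator_output) + pair
    else:
        # No query yet: start one.
        base += "?" + pair
    return base + hash_mark + fragment
```

With the default "&", `add_sid_to_href("foo.php?bar=1&amp;baz=2", "PHPSESSID", "abc123")` returns `foo.php?bar=1&amp;baz=2&amp;PHPSESSID=abc123`, which is valid HTML. http_build_query can keep using the same unescaped "&" directly.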
My preferred solution is 1 (revise the URL rewriting engine). It's quite possible, though, that I've overlooked some other way to fix this, or some argument for or against the options I've presented.

Bug report reference: http://bugs.php.net/bug.php?id=30049
Additional information: http://www.sitepoint.com/blog-post-view.php?id=245295

- Sharif