On Fri, Sep 24, 2010 at 4:21 PM, Bas Verhoeven <libev...@bserved.nl> wrote: > Hello, > > While using 'evhttp_encode_uri()' to encode GET & POST fields (I hope this > is the right function to use) I noticed that a lot of characters are not > being escaped in the way I would expect them to be escaped.
Hi, all! I'm trying to get a start on fixing http stuff and merging http fixes. First, though, I need to get up to speed on the evhttp stuff myself. This is a nice easy bug report, so I'll start out here. I'm going to say lots of confident-sounding stuff below, but please take what I'm saying with a grain of salt and correct me when I go wrong. The first thing to figure out is, "what is evhttp_encode_uri *supposed* to do?" Unfortunately, the documentation isn't too helpful. All it says is, /** Helper function to encode a URI. The returned string must be freed by the caller. @param uri an unencoded URI @return a newly allocated URI-encoded string */ Well, that makes no sense. "URI-encoding" is a pretty well-defined operation for individual parts of a URL, but it's not an operation you can apply to an entire "unencoded URI" [*]. Unfortunately, the documentation here implies that it takes a URI, which is nonsensical. So, the documentation is contradictory: does it encode URIs, or does it URI-encode strings? Well, its behavior is *almost* right for URI-encoding strings, and there is no such thing as "encoding a URI". I *THINK* it should say something more like this: /** Helper function to encode a string for use in a URI. All characters that are "unreserved" according to RFC3986 -- that is, ascii A-Z, a-z, 0-9, hyphen, dot, underscore, and tilde -- are left unchanged. All other characters are replaced by their hex-escaped equivalents. The returned string must be freed by the caller. @param str a string to encode @return a newly allocated URI-encoded string */ That's a much more sensible function for us to have. Will it break anything? Well, if you're building your URL by saying something dumb like asprintf(&url, "http://example.com/?q1=%s&q2=%s2", v1, v2); encoded = evhttp_encode_uri(url); then you're already in trouble, since evhttp_encode_uri() treats ? as special. And if you're building your query by saying something dumb like asprintf(&query, "?1=%s&q2=%s2", v1, v2); encoded_query = evhttp_encode_uri(query); then you're in trouble because evhttp_encode_uri() treats & as special. But what if somebody is saying something iffy like asprintf(&query1, "q1=%s", v1); asprintf(&query2, "q2=%s", v2); encoded1 = evhttp_encode_uri(query1); encoded2 = evhttp_encode_uri(query2); asprintf(&url, "http://example.com?%s&%s", encoded1, encoded2); ? If they were relying on the previous broken behavior of evhttp_encode_uri(), changing it to do the right thing will break them. Of course, their code is already broken if they were relying on evhttp_encode_uri() actually encoding + characters reliably, so they're not in good shape either way. I've looked through the first few pages of google codesearch results for evhttp_encode_uri, and not found anything that suggests someone is doing this broken-but-almost-working thing. So, time to go ahead and make this change? The affected characters are "!$'()*+,/:=@" [*] For example, if somebody gives you a string that looks like http://example.com/?a=b&c=d, and they tell you that it is a URI that needs encoding, then it's already too late to encode, since you can't tell whether they meant to have the query parameters be "a" -> "b" and "c" -> "d", or just "a" -> "b&c=d". Thus, you can't "URI-encode" a URI[**] [**] Unless you're URI-encoding the URI in order to to use it as a query parameter or something. yrs, -- Nick *********************************************************************** To unsubscribe, send an e-mail to majord...@freehaven.net with unsubscribe libevent-users in the body.