Hi José,

Thanks for the quick reply! Generally speaking, I certainly see what you mean about "a bit more complicated than expected".
A couple of sentences are a bit confusing to me and make me wonder whether we've understood each other correctly. I'll explain in the context of your reply:

On Friday, January 29, 2021 at 12:42:09 PM UTC+1 José Valim wrote:

> Hi Floris, thanks for the proposal!
>
> Unfortunately this is a bit more complicated than expected.
>
> First of all, the current implementation is not in violation of RFC3986.
> %20 is a valid escaping of spaces in query strings. As far as I know,
> RFC3986 does not explicitly mention that + is equivalent to a space.
> Furthermore, earlier specifications, such as RFC2396
> <https://www.ietf.org/rfc/rfc2396.txt>, did not allow + in query strings
> at all.

If I read this correctly, then, given what you write, the current `URI.encode_query/1` implementation _is_ in violation of RFC3986. Example:

    iex(1)> URI.encode_query(%{"key" => "two words"})
    "key=two+words"
    iex(2)> URI.encode("key=two words")
    "key=two%20words"

As you can see, `encode_query` converts " " to "+", while `encode` converts " " to "%20". This violation of RFC3986 is the main reason for my proposal, as I'd like Elixir to at least be able to comply with this spec (albeit as opt-in, for backward compatibility).

> Only later on W3C specified that + is reserved to mean spaces
> <https://www.w3.org/Addressing/URL/uri-spec.html> to be compatible with
> the general usage of URLs - which browsers eventually standardized on.
> However, at this point, the damage was done. For example, for mailto links,
> your mail client may not rewrite + to spaces, while it will certainly
> handle percent encoded spaces.

Yes, thanks! This is yet another reason to at least _allow_ `URI.encode_query` to encode spaces as "%20".

Side note: on further research, I've found that there is a difference between a "normal" URL and a URL generated by a GET form submission.
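To make that difference concrete in Elixir terms: `URI.encode_query/1` produces the form-style "+" for a space, while RFC3986 percent-encoding produces "%20". Below is a minimal sketch of two ways to get "%20" for a whole query string with the current standard library. The module `QuerySketch` and both function names are hypothetical, made up for illustration and not part of Elixir's `URI` module; the sketch relies only on the existing `URI.encode/2`, `URI.char_unreserved?/1`, `URI.encode_query/1` and `String.replace/3`.

```elixir
defmodule QuerySketch do
  # Hypothetical helper, not part of Elixir's URI module: builds a query
  # string from a map or keyword list, percent-encoding keys and values as
  # per RFC3986, so a space becomes "%20" instead of "+".
  def encode_query_rfc3986(params) do
    Enum.map_join(params, "&", fn {key, value} ->
      encode_component(to_string(key)) <> "=" <> encode_component(to_string(value))
    end)
  end

  # Hypothetical alternative: post-process the output of URI.encode_query/1.
  # This relies on encode_query/1 escaping a literal "+" as "%2B", so every
  # "+" remaining in its output stands for an encoded space.
  def encode_query_replace(params) do
    params
    |> URI.encode_query()
    |> String.replace("+", "%20")
  end

  # Percent-encodes every byte that is not an RFC3986 "unreserved"
  # character (ALPHA / DIGIT / "-" / "." / "_" / "~").
  defp encode_component(string) do
    URI.encode(string, &URI.char_unreserved?/1)
  end
end
```

For `%{"key" => "two words"}`, both functions return "key=two%20words", whereas `URI.encode_query/1` returns "key=two+words". The first variant has the advantage of not depending on how `encode_query/1` happens to escape a literal "+".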
Although not authoritative, this section explains it very well and links to some of the original specifications:
https://en.wikipedia.org/wiki/Query_string#URL_encoding

The bottom line is that browsers are instructed to use "+" to encode spaces when converting GET form fields into a URL. In practice, this leads to the rather wonky situation that the generated URL (containing those encoded form field values as query string parameters!) will by definition _not_ comply with RFC3986, which contains the specification for... URL encoding.

> In other words, if you want to guarantee the space will be treated as
> space, %20 is the best choice. So I would say String.replace/3 is the way
> to go unless there is a quote in RFC3986 which suggests or advocates for
> using + as the escaping of spaces in query strings.

Yes! Again, given what you say in the first sentence here, that was the main reasoning behind proposing RFC3986-compliant `encode_query` behaviour.

> Finally, it is important to remember that escaping of URI segments for
> paths and query strings use distinct algorithms
> <http://stephane.epardaud.fr/articles/2009-02-03-what-every-web-developer-must-know-about-url-encoding.html#Thereservedcharactersarenotwhatyouthinktheyare>
>
> (which is why the function is called encode_query).

Good to know!

Thanks again for the quick and detailed reply. I hope this message brings us a bit closer to clarity, instead of inadvertently adding even more confusion to the already quite confusing matter of URI encoding :)

– Floris

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/9bd48b8f-93ad-40b5-aca4-804b63f77965n%40googlegroups.com.
