Hi José,

Thanks for the quick reply! Generally speaking, I certainly see what you 
mean by "a bit more complicated than expected".

A couple of sentences are a bit confusing to me, and make me wonder whether 
we've understood each other correctly. I'll explain in the context of your 
reply:

On Friday, January 29, 2021 at 12:42:09 PM UTC+1 José Valim wrote:

> Hi Floris, thanks for the proposal!
>
> Unfortunately this is a bit more complicated than expected.
>
> First of all, the current implementation is not in violation of RFC3986. 
> %20 is a valid escaping of spaces in query strings. As far as I know, 
> RFC3986 does not explicitly mention that + is equivalent to a space. 
> Furthermore, earlier specifications, such as RFC2396 
> <https://www.ietf.org/rfc/rfc2396.txt>, did not allow + in query strings 
> at all.
>

If I read this correctly, then given what you write, the current 
`URI.encode_query/1` implementation _is_ in violation of RFC3986. Example:

iex(1)> URI.encode_query(%{"key" => "two words"}) 
"key=two+words"
iex(2)> URI.encode("key=two words")
"key=two%20words"
 
As you can see, `encode_query` converts " " to "+", while `encode` converts 
" " to "%20".
This violation of RFC3986 is the main reason for my proposal, as I'd like 
Elixir to be able to at least comply with this spec (albeit as opt-in for 
backward compatibility).


> Only later on W3C specified that + is reserved to mean spaces 
> <https://www.w3.org/Addressing/URL/uri-spec.html> to be compatible with 
> the general usage of URLs - which browsers eventually standardized on. 
> However, at this point, the damage was done. For example, for mailto links, 
> your mail client may not rewrite + to spaces, while it will certainly 
> handle percent encoded spaces.
>

Yes, thanks! This is yet another reason to at least _allow_ 
`URI.encode_query` to encode spaces to "%20".

Side note: on further research I've now found there to be a difference 
between a "normal" URL and a URL generated by a GET form submit. Although 
not authoritative, this section explains it very well and links to some of 
the original specifications:

https://en.wikipedia.org/wiki/Query_string#URL_encoding

Bottom line is that browsers are instructed to use "+" to encode spaces when 
converting GET form fields into a URL. In practice, this leads to the quite 
wonky situation that the generated URL (containing those encoded form field 
values as query string parameters!) will by definition _not_ comply with 
RFC3986, which contains the specification for... URL encoding.
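
To illustrate the difference on the decoding side, here is a small sketch 
(the module and function names are made up for this example; only 
`URI.decode_www_form/1` and `URI.decode/1` are existing functions):

```elixir
defmodule PlusDecodingDemo do
  # Hypothetical helper, for illustration only: decodes the same string
  # with the form-style decoder and with the plain percent-decoder.
  def compare(encoded) do
    %{
      # Form-style decoding treats "+" as an encoded space.
      form: URI.decode_www_form(encoded),
      # Plain percent-decoding leaves "+" untouched.
      plain: URI.decode(encoded)
    }
  end
end

PlusDecodingDemo.compare("two+words")
# => %{form: "two words", plain: "two+words"}

PlusDecodingDemo.compare("two%20words")
# => %{form: "two words", plain: "two words"}
```

So "%20" is read as a space by both decoders, while "+" only is when the 
receiver assumes form encoding.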


> In other words, if you want to guarantee the space will be treated as 
> space, %20 is the best choice. So I would say String.replace/3 is the way 
> to go unless there is a quote in RFC3986 which suggests or advocates for 
> using + as the escaping of spaces in query strings.
>

Yes! Again, given what you say in the first sentence here, that was the 
main reason for proposing RFC3986-compliant `encode_query` behaviour.
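
For reference, a minimal sketch of the `String.replace/3` workaround as I 
understand it (the module and function names are hypothetical, not an 
existing API). It assumes that every "+" in the output of 
`URI.encode_query/1` stands for a space, which should hold because a literal 
"+" in a key or value is itself escaped as "%2B":

```elixir
defmodule QueryEncoding do
  # Hypothetical helper, for illustration only.
  # Encodes the params as usual, then rewrites "+" (encoded space) to "%20".
  def encode_query_percent20(params) do
    params
    |> URI.encode_query()
    |> String.replace("+", "%20")
  end
end

QueryEncoding.encode_query_percent20(%{"key" => "two words"})
# => "key=two%20words"

QueryEncoding.encode_query_percent20(%{"key" => "a+b c"})
# => "key=a%2Bb%20c"
```

This works, but it is exactly the kind of post-processing step I was hoping 
an opt-in option could make unnecessary.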


> Finally, it is important to remember that escaping of URI segments for 
> paths and query strings use distinct algorithms 
> <http://stephane.epardaud.fr/articles/2009-02-03-what-every-web-developer-must-know-about-url-encoding.html#Thereservedcharactersarenotwhatyouthinktheyare>
>  
> (which is why the function is called encode_query).
>

Good to know! 

Thanks again for the quick and detailed reply. I hope this message brings 
us a bit closer to clarity, instead of inadvertently adding even more 
confusion to the already quite confusing matter of URI encoding :)

– Floris
