Ok, sorry about the initial confusion. I thought we emitted %20 and you
were proposing to emit +. So I was basically arguing in favor of your
change except I didn't know it. :)

Here is my hopefully correct reply: encode_www_form (which is what
encode_query uses) is not specified by RFC3986. So at the very least, we
need to clarify that these functions are not part of RFC3986.
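
For illustration, here is a minimal sketch of the behavior under
discussion. It assumes the current default, where encode_query applies the
same encoding as encode_www_form to each key and value:

```elixir
# encode_www_form follows the application/x-www-form-urlencoded
# convention, so a space becomes "+".
URI.encode_www_form("a b")
#=> "a+b"

# encode_query applies the same encoding to every key and value.
URI.encode_query(%{"q" => "a b"})
#=> "q=a+b"

# URI.encode percent-encodes the space instead.
URI.encode("a b")
#=> "a%20b"
```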

RFC3986 does allow + in query strings but it does not say anything about
encoding or decoding it. The interpretation of the query string is
ultimately up to the underlying application. Even the common key=value
mechanism is hinted at but not mandated. For example, someone could use a
query string where & and = have different meanings and that's fine.

So while encode_query may not violate RFC3986, it doesn't follow it
either. Encoding spaces as %20 would be the better choice. To make matters
worse, we cannot simply replace URI.encode_www_form with URI.encode
because URI.encode does not escape #, which MUST be escaped in query
strings.
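
To make the # problem concrete, here is a small sketch. The second call
passes URI.char_unreserved?/1 as the predicate to URI.encode/2, which is
just one possible workaround and not a proposal for the final API:

```elixir
# By default, URI.encode leaves reserved characters such as "#" and "&"
# untouched, so it is not safe for encoding a single query value.
URI.encode("a b#c&d")
#=> "a%20b#c&d"

# A stricter predicate escapes everything except unreserved characters.
URI.encode("a b#c&d", &URI.char_unreserved?/1)
#=> "a%20b%23c%26d"
```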

I think the best option at the moment is to deprecate encode_query and
introduce encode_www_query. We should also introduce a new function that
encodes query strings according to the RFC's interpretation, but I am
not sure what to call it. Suggestions?
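
As a starting point for discussion, a rough sketch of what such a function
could look like. The module and function names below are hypothetical
placeholders, and it assumes keys and values are joined with = and & and
that everything except unreserved characters is percent-encoded:

```elixir
defmodule RFC3986QuerySketch do
  # Hypothetical names, for discussion only; not an existing API.
  def encode(enumerable) do
    Enum.map_join(enumerable, "&", fn {key, value} ->
      encode_part(key) <> "=" <> encode_part(value)
    end)
  end

  # Percent-encode everything except unreserved characters,
  # so a space becomes "%20" and "#" becomes "%23".
  defp encode_part(term) do
    term |> to_string() |> URI.encode(&URI.char_unreserved?/1)
  end
end

RFC3986QuerySketch.encode([{"q", "a b"}, {"tag", "c#"}])
#=> "q=a%20b&tag=c%23"
```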


On Fri, Jan 29, 2021 at 4:27 PM José Valim <[email protected]> wrote:

> Gah, I am so sorry. I have been working on the wrong assumption that
> URI.encode_query was escaping space to %20 but it is encoding it to +,
> which was your point all along. Yes, escaping it to + is not in
> accordance with RFC3986.
>
> I will re-read your original e-mail and address it accordingly now. Once
> again, apologies.
>
>
> On Fri, Jan 29, 2021 at 4:13 PM José Valim <[email protected]> wrote:
>
>> > If I read this correctly, then given what you write, the current
>> > `URI.encode_query/1` implementation _is_ in violation of RFC3986. Example:
>>
>> You can’t compare the result of URI.encode with URI.encode_query because
>> they are meant to escape different parts of a URI and different parts use
>> different rules. They are both in accordance with the RFC though.
>>
>> The assumption that all of a URL needs to be escaped with URI.encode is
>> incorrect. If we encode a query parameter with URI.encode, we will get
>> the wrong result.
>>
>> Plus, the Wikipedia article explicitly mentions that escaping space as +
>> is a difference from RFC3986, which confirms my assumption that RFC3986 does
>> not say that whitespace *in query params* should be encoded as +:
>>
>> > The encoding of SPACE as '+' and the selection of "as-is" characters
>> > distinguishes this encoding from RFC 3986
>> > <https://tools.ietf.org/html/rfc3986>.
>>
>
