Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.

Developers list for Guile, the GNU extensibility library Fri, 03 Nov 2023 11:12:48 -0700


Hi Vivien,


> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:

I wasn't able to find anything that MANDATED any normalization at all, either 
before or after Relative Resolution. It is possible that treating %2E as a 
literal dot in resolve-relative-reference could count as unwanted 
normalization. But it's a safe operation in terms of URI equivalence* and I 
think users would be less confused to have %2E%2E disappear than to have it 
remain.

Also, what if the resolve-relative-reference procedure didn't treat %2E as a
dot? There isn't a uri-normalize procedure users can call afterwards to fix
that, and there isn't a version of uri-decode that allows selectively decoding
JUST the dot characters. Users would have to write a lot of code themselves to
get proper relative resolution, so we should do it for them.
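
For illustration only, here is a minimal sketch of the kind of dot-only
decoder that (web uri) does not provide. The name decode-escaped-dots is
hypothetical; it is not part of Guile or of the patch. It relies only on
regexp-substitute/global from (ice-9 regex).

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper, not part of (web uri) or of the patch.
;; Replace every "%2E" or "%2e" in STR with a literal dot and leave
;; every other percent-escape (for example "%21") untouched.
(define (decode-escaped-dots str)
  (regexp-substitute/global #f "%2[Ee]" str 'pre "." 'post))

(decode-escaped-dots "/a/%2E%2e/b%21c")
;; => "/a/../b%21c"
```

Even with such a helper, users would still have to run dot-segment removal
themselves afterwards.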


- Nathan

*References (RFC 3986) for the claim that treating %2E as a literal dot is always okay:
- Section 2.3: percent-encoded unreserved characters are always equivalent to 
decoded ones.
- Section 2.4: unreserved characters can be percent-decoded at any time.
- Section 6.2.2.3: dot-segments should be removed during normalization even if 
found outside of a relative-reference.
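
To make the approach concrete, here is a rough sketch of dot-segment removal
that recognizes both literal and percent-encoded dots without unescaping
anything else. It is not the code from the patch: the names single-dot?,
double-dot? and remove-dot-segments* are hypothetical, and the sketch assumes
the path is absolute (starts with "/"), as a merged path normally is.

```scheme
;; Hypothetical sketch, not the implementation from the patch.
;; Segments are compared case-insensitively so that %2E and %2e both count.
(define (single-dot? seg)
  (and (member (string-downcase seg) '("." "%2e")) #t))

(define (double-dot? seg)
  (and (member (string-downcase seg) '(".." ".%2e" "%2e." "%2e%2e")) #t))

;; Assumes PATH starts with "/".  No segment is decoded or re-encoded,
;; so escaped and unescaped sub-delims stay exactly as written.
(define (remove-dot-segments* path)
  (let loop ((segs (string-split (substring path 1) #\/))
             (out '()))                 ; kept segments, most recent first
    (if (null? segs)
        (string-append "/" (string-join (reverse out) "/"))
        (let ((seg (car segs))
              (last? (null? (cdr segs))))
          (cond
           ;; "." is dropped; a trailing one leaves a trailing slash.
           ((single-dot? seg)
            (loop (cdr segs) (if last? (cons "" out) out)))
           ;; ".." drops the previous segment, if there is one.
           ((double-dot? seg)
            (let ((popped (if (null? out) out (cdr out))))
              (loop (cdr segs) (if last? (cons "" popped) popped))))
           (else
            (loop (cdr segs) (cons seg out))))))))

(remove-dot-segments* "/a/%2E%2e/b")      ; => "/b"
(remove-dot-segments* "/a!a!%21!/./x")    ; => "/a!a!%21!/x"
```

Working on raw segments is what keeps "!" and "%21" distinct in the second
call.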

Vivien Kraus <viv...@planete-kraus.eu> writes:

> Hello Nathan!
>
> Le jeudi 02 novembre 2023 à 16:00 -0400, Nathan a écrit :
>> There is a problem and I fixed it by rewriting a bunch of code myself
>> because I need similar code.
>
> Thank you!
>
>> remove-dot-segments:
>> You cannot split-and-decode-uri-path and then
>> encode-and-join-uri-path.
>> Those are terrible functions that don't work on all URIs.
>> URI schemes are allowed to specify that certain reserved characters
>> (sub-delims) are special.
>> In that case, a sub-delim that IS escaped is different from a
>> sub-delim that IS NOT escaped.
>> 
>> Example input to your remove-dot-segments:
>> (resolve-relative-reference
>>  (string->uri-reference "/")
>>  (string->uri-reference "excitement://a.com/a!a!%21!"))
>> Your wrong output:
>> excitement://a.com/a%21a%21%21%21
>
> I see.
>
>> 
>> One solution would be to only percent-decode dots. Because dot is
>> unreserved, that solution doesn't have any URI equivalence issues.
>> But I still think decoding dots automatically is a bad, unexpected
>> side-effect to have.
>> I rewrote this function so that it:
>> - works on both escaped and unescaped dots
>> - doesn't unescape any unnecessary characters
>
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:
>
> 2.3: Unreserved Characters:
>    For consistency, percent-encoded octets in the ranges of ALPHA
>    (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
>    underscore (%5F), or tilde (%7E) should not be created by URI
>    producers and, when found in a URI, should be decoded to their
>    corresponding unreserved characters by URI normalizers.
>
> 5.2.1: Pre-parse the Base URI:
>    Normalization of the base URI, as described in Sections 6.2.2 and
>    6.2.3, is optional.  A URI reference must be transformed to its
>    target URI before it can be normalized.
>
> Did you find something more precise than that?  In any case, decoding
> the dots is probably the least unsafe thing to do.
>
>> 
>> The test suite no longer needs to check for incorrect output either:
>> > ;; The test suite checks for ';' characters, but Guile escapes
>> > ;; them in URIs. Same for '='.
>> 
>> ----
>> 
>> resolve-relative-reference:
>> I rewrote this procedure so it is shorter.
>> I also added #:strict? to toggle "strict parser" as mentioned in the
>> RFC.
>
> As far as I understand, your code is correct. The tests pass.
>
> Thank you again!
>
> Best regards,
>
> Vivien
