Hi! Are there other things to fix?
Best regards, Vivien Le lundi 25 septembre 2023 à 18:48 +0200, Vivien Kraus a écrit : > * module/web/uri.scm (remove-dot-segments): Implement algorithm > 5.2.4. > (merge-paths): Implement algorithm 5.2.3. > (resolve-relative-reference): Implement algorithm 5.2.2. > (module): Export resolve-relative-reference. > * NEWS: Reference it here. > * doc/ref/web.texi (URIs): Document it here. > (Subtypes of URI): Add a @node declaration to cross-reference it. > (HTTP Headers) [location]: Point to the section for different URI > types. > (Web Client) [http-request]: Indicate that no redirection is > performed. > --- > > I clarified the situation about redirections. I don’t think it’s > Guile’s > job to do it. For permanent redirections (301), the application > developer is supposed to edit the pages that point to the now-moved > resource anyway. A handful of security issues must also be lurking in > the shadows, and I don’t think it should be a responsibility for the > Guile web client. > > The specification uses the word "relative" both for the type of URI > that > is most likely to be found, and to express the asymmetric relation > between both arguments of the algorithm. I think "base" and > "dependent" > are clearer, what do you think? > > The semicolon and equal sign are both reserved characters, so it’s > expected that Guile escapes them. If there’s a bug, it is in the 5.4 > section of the RFC. However, I understand that it would be desirable > for > the algorithm to accept such unescaped characters, since it works > with > URIs in isolation and not in an HTTP frame or web page. > > NEWS | 7 ++ > doc/ref/web.texi | 27 +++++- > module/web/uri.scm | 161 > +++++++++++++++++++++++++++++++++- > test-suite/tests/web-uri.test | 68 ++++++++++++++ > 4 files changed, 261 insertions(+), 2 deletions(-) > > diff --git a/NEWS b/NEWS > index b319404d7..bdf75cb3c 100644 > --- a/NEWS > +++ b/NEWS > @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9) > > * New interfaces and functionality > > +** New function in (web uri): resolve-relative-reference > + > +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It > may > +be used to request a moved resource in case of a 301 or 302 HTTP > +response, by resolving the Location value of the response on top of > the > +requested URI. > + > ** New warning: unused-module > > This analysis, enabled at `-W2', issues warnings for modules that > appear > diff --git a/doc/ref/web.texi b/doc/ref/web.texi > index 607c855b6..2267c9774 100644 > --- a/doc/ref/web.texi > +++ b/doc/ref/web.texi > @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" > "biscuits&gravy")} encodes > as @code{"scrambled%20eggs/biscuits%26gravy"}. > @end deffn > > +@node Subtypes of URI > @subsubheading Subtypes of URI > > As we noted above, not all URI objects have a scheme. You might > have > @@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while > asserting that no scheme is > present. Return @code{#f} if the string could not be parsed. > @end deffn > > +@cindex resolve URI reference > +In order to get a URI object from a base URI and a relative > reference, > +one has to use a @dfn{relative URI reference resolution} algorithm. > For > +instance, given a base URI, @samp{https://example.com/over/here}, > and a > +relative reference, @samp{../no/there}, it may seem easy to get an > +absolute URI as @samp{https://example.com/over/../no/there}. It is > +possible that the server at @samp{https://example.com} could serve > the > +same resource under this URL as > +@samp{https://example.com/no/there}. However, a web cache, or a > linked > +data processor, must understand that the relative reference > resolution > +leads to @samp{https://example.com/no/there}. > + > +@deffn {Scheme procedure} resolve-relative-reference @var{base} > @var{dependent} > +Return a URI object representing @var{dependent}, using the > components > +of @var{base} if missing, as defined in section 5.2 in RFC 3986. > This > +function cannot return a relative reference (it can only return an > +absolute URI object), if either @var{base} or @var{dependent} is an > +absolute URI object. > +@end deffn > > @node HTTP > @subsection The Hyper-Text Transfer Protocol > @@ -1038,7 +1058,8 @@ The entity-tag of the resource. > @deftypevr {HTTP Header} URI-reference location > A URI reference on which a request may be completed. Used in > combination with a redirecting status code to perform client-side > -redirection. > +redirection. @xref{Subtypes of URI, the distinction between types of > +URI}, for more information on relative references. > @example > (parse-header 'location "http://example.com/other") > @result{} #<uri ...> > @@ -1501,6 +1522,10 @@ constants, such as @code{certificate- > status/signer-not-found} or > Connect to the server corresponding to @var{uri} and make a request > over > HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, > etc.). > > +@code{http-request} does not follow redirections. If a redirection > is > +required, @code{http-request} returns a response object with an > adequate > +response code (e.g. 301 or 302). > + > The following keyword arguments allow you to modify the requests in > various ways, for example attaching a body to the request, or > setting > specific headers. The following table lists the keyword arguments > and > diff --git a/module/web/uri.scm b/module/web/uri.scm > index 8e0b9bee7..acec2d1e8 100644 > --- a/module/web/uri.scm > +++ b/module/web/uri.scm > @@ -47,7 +47,9 @@ > > uri-reference? relative-ref? > build-uri-reference build-relative-ref > - string->uri-reference string->relative-ref)) > + string->uri-reference string->relative-ref > + > + resolve-relative-reference)) > > (define-record-type <uri> > (make-uri scheme userinfo host port path query fragment) > @@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’ > as a delimiter. > For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’ > encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’." > (string-join (map uri-encode parts) "/")) > + > +(define (remove-dot-segments path) > + "Remove the @samp{./} and @samp{../} segments in @var{path}, as > + RFC3986, section 5.2.4." > + (let scan ((input > + (let ((components (split-and-decode-uri-path path))) > + (if (string-suffix? "/" path) > + `(,@components "") > + components))) > + (input-path-absolute? (string-prefix? "/" path)) > + (output '()) > + (output-absolute? #f) > + (output-ends-in-/? (string-suffix? "/" path))) > + (cond > + ((and input-path-absolute? > + (null? input)) > + ;; Transfer the initial "/" from the input to the end of the > + ;; output. > + (scan '() #f output output-absolute? #t)) > + ((null? input) > + (string-append > + (if output-absolute? "/" "") > + (encode-and-join-uri-path > + (reverse output)) > + (if output-ends-in-/? "/" ""))) > + ((and (not input-path-absolute?) > + (or (equal? (car input) "..") > + (equal? (car input) "."))) > + (scan (cdr input) #f output output-absolute? output-ends-in- > /?)) > + ((and input-path-absolute? > + (equal? (car input) ".")) > + (scan (cdr input) #t output output-absolute? output-ends-in- > /?)) > + ((and input-path-absolute? > + (equal? (car input) "..")) > + (scan (cdr input) #t > + (if (null? output) > + output > + (cdr output)) > + ;; Remove the last segment, including the preceding /. > So, > + ;; if there is 0 or 1 segment, remove the root / too. > + (if (or (null? output) (null? (cdr output))) > + #f ;; remove the / > + #t) ;; keep it > + #f)) > + (else > + (scan (cdr input) > + ;; If there is only 1 item in input, then it does not > end in > + ;; /, so the recursive call does not start with > + ;; /. Otherwise, the recursive call starts with /. > + (not (null? (cdr input))) > + (cons (car input) output) > + ;; If the output is empty and the input path is > absolute, > + ;; the / of the transferred path is transferred as well. > + (or output-absolute? > + (and (null? output) > + input-path-absolute?)) > + #f))))) > + > +(define (merge-paths base-has-authority? base dependent) > + "Return @samp{@var{base}/@var{dependent}}, with the subtelties of > absolute > + paths explained in RFC3986, section 5.2.3. If the base URI has an > +authority (userinfo, host, port), then the processing is a bit > +different." > + (if (and base-has-authority? > + (equal? base "")) > + (string-append "/" dependent) > + (let ((last-/ (string-rindex base #\/))) > + (if last-/ > + (string-append (substring base 0 last-/) "/" dependent) > + dependent)))) > + > +(define (resolve-relative-reference base dependent) > + "Resolve @var{dependent} on top of @var{base}, as RFC3986, section > +5.2. Both @var{dependent} and @var{base} may be URI or relative > +references. The return value is a URI if either @var{dependent} or > +@var{base} is a URI." > + ;; As opposed to RFC 3986, we use "dependent" instead of > "relative" to > + ;; avoid confusion between "URI" and "relative reference", the > + ;; dependent URI may be either. > + (let ((b-scheme (uri-scheme base)) > + (b-userinfo (uri-userinfo base)) > + (b-host (uri-host base)) > + (b-port (uri-port base)) > + (b-path (uri-path base)) > + (b-query (uri-query base)) > + (b-fragment (uri-fragment base)) > + (r-scheme (uri-scheme dependent)) > + (r-userinfo (uri-userinfo dependent)) > + (r-host (uri-host dependent)) > + (r-port (uri-port dependent)) > + (r-path (uri-path dependent)) > + (r-query (uri-query dependent)) > + (r-fragment (uri-fragment dependent)) > + (t-scheme #f) > + (t-userinfo #f) > + (t-host #f) > + (t-port #f) > + (t-path "") > + (t-query #f) > + (t-fragment #f)) > + ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2 > + > + ;;The programming style uses mutations to better adhere to the > + ;;algorithm specification. > + (if r-scheme > + (begin > + (set! t-scheme r-scheme) > + (set! t-userinfo r-userinfo) > + (set! t-host r-host) > + (set! t-port r-port) > + (set! t-path (remove-dot-segments r-path)) > + (set! t-query r-query)) > + ;; r-scheme is not defined: > + (begin > + (if r-host > + (begin > + (set! t-userinfo r-userinfo) > + (set! t-host r-host) > + (set! t-port r-port) > + (set! t-path (remove-dot-segments r-path)) > + (set! t-query r-query)) > + ;; r-scheme is not defined, r-authority is not > defined: > + (begin > + (if (equal? r-path "") > + (begin > + (set! t-path b-path) > + (if r-query > + ;; r-scheme, r-authority, r-path are not > + ;; defined: > + (set! t-query r-query) > + ;; r-scheme, r-authority, r-path, r-query > are > + ;; not defined: > + (set! t-query b-query))) > + ;; r-scheme, r-authority not defined, r-path > defined: > + (begin > + (if (string-prefix? "/" r-path) > + ;; r-scheme, r-authority not defined, r- > path > + ;; absolute: > + (set! t-path (remove-dot-segments r-path)) > + ;; r-scheme, r-authority not defined, r- > path > + ;; dependent: > + (set! t-path > + (remove-dot-segments > + (merge-paths b-host b-path r- > path)))) > + (set! t-query r-query))) > + (set! t-userinfo b-userinfo) > + (set! t-host b-host) > + (set! t-port b-port))) > + (set! t-scheme b-scheme))) > + (set! t-fragment r-fragment) > + (build-uri-reference #:scheme t-scheme > + #:userinfo t-userinfo > + #:host t-host > + #:port t-port > + #:path t-path > + #:query t-query > + #:fragment t-fragment))) > diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web- > uri.test > index 95fd82f16..c453bf60f 100644 > --- a/test-suite/tests/web-uri.test > +++ b/test-suite/tests/web-uri.test > @@ -20,6 +20,7 @@ > (define-module (test-web-uri) > #:use-module (web uri) > #:use-module (ice-9 regex) > + #:use-module (ice-9 string-fun) > #:use-module (test-suite lib)) > > > @@ -693,3 +694,70 @@ > (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))) > (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar"))) > (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^")))) > + > +(with-test-prefix "resolve relative reference" > + ;; Test suite in RFC3986, section 5.4. > + (let ((base (string->uri "http://a/b/c/d;p?q")) > + (equal/encoded? > + ;; The test suite checks for ';' characters, but Guile > escapes > + ;; them in URIs. Same for '='. > + (let ((escape-colon > + (lambda (x) > + (string-replace-substring x ";" "%3B"))) > + (escape-equal > + (lambda (x) > + (string-replace-substring x "=" "%3D")))) > + (lambda (x y) > + (equal? (escape-colon (escape-equal x)) > + (escape-colon (escape-equal y))))))) > + (let ((resolve > + (lambda (relative) > + (let* ((relative-uri > + (string->uri-reference relative)) > + (resolved-uri > + (resolve-relative-reference base relative-uri)) > + (resolved (uri->string resolved-uri))) > + resolved)))) > + (with-test-prefix "normal" > + (pass-if (equal/encoded? (resolve "g:h") "g:h")) > + (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g")) > + (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g")) > + (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/")) > + (pass-if (equal/encoded? (resolve "/g") "http://a/g")) > + (pass-if (equal/encoded? (resolve "//g") "http://g")) > + (pass-if (equal/encoded? (resolve "?y") > "http://a/b/c/d;p?y")) > + (pass-if (equal/encoded? (resolve "g?y") > "http://a/b/c/g?y")) > + (pass-if (equal/encoded? (resolve "#s") > "http://a/b/c/d;p?q#s")) > + (pass-if (equal/encoded? (resolve "g?y#s") > "http://a/b/c/g?y#s")) > + (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x")) > + (pass-if (equal/encoded? (resolve "g;x?y#s") > "http://a/b/c/g;x?y#s")) > + (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q")) > + (pass-if (equal/encoded? (resolve ".") "http://a/b/c/")) > + (pass-if (equal/encoded? (resolve "./") "http://a/b/c/")) > + (pass-if (equal/encoded? (resolve "..") "http://a/b/")) > + (pass-if (equal/encoded? (resolve "../") "http://a/b/")) > + (pass-if (equal/encoded? (resolve "../g") "http://a/b/g")) > + (pass-if (equal/encoded? (resolve "../..") "http://a/")) > + (pass-if (equal/encoded? (resolve "../../") "http://a/")) > + (pass-if (equal/encoded? (resolve "../../g") "http://a/g"))) > + (with-test-prefix "abnormal" > + (pass-if (equal/encoded? (resolve "../../../g") > "http://a/g")) > + (pass-if (equal/encoded? (resolve "../../../../g") > "http://a/g")) > + (pass-if (equal/encoded? (resolve "/./g") "http://a/g")) > + (pass-if (equal/encoded? (resolve "/../g") "http://a/g")) > + (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g.")) > + (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g")) > + (pass-if (equal/encoded? (resolve "g..") > "http://a/b/c/g..")) > + (pass-if (equal/encoded? (resolve "..g") > "http://a/b/c/..g")) > + (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g")) > + (pass-if (equal/encoded? (resolve "./g/.") > "http://a/b/c/g/")) > + (pass-if (equal/encoded? (resolve "g/./h") > "http://a/b/c/g/h")) > + (pass-if (equal/encoded? (resolve "g/../h") > "http://a/b/c/h")) > + (pass-if (equal/encoded? (resolve "g;x=1/./y") > "http://a/b/c/g;x=1/y")) > + (pass-if (equal/encoded? (resolve "g;x=1/../y") > "http://a/b/c/y")) > + (pass-if (equal/encoded? (resolve "g?y/./x") > "http://a/b/c/g?y/./x")) > + (pass-if (equal/encoded? (resolve "g?y/../x") > "http://a/b/c/g?y/../x")) > + (pass-if (equal/encoded? (resolve "g#s/./x") > "http://a/b/c/g#s/./x")) > + (pass-if (equal/encoded? (resolve "g#s/../x") > "http://a/b/c/g#s/../x")) > + (pass-if (equal/encoded? (resolve "http:g") "http:g")))))) > + > > base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb