ftp(1) implements RFC 1738 from Dec. 1994 but RFC 2396 from Aug. 1998 updated this and the tl;dr is: do not encode the tilde character.
In theory, this shouldn't make a difference as servers should decode "%7e" to "~", BUT not all servers do so and thus some respond with 404. I've hit this in the past already but didn't look at the code. Now I came across this link that made me fix it: https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download curl(1) and wget(1) both fetch this file successfully, i.e. they behave identicall wrt. "%" and send it as-is: $ url='https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download' $ curl -I -sS -w '%{url}\n' "$url" | tail -1 200 https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download $ wget -d "$url" 2>&1 | grep -Ew 'GET|OK' GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download HTTP/1.1 HTTP/1.1 200 OK 200 OK ftp(1) still sticks with RFC 1738 and encodes it: ftp -d "$url" host review.trustedfirmware.org, port https, path changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download, save as patch?download, auth none. Trying 64:ff9b::339f:1211... Requesting https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download GET /changes/TF-A%2Ftrusted-firmware-a%7e7726/revisions/9/patch?download HTTP/1.1 Connection: close Host: review.trustedfirmware.org User-Agent: OpenBSD ftp received 'HTTP/1.1 404 Not Found' ftp: Error retrieving https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download: 404 Not Found RFC 2396 2.4.2. When to Escape and Unescape says: In some cases, data that could be represented by an unreserved character may appear escaped; for example, some of the unreserved "mark" characters are automatically escaped by some systems. If the given URI scheme defines a canonicalization algorithm, then unreserved characters may be unescaped according to that algorithm. For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL. So they're equivalent and tilde does not need encoding. Again, servers should always decode it anyway, but not all of them do, so better send it as-is/unencoded. Diff below is effectively a one-character change that removes "~" from `*unsafe_chars' but it also updates the comments to use RFC 2396 wording and order of characters for easier code-to-standard comparison. This allows me to fetch this patch: $ ./obj/ftp -d "$url" host review.trustedfirmware.org, port https, path changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download, save as patch?download, auth none. Trying 64:ff9b::339f:1211... Requesting https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download HTTP/1.1 Connection: close Host: review.trustedfirmware.org User-Agent: OpenBSD ftp received 'HTTP/1.1 200 OK' received 'Date: Fri, 05 Nov 2021 23:34:49 GMT' received 'Server: Apache' received 'X-Frame-Options: DENY' received 'Content-Disposition: attachment; filename="155911c.diff.base64"' received 'X-Content-Type-Options: nosniff' received 'Cache-Control: private' received 'Pragma: no-cache' received 'Expires: Mon, 01 Jan 1990 00:00:00 GMT' received 'X-FYI-Content-Encoding: base64' received 'X-FYI-Content-Type: application/mbox' received 'Content-Type: text/plain;charset=iso-8859-1' received 'Vary: Accept-Encoding' received 'Connection: close' received 'Transfer-Encoding: chunked' 9544 bytes received in 0.00 seconds (12.21 MB/s) Looking at wget's source somewhat backs up this way of handling tilde by explicitly mentioning broken "%7e" decoding, i.e. servers expecting an unencoded "~" as-is: >From wget-1.2.1/src/url.c 103f: /* Table of "reserved" and "unsafe" characters. Those terms are rfc1738-speak, as such largely obsoleted by rfc2396 and later specs, but the general idea remains. A reserved character is the one that you can't decode without changing the meaning of the URL. For example, you can't decode "/foo/%2f/bar" into "/foo///bar" because the number and contents of path components is different. Non-reserved characters can be changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar". The unsafe characters are loosely based on rfc1738, plus "$" and ",", as recommended by rfc2396, and minus "~", which is very frequently used (and sometimes unrecognized as %7E by broken servers). An unsafe character is the one that should be encoded when URLs are placed in foreign environments. E.g. space and newline are unsafe in HTTP contexts because HTTP uses them as separator and line terminator, so they must be encoded to %20 and %0A respectively. "*" is unsafe in shell context, etc. We determine whether a character is unsafe through static table lookup. This code assumes ASCII character set and 8-bit chars. */ OK? Index: fetch.c =================================================================== RCS file: /cvs/src/usr.bin/ftp/fetch.c,v retrieving revision 1.205 diff -u -p -r1.205 fetch.c --- fetch.c 31 Aug 2021 09:51:25 -0000 1.205 +++ fetch.c 5 Nov 2021 22:49:35 -0000 @@ -106,14 +106,17 @@ static int redirect_loop; static int retried; /* - * Determine whether the character needs encoding, per RFC1738: - * - No corresponding graphic US-ASCII. - * - Unsafe characters. + * Determine whether the character needs encoding, per RFC2396. */ static int -unsafe_char(const char *c0) +to_encode(const char *c0) { - const char *unsafe_chars = " <>\"#{}|\\^~[]`"; + /* 2.4.3. Excluded US-ASCII Characters */ + const char *excluded_chars = + " " /* space */ + "<>#\"" /* delims (modulo "%", see below) */ + "{}|\\^[]`" /* unwise */ + ; const unsigned char *c = (const unsigned char *)c0; /* @@ -123,16 +126,15 @@ unsafe_char(const char *c0) return (iscntrl(*c) || !isascii(*c) || /* - * Unsafe characters. - * '%' is also unsafe, if is not followed by two + * '%' is also reserved, if is not followed by two * hexadecimal digits. */ - strchr(unsafe_chars, *c) != NULL || + strchr(excluded_chars, *c) != NULL || (*c == '%' && (!isxdigit(c[1]) || !isxdigit(c[2])))); } /* - * Encode given URL, per RFC1738. + * Encode given URL, per RFC2396. * Allocate and return string to the caller. */ static char * @@ -145,11 +147,10 @@ url_encode(const char *path) /* * First pass: - * Count unsafe characters, and determine length of the - * final URL. + * Count characters to encode and determine length of the final URL. */ for (i = 0; i < length; i++) - if (unsafe_char(path + i)) + if (to_encode(path + i)) new_length += 2; epath = epathp = malloc(new_length + 1); /* One more for '\0'. */ @@ -161,7 +162,7 @@ url_encode(const char *path) * Encode, and copy final URL. */ for (i = 0; i < length; i++) - if (unsafe_char(path + i)) { + if (to_encode(path + i)) { snprintf(epathp, 4, "%%" "%02x", (unsigned char)path[i]); epathp += 3;
