On 2022-10-23 06:10, Grant Taylor wrote:
On 10/21/22 11:30 PM, Amos Jeffries wrote:
Not just convention. AFAICT it was formally registered with W3C, before
everyone went to using IETF for registrations.
Please elaborate on what was formally registered. I've only seen 3128
/ 3129 be the default for Squid (and a few things emulating squid).
Other proxies of the time, namely Netscape's and Microsoft's
counterparts, tended to use 8080.
I'd genuinely like to learn more about and understand the history /
etymology / genesis of the 3128 / 3129.
Duane W. would be the best one to ask about the details.
What I know is that some 10-12 years ago I discovered a message by
Duane mentioning that W3C had (given or accepted) port 3128 for Squid
use. I've checked the squid-cache archives and am not seeing the message.
Right now it looks like the W3C changed their systems and only track the
standards documents. So I cannot reference their (outdated?) protocol
registry :-{ . I also checked the squid-cache archives and cannot find it
in the email history. Sorry.
FYI, discussion started ~30 years ago.
ACK
The problem:
For bandwidth savings HTTP/1.0 defined different URL syntax for origin
and relay/proxy requests. The form sent to an origin server lacks any
information about the authority. That was expected to be known
out-of-band by the origin itself.
HTTP/1.1 has attempted several different mechanisms to fix this over
the years. None of them has been universally accepted, so the problem
remains. The best we have is the mandatory Host header, which most (but
sadly not all) clients and servers use.
HTTP/2 cements that design with the mandatory ":authority" pseudo-header
field. So the problem is "fixed" for native HTTP/2+ traffic. But until
HTTP/1.0 and broken HTTP/1.1 clients are all gone the issue will still
crop up.
I'm not entirely sure what you mean by "the authority". I'm taking it
to mean the identity of the service that you are wanting content from.
The Host: header comment with HTTP/1.1 is what makes me think this.
I mean "authority" as used by the HTTP specification, which refers to
https://www.rfc-editor.org/rfc/rfc3986#section-3.2
My understanding is that neither HTTP/0.9 nor HTTP/1.0 had a Host:
header and that it was assumed that the IP address you were connecting
to conveyed the server that you were wanting to connect to.
Yes exactly. That is the source of the problem, perpetuated by the need
to retain on-wire byte/octet backward compatibility until HTTP/2 changed
to binary format.
Consider what the proxy has to do when (not if) the IP:port being
connected to is that proxy's own (e.g. localhost:80) and the URL is only a
path ("/") on an origin server somewhere else. Does the "GET / HTTP/1.0"
mean "http://example.com/" or "http://example.net/" ?
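
To make that ambiguity concrete, here is a minimal sketch in Python. It is
not how Squid does it; the function name effective_url is hypothetical, and
it assumes a plain-text request head with CRLF line endings. It only shows
that an origin-form request without a Host header carries no authority at
all, so the receiver has nothing to reconstruct the URL from.

```python
from typing import Optional


def effective_url(raw_request: bytes, scheme: str = "http") -> Optional[str]:
    """Return the absolute URL a request refers to, or None if unknowable.

    Hypothetical helper for illustration only; real proxies do far more
    validation than this.
    """
    # Keep only the request head (request line + headers).
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")

    # Request line: METHOD SP request-target [SP HTTP-version]
    parts = lines[0].split()
    target = parts[1]

    # absolute-form: the authority travels inside the URL itself.
    if target.startswith(("http://", "https://")):
        return target

    # Collect headers into a case-insensitive lookup table.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # origin-form: the authority can only come from the Host header.
    host = headers.get("host")
    if target.startswith("/") and host:
        return f"{scheme}://{host}{target}"

    # origin-form with no Host: the authority is simply not on the wire.
    return None


print(effective_url(b"GET / HTTP/1.0\r\n\r\n"))
print(effective_url(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(effective_url(b"GET http://example.net/ HTTP/1.0\r\n\r\n"))
```

The first call is the case in question: the only remaining hint is the
IP:port the client connected to, and that belongs to the proxy.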
More importantly the proxy hostname:port the client is opening TCP
connections to may be different from the authority-info specified in
the HTTP request message (or lack thereof).
My working understanding of what the authority is seems to still work
with this.
The key point is that the proxy host:port and the origin host:port are
two different authorities, and only the origin's may be passed along in the
URL (or URL+Host header). When the client uses ports 80 and 443 thinking
they are origin services it is *required* (per
https://www.rfc-editor.org/rfc/rfc9112.html#name-origin-form) to omit
the real origin's info. Enter problems.
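
For reference, RFC 9112 section 3.2 defines four request-target forms. The
sketch below (Python, with a hypothetical function name and deliberately
loose matching) shows how a receiver might tell them apart from the request
line alone. It is an illustration of the syntax split, not a validating
parser.

```python
def request_target_form(method: str, target: str) -> str:
    """Classify a request-target into the forms named by RFC 9112.

    Hypothetical helper; the checks are simplified for illustration.
    """
    # asterisk-form: only meaningful for a server-wide OPTIONS request.
    if target == "*":
        return "asterisk-form"
    # authority-form: host:port only, used with CONNECT to a proxy.
    if method.upper() == "CONNECT":
        return "authority-form"
    # origin-form: path (and query) only, what origin servers receive.
    if target.startswith("/"):
        return "origin-form"
    # absolute-form: full URL, what a forward proxy expects to receive.
    if "://" in target:
        return "absolute-form"
    return "invalid"


for method, target in [
    ("GET", "/index.html"),
    ("GET", "http://origin.example.net/index.html"),
    ("CONNECT", "origin.example.net:443"),
    ("OPTIONS", "*"),
]:
    print(method, target, "->", request_target_form(method, target))
```

Only the absolute-form and authority-form lines tell a proxy where the
request is meant to go; the origin-form line does not.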
This crosses security boundaries and involves out-of-band information
sources at all three endpoints involved in the transaction for the
message semantics and protocol negotiations to work properly.
I feel like the nature of web traffic tends to frequently, but not
always, cross security / administrative boundaries. As such, I don't
think that existence of proxies in the communications path alters
things much.
Please elaborate on what out-of-band information you are describing.
The most predominant thing that comes to mind, particularly with
HTTP/1.1 and HTTP/2 is name resolution -- ostensibly DNS -- to identify
the IP address to connect to.
I refer to all the many ways a client may be explicitly or implicitly
configured to be aware that it is talking to a proxy - such that it
explicitly avoids sending the problematic origin-form URLs.
What that text does not say is that when they are omitted by the
**user** they are taken from configuration settings in the OS:
* the environment variable name provides:
- the protocol name ("http" or "HTTPS", aka plain-text or
encrypted)
- the expected protocol syntax/semantics ("proxy" aka
forward-proxy)
* the machine /etc/services configuration provides the default port
for the named protocol.
Ergo the use of /default/ values when values are not specified.
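
As a rough sketch of that defaulting (Python; the helper names
proxy_endpoint and default_port are hypothetical, and real clients differ in
the details), the protocol comes from the variable name when the value has
no scheme, and the port comes from the services database when the value has
no port:

```python
import socket
from urllib.parse import urlsplit

# Assumption: used only if the services database has no entry
# (e.g. a minimal container without /etc/services).
FALLBACK_PORTS = {"http": 80, "https": 443}


def default_port(scheme):
    """Look up the registered default port for a protocol name."""
    try:
        return socket.getservbyname(scheme, "tcp")
    except OSError:
        return FALLBACK_PORTS.get(scheme)


def proxy_endpoint(var_name, value):
    """Resolve (scheme, host, port) for a proxy environment variable."""
    # "http_proxy" / "HTTPS_PROXY" -> protocol name before the underscore.
    proto = var_name.lower().split("_", 1)[0]
    # A bare "host" or "host:port" value gets the protocol from the name.
    if "://" not in value:
        value = f"{proto}://{value}"
    parts = urlsplit(value)
    # No explicit port: fall back to the default for the named protocol.
    port = parts.port if parts.port is not None else default_port(parts.scheme)
    return parts.scheme, parts.hostname, port


print(proxy_endpoint("http_proxy", "proxy.example.com"))
print(proxy_endpoint("http_proxy", "http://proxy.example.com:3128"))
```

Note that the first call lands on the default port for "http", which is the
port registered for origin-server syntax, not a proxy port such as 3128.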
The defaults though are tuned for origin server (or reverse-proxy)
direct contact.
No browser I know of supports
"http-alt://proxy.example.com?http://origin.example.net/index.html"
URLs.
I feel like this, in a roundabout way, supports my stance that the
default ports are perfectly fine to use.
... "at your own risk" they technically might be. So long as you only
receive one of the three types of syntax there - port 80/443 being
officially registered for origin / reverse-proxy syntax.
Attempting to use a reverse-proxy or origin server in such a
configuration may work for some messages, but **will** fail due to
syntax or semantic errors on others.
I question the veracity of that statement.
It is based on experience. Squid used to be a lot more lenient and tried
for decades to do the syntax auto-detection. The path from that to
separate ports is littered with CVEs. Most notably the curse that keeps
on giving: CVE-2009-0801, which is just the trigger issue for a whole
nest of bad side effects.
Amos