On 2022-10-23 06:10, Grant Taylor wrote:
On 10/21/22 11:30 PM, Amos Jeffries wrote:
Not just convention. AFAICT it was formally registered with the W3C, before everyone moved to using the IETF for registrations.

Please elaborate on what was formally registered. I've only seen 3128 / 3129 used as the default for Squid (and a few things emulating Squid). Other proxies of the time, namely Netscape's and Microsoft's counterparts, tended to use 8080.

I'd genuinely like to learn more about and understand the history / etymology / genesis of the 3128 / 3129.

Duane W. would be the best one to ask about the details.

What I know is that some 10-12 years ago I discovered a message by Duane mentioning that the W3C had (given or accepted) port 3128 for Squid use. I've checked the squid-cache archives and am not seeing the message.

Right now it looks like the W3C has changed their systems and only tracks the standards documents, so I cannot reference their (outdated?) protocol registry :-{ . I also checked the squid-cache archives and am not finding it in the email history. Sorry.



FYI, discussion started ~30 years ago.

ACK

The problem:

To save bandwidth, HTTP/1.0 defined different URL syntaxes for origin requests and relay/proxy requests. The form sent to an origin server lacks any information about the authority; that was expected to be known out-of-band by the origin itself.
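To make the two syntaxes concrete, here is a minimal Python sketch of the request line a client would emit for the same URL when talking to an origin server versus a forward proxy. The helper name `request_line` is hypothetical; the form names "origin-form" and "absolute-form" are the ones RFC 9112 section 3.2 uses today, not HTTP/1.0's own wording.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool, version: str = "HTTP/1.0") -> str:
    """Build a GET request line for `url` (hypothetical helper).

    via_proxy=True  -> absolute-form: the authority travels on the request line.
    via_proxy=False -> origin-form: only path and query; the authority is dropped.
    """
    parts = urlsplit(url)
    if via_proxy:
        # Full URL minus any #fragment (fragments are never sent on the wire).
        target = parts._replace(fragment="").geturl()
    else:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} {version}"


print(request_line("http://example.com/", via_proxy=False))  # GET / HTTP/1.0
print(request_line("http://example.net/", via_proxy=False))  # GET / HTTP/1.0
print(request_line("http://example.com/", via_proxy=True))   # GET http://example.com/ HTTP/1.0
```

The first two lines are byte-for-byte identical even though they target different origins; only out-of-band knowledge (which IP address was dialled) tells them apart.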

HTTP/1.1 has attempted several different mechanisms to fix this over the years. None of them has been universally accepted, so the problem remains. The best we have is the mandatory Host header, which most (but sadly not all) clients and servers use.

HTTP/2 cements that design with the mandatory ":authority" pseudo-header field, so the problem is "fixed" for native HTTP/2+ traffic. But until HTTP/1.0 and broken HTTP/1.1 clients are all gone, the issue will still crop up.

I'm not entirely sure what you mean by "the authority". I'm taking it to mean the identity of the service that you want content from. Your comment about the Host: header in HTTP/1.1 is what makes me think this.


I mean "authority" as used by the HTTP specification, which refers to https://www.rfc-editor.org/rfc/rfc3986#section-3.2
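For reference, RFC 3986 section 3.2 defines the authority as `[ userinfo "@" ] host [ ":" port ]`. A small Python sketch using the standard library's `urllib.parse.urlsplit` shows how those pieces fall out of a URL; the helper name `authority_parts` is hypothetical.

```python
from urllib.parse import urlsplit


def authority_parts(url: str) -> dict:
    """Split the RFC 3986 authority component of `url` (hypothetical helper)."""
    p = urlsplit(url)
    return {
        "authority": p.netloc,  # the whole authority component, as written
        "user": p.username,     # user part of userinfo (urllib splits off any password)
        "host": p.hostname,     # lower-cased by urllib
        "port": p.port,         # None when omitted: the scheme's default port applies
    }


print(authority_parts("http://user@origin.example.net:8080/index.html"))
# {'authority': 'user@origin.example.net:8080', 'user': 'user', 'host': 'origin.example.net', 'port': 8080}
```

Note that `port` comes back as `None` for `http://example.com/`; the value 80 is never in the URL, it is a default supplied from elsewhere.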


My understanding is that neither HTTP/0.9 nor HTTP/1.0 had a Host: header and that it was assumed that the IP address you were connecting to conveyed the server that you were wanting to connect to.

Yes, exactly. That is the source of the problem, perpetuated by the need to retain on-wire byte/octet backward compatibility until HTTP/2 changed to a binary format.

Consider what the proxy has to do when (not if) the IP:port being connected to is that proxy's own (e.g. localhost:80) and the URL is only a path ("/") on an origin server somewhere else. Does "GET / HTTP/1.0" mean "http://example.com/" or "http://example.net/"?
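A minimal Python sketch of the decision a proxy faces here, under stated assumptions: the function name `effective_target` and its `own_authorities` parameter are hypothetical, and the scheme is assumed to be plain-text "http" because the request arrived on a plain-text port. It uses the absolute-form authority when present, falls back to Host for origin-form, and gives up when neither identifies an origin (or when Host names the proxy itself, which would make it forward to itself).

```python
from typing import AbstractSet, Mapping, Optional
from urllib.parse import urlsplit


def effective_target(
    request_target: str,
    headers: Mapping[str, str],
    own_authorities: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Return the URL the client presumably wants, or None if it cannot be known."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # absolute-form: the client named the origin explicitly.
        return request_target
    if request_target.startswith("/"):
        # origin-form: the authority can only come from the Host header
        # (header names are case-insensitive).
        host = next((v.strip() for k, v in headers.items() if k.lower() == "host"), None)
        # Host naming this proxy itself would just loop back here.
        if host and host.lower() not in own_authorities:
            # ASSUMPTION: plain-text listener, so the scheme is taken as "http".
            return "http://" + host + request_target
    # e.g. "GET / HTTP/1.0" with no Host: example.com? example.net? No way to tell.
    return None
```

For example, `effective_target("/", {})` returns `None`, while `effective_target("/", {"Host": "example.net"})` returns `"http://example.net/"`.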



More importantly, the proxy hostname:port that the client opens TCP connections to may differ from the authority info specified in the HTTP request message (or the lack thereof).

My working understanding of what the authority is seems to still work with this.


The key point is that the proxy host:port and the origin host:port are two different authorities, and only the origin's may be passed along in the URL (or URL+Host header). When the client connects to ports 80 and 443 believing they are origin services, it is *required* (per https://www.rfc-editor.org/rfc/rfc9112.html#name-origin-form) to omit the real origin's info from the request-target. Enter problems.


This crosses security boundaries and involves out-of-band information sources at all three endpoints involved in the transaction for the message semantics and protocol negotiations to work properly.

I feel like web traffic, by its nature, frequently (but not always) crosses security / administrative boundaries. As such, I don't think the existence of proxies in the communications path alters things much.

Please elaborate on what out-of-band information you are describing. The most predominant thing that comes to mind, particularly with HTTP/1.1 and HTTP/2 is name resolution -- ostensibly DNS -- to identify the IP address to connect to.


I refer to all the many ways clients may be explicitly or implicitly configured to be aware that they are talking to a proxy, such that they explicitly avoid sending the problematic origin-form URLs.


What that text does not say is that when they are omitted by the **user**, they are taken from configuration settings in the OS:

  * the environment variable name provides:
    - the protocol name ("http" or "HTTPS", aka plain-text or encrypted)
    - the expected protocol syntax/semantics ("proxy" aka forward-proxy)

  * the machine /etc/services configuration provides the default port for the named protocol.

Ergo the use of /default/ values when values are not specified.
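A rough Python sketch of that defaulting chain, assuming a hypothetical client that behaves exactly as described: the protocol comes from the variable name (`http_proxy` gives "http"), and a missing port is looked up with `socket.getservbyname`, which consults the OS services database (/etc/services on Unix). The helper name `proxy_endpoint` and the fallback table are assumptions for illustration; real clients differ in their defaults.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

# ASSUMPTION: used only if the OS services database has no entry.
FALLBACK_PORTS = {"http": 80, "https": 443}


def proxy_endpoint(var_name: str, value: str) -> Tuple[str, int]:
    """Resolve (host, port) for a setting like http_proxy=... (hypothetical helper)."""
    protocol = var_name.lower()
    if protocol.endswith("_proxy"):
        protocol = protocol[: -len("_proxy")]  # "HTTPS_PROXY" -> "https"
    # Accept both "http://host[:port]" and bare "host[:port]" values.
    parts = urlsplit(value if "://" in value else "//" + value)
    if parts.port is not None:
        return parts.hostname, parts.port  # explicit port wins
    try:
        # Default port registered for the *named protocol*, i.e. the origin port.
        port = socket.getservbyname(protocol, "tcp")
    except OSError:
        port = FALLBACK_PORTS[protocol]
    return parts.hostname, port
```

With `http_proxy=http://proxy.example.com` this yields `("proxy.example.com", 80)`: the proxy ends up contacted on the port registered for origin-server syntax, which is the mismatch discussed below.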

The defaults, though, are tuned for direct contact with an origin server (or reverse-proxy). No browser I know of supports "http-alt://proxy.example.com?http://origin.example.net/index.html" URLs.



I feel like this, in a roundabout way, supports my stance that the default ports are perfectly fine to use.


... "at your own risk" they technically might be, so long as you only receive one of the three types of syntax there, with ports 80/443 being officially registered for origin / reverse-proxy syntax.


Attempting to use a reverse-proxy or origin server in such a configuration may work for some messages, but **will** fail on others due to syntax or semantic errors.

I question the veracity of that statement.


It is based on experience. Squid used to be a lot more lenient and tried for decades to do the syntax auto-detection. The path from that to separate ports is littered with CVEs. Most notably the curse that keeps on giving: CVE-2009-0801, which is just the trigger issue for a whole nest of bad side effects.
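The alternative to auto-detection is to give each listening port one fixed role and reject, rather than guess at, anything that does not fit it. Squid expresses this with separate http_port lines (a plain forward-proxy port alongside ones flagged accel or intercept). The Python below is not Squid code, only a sketch of the idea: the mode names and the `ACCEPTED` table are hypothetical, and the form names follow RFC 9112 section 3.2, which also requires servers to accept absolute-form, so the origin mode allows it.

```python
from enum import Enum


class Form(Enum):
    ORIGIN = "origin-form"        # GET /index.html HTTP/1.1
    ABSOLUTE = "absolute-form"    # GET http://example.com/index.html HTTP/1.1
    AUTHORITY = "authority-form"  # CONNECT example.com:443 HTTP/1.1
    ASTERISK = "asterisk-form"    # OPTIONS * HTTP/1.1


def classify(method: str, target: str) -> Form:
    """Identify the request-target form; raise ValueError if it fits none."""
    if method == "CONNECT":
        if "/" in target:
            raise ValueError("CONNECT requires host:port")
        return Form.AUTHORITY
    if target == "*":
        if method != "OPTIONS":
            raise ValueError("'*' is only valid with OPTIONS")
        return Form.ASTERISK
    if target.startswith("/"):
        return Form.ORIGIN
    if "://" in target:
        return Form.ABSOLUTE
    raise ValueError(f"unrecognised request-target: {target!r}")


# HYPOTHETICAL per-port roles: each port accepts exactly one syntax family.
ACCEPTED = {
    "forward-proxy": {Form.ABSOLUTE, Form.AUTHORITY},
    "origin": {Form.ORIGIN, Form.ASTERISK, Form.ABSOLUTE},
}


def accept(port_mode: str, method: str, target: str) -> bool:
    """True if this port's configured role allows the request's syntax."""
    try:
        form = classify(method, target)
    except ValueError:
        return False  # malformed: reject, never guess
    return form in ACCEPTED[port_mode]
```

The design choice is that `accept("forward-proxy", "GET", "/")` simply returns `False`: an origin-form request on a forward-proxy port is refused instead of being resolved by heuristics.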


Amos
_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
