Hi Alex,

On Mon, Aug 12, 2024 at 11:46:37AM +0200, Aleksandar Lazic wrote:
> > On Thu, Jun 13, 2024 at 03:00:59AM +0200, Aleksandar Lazic wrote:
> > > > The final idea is something like this.
> > > >
> > > > ```
> > > > tcp-request content upstream-proxy-header Host %[req.ssl_sni,lower]
> > > > tcp-request content upstream-proxy-header "$AUTH" "$TOKEN"
> > > > tcp-request content upstream-proxy-header Proxy-Connection Keep-Alive
> > > > tcp-request content upstream-proxy-target %[req.ssl_sni,lower]
> > > >
> > > > server stream 0.0.0.0:0 upstream-proxy-tunnel %[req.ssl_sni,lower,map_str(targets.map)]
> > > > ```
> > > >
> > > > The targets.map should have something like this.
> > > >
> > > > #dest  proxy
> > > > sni01  proxy01
> > > > sni02  proxy02
> > > >
> > > > I hope the background of upstream-proxy-target is now clearer.
> > > >
> > > > > - In `server https_Via_Proxy1 0.0.0.0:0 upstream-proxy-tunnel 127.0.0.1:3128`
> > > > >   from your config, what is 0.0.0.0:0 used for here? This binds to all IPv4
> > > > >   but on a random free port?
> > > >
> > > > This is required when the destination should be dynamic; it's documented here:
> > > > http://docs.haproxy.org/3.0/configuration.html#4.4-do-resolve
> > > >
> > > > > A+
> > > > > Dave
> > > >
> > > > Regards
> > > > Alex

Overall I find that there are plenty of good points in this patch and the
discussion around it. But it also raises questions:

- IMHO the server's address should definitely be the proxy's address.
  This will permit using all the standard mechanisms we have, such as DNS
  resolution and health checks, and continues to work as a regular
  handshake. I'm not sure this is the case in the current state of the
  patch, since I'm seeing an extra "upstream-proxy-tunnel" parameter
  (actually I'm confused by what you call a "target" here, as used in
  upstream-proxy-target).
> Well, I thought the same, but then it's difficult to separate the proxy
> server from the destination server. Maybe we need some kind of server
> type?
>
> As you can see in the sequence diagram, both servers exist from HAProxy's
> point of view.
> https://github.com/git001/tls-proxy-tunnel/blob/main/README.md#sequence-diagram
>
> The "target" is the "destination server", which from my point of view is
> the "server" in haproxy.
>
> The "upstream-proxy-target" is the upstream proxy server. I'm also not
> happy with that name and open for suggestions.
Then I'm confused by what the server's configured address becomes. I mean,
there are three possibilities:

1) The ip:port of the server designates the target server, in which case
   only ip:port are passed to the proxy and we never send an FQDN there.
   The proxy then has to be hard-coded in an option (apparently the
   upstream-proxy-tunnel here). Haproxy then focuses only on the target
   server for load balancing, stickiness, health checks etc, and always
   reaches that server through the hard-coded upstream proxy. This could
   match what an application-oriented LB would look for, i.e. being able
   to reach its "local" servers through a proxy. It's just that I would be
   surprised if such a use case really existed, because load-balancing
   remote servers across a single local hard-coded proxy doesn't feel
   natural at all. E.g. the LB is not even on the LAN that knows all the
   remote hosts, so such a config is particularly difficult to maintain in
   a working state.

2) The fqdn:port of the server designates the target server, in which case
   only fqdn:port are passed to the proxy and we let the proxy resolve the
   target server. The rest is the same as above; it's just a small
   variation, but I'm still having a hard time figuring out what use case
   that could match.

3) The ip:port of the server designates the upstream proxy, and the
   connection's destination is passed to the CONNECT method. In this case
   it means that LB, checks, stickiness etc. apply to the proxies and not
   to the servers, and we need a distinct field to set the target host
   name. That's more or less what you did with the tcp-request
   upstream-proxy-target rule, which could possibly have a default on the
   server line. We could also imagine that a set-dst action would allow
   forcing the target destination address, but that would be a detail.
   This one is more about infrastructure than remote app servers: you tell
   the proxy how to reach external services.
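To make case 3 concrete, here's a hypothetical sketch of what such a config
could look like (the upstream-proxy-target keyword is the one proposed in
the patch, not an existing one; names and addresses are made up):

```
backend out_via_proxies
    mode tcp
    # pass the target host on to the CONNECT request (proposed keyword)
    tcp-request content upstream-proxy-target %[req.ssl_sni,lower]
    # the servers are the forward proxies themselves, so standard load
    # balancing and health checks apply to the proxies, not the targets
    balance roundrobin
    server proxy1 192.0.2.10:3128 check
    server proxy2 192.0.2.11:3128 check
```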
I've seen this requested a few times by people saying "I would like to use
the proxy protocol from my servers, which would connect to haproxy, which
would then connect to the forward proxy to reach the destination". The
choice might be debatable of course, but I think it does cover a number of
situations where haproxy and the forward proxy farm are there to provide
access to a subset of the internet for a bunch of local servers.

Note that in all cases the server's address is used and designates
something different. What's important is to stop thinking about the
single-node end-to-end chain, because that's the best way to make a
mistake. We need to think that we're on a load balancer and ask what we
want to load-balance: remote servers or local proxies. I tend to think the
latter. Similarly, in my mind it makes sense that if resolvers are enabled
on the server, they resolve to the IP of the forward proxy, so that you
can have, for example, forward proxies as part of your cloud deployment
and learn about them dynamically. I think we really need to know what use
case we're trying to address; it's very important to know it.

> > - you should definitely get rid of all these DPRINTF() debug traces.
> >   The simple fact that I no longer know how to enable them should ring
> >   a bell about their relevance ;-)
>
> :-)
>
> What's the suggested alternative?
> I thought "*TRACE*" but haven't seen any good documentation on how to
> use it. I'm happy to switch to any better debugging setup.

The traces are not trivial to set up, but not terribly difficult either.
They work by subsystem, which we call a "trace source". That's what you
see in the CLI's "trace" command (e.g.: h1, h2, stream, quic etc). This
requires the definition of a TRACE_SOURCE macro pointing to a local
definition in the file that is touched. This is defined once for an entire
source file, so that traces are easy to use.
From there, TRACE_ENTER() and TRACE_LEAVE() in each function are
sufficient to know where you're entering/leaving. TRACE_PRINTF() allows
performing printf-like traces at the devel level (they're a bit expensive
since they use snprintf, but that's OK there). Then there are other levels
depending on the level of info wanted: e.g. TRACE_USER for traces likely
to be of interest to the end user, such as "connected to X", or
TRACE_STATE for anything related to internal state changes (switching to
connection pending, then connected, then closed), etc.

We don't have traces at the connection level since this is mostly utility
code. Maybe we could consider having specific traces for handshakes in
general (CONNECT, PP, etc), which could possibly help. But then it would
mean moving most of the handshake-specific code to a new file (e.g.
conn-handshake.c). You can have a look at mux-h1.c or mux-h2.c to see an
example of how these are used.

> > - we need to make sure this handshake is sent before SSL and not after.
> >   This should permit doing what users normally expect, i.e., encrypt a
> >   TCP connection and encapsulate it into a CONNECT request.
>
> I thought that's how I added it; looks like I have overlooked something.
> From my understanding, the call in "xprt_handshake_io_cb(...)" happens
> before the TLS/SSL handshake, is this wrong?

At first glance, seeing xprt_handshake_io_cb(), I think this is OK. But
it's important to test it; we could discover really nasty surprises.

> > - please don't realign the srv_kw_list at the end of the patch, it makes
> >   it impossible to spot the addition(s) there. We can tolerate a few
> >   unaligned entries (even though often it's a sign that a shorter name
> >   would be welcome), and if you really need to realign, please do that
> >   in a separate patch.
>
> Okay. I'm happy to discuss a better name.

I don't have any for now (I'm not good at giving names; I think the
project itself shows it).
However, I think we should make "http1" appear in the name so that there's
no ambiguity about the compatibility with other solutions (e.g.
"use-http1-proxy" or "use-http1-connect" etc. might be explicit enough,
though maybe looking at what other utilities call this could help).

> > - and BTW there's no need for this "upt" entry in the proxy struct; if
> >   a string is needed you can just use an ist or even a char*, which are
> >   simpler. But I do think that it should be extracted from the request
> >   (something we need to figure out how to do by default).
>
> In general, is the "struct proxy { { ... } tcp_req; }" the right place to
> add this feature?
> Maybe it's better to add it into the l4 or l5 ruleset?

I still don't understand how it's going to be used at the proxy level. I
thought it would be a list of rules, but actually that's not the case
since the rules are in tcp-request. I just don't understand where it is
configured.

> I haven't seen any good way to define a server type like "upstream",
> "origin", "any-other-type" for the server line; maybe such a type keyword
> could help to get the separation between an upstream proxy server and an
> origin/target/destination server.

Maybe. I was thinking that a way to distinguish between cases 1 and 3
above is actually the transparent mode: you want to route your connection
to the origin via the proxy. But there's still something that feels odd in
this, and that's also why I think we'd all see things more clearly once
the real-world use cases are enumerated (regardless of the config; the
config will come out of these, trying to look as intuitive as possible to
cover those cases).

Willy