Hi Alex,

On Mon, Aug 12, 2024 at 11:46:37AM +0200, Aleksandar Lazic wrote:
> > On Thu, Jun 13, 2024 at 03:00:59AM +0200, Aleksandar Lazic wrote:
> > > > The final idea is something like this.
> > > >
> > > > ```
> > > > tcp-request content upstream-proxy-header Host %[req.ssl_sni,lower]
> > > > tcp-request content upstream-proxy-header "$AUTH" "$TOKEN"
> > > > tcp-request content upstream-proxy-header Proxy-Connection Keep-Alive
> > > > tcp-request content upstream-proxy-target %[req.ssl_sni,lower]
> > > >
> > > > server stream 0.0.0.0:0 upstream-proxy-tunnel %[req.ssl_sni,lower,map_str(targets.map)]
> > > > ```
> > > >
> > > > The targets.map should have something like this.
> > > >
> > > > #dest  proxy
> > > > sni01  proxy01
> > > > sni02  proxy02
> > > >
> > > > I hope the background of upstream-proxy-target is now clearer.
> > > >
> > > > > - In `server https_Via_Proxy1 0.0.0.0:0 upstream-proxy-tunnel 127.0.0.1:3128`
> > > > >   from your config, what is 0.0.0.0:0 used for here? This binds to all IPv4
> > > > >   but on a random free port?
> > > >
> > > > This is required when the destination should be dynamic; it's documented here:
> > > > http://docs.haproxy.org/3.0/configuration.html#4.4-do-resolve
> > > >
> > > > > A+
> > > > > Dave
> > > >
> > > > Regards
> > > > Alex

Overall I find that there are plenty of good points in this patch and the
discussion around it. But it also raises questions:

- IMHO the server's address should definitely be the proxy's address.
  This will permit using all the standard mechanisms we have, such as DNS
  resolution and health checks, and continues to work as a regular
  handshake. I'm not sure this is the case in the current state of the
  patch, since I'm seeing an extra "upstream-proxy-tunnel" parameter
  (actually I'm confused by what you call a "target" here, as used in
  upstream-proxy-target).
> Well, I thought the same, but then it's difficult to separate the proxy
> server from the destination server. Maybe we need some kind of server
> type?
>
> As you can see in the sequence diagram, both servers exist from HAProxy's
> point of view.
> https://github.com/git001/tls-proxy-tunnel/blob/main/README.md#sequence-diagram
>
> The "target" is the "destination server", which from my point of view is
> the "server" in haproxy.
>
> The "upstream-proxy-target" is the upstream proxy server. I'm also not
> happy with that name and open for suggestions.
Then I'm confused by what the server's configured address becomes. I mean,
there are three possibilities:

1) The ip:port of the server designates the target server, in which case
   only ip:port are passed to the proxy and we never send an FQDN there.
   The proxy then has to be hard-coded in an option (apparently the
   upstream-proxy-tunnel here). Haproxy then focuses only on the target
   server for load balancing, stickiness, health checks etc, and always
   reaches that server through the hard-coded upstream proxy. This could
   match what an application-oriented LB would look for, i.e. being able
   to reach its "local" servers through a proxy. It's just that I would be
   surprised if such a use case really existed, because load-balancing
   remote servers across a single local hard-coded proxy doesn't feel
   natural at all. E.g. the LB is not even on the LAN that knows all the
   remote hosts, so such a config is particularly difficult to maintain in
   a working state.

2) The fqdn:port of the server designates the target server, in which case
   only fqdn:port are passed to the proxy and we let the proxy resolve the
   target server. The rest is the same as above; it's just a small
   variation, but I'm still having a hard time figuring out what use case
   that could match.

3) The ip:port of the server designates the upstream proxy, and the
   connection's destination is passed to the CONNECT method. In this case
   it means that LB, checks, stickiness etc. apply to the proxies and not
   to the servers, and we need a distinct field to set the target host
   name. That's more or less what you did with the tcp-request
   upstream-proxy-target rule, which could possibly have a default on the
   server line. We could also imagine that a set-dst action would allow
   forcing the target destination address, but that would be a detail.
   This one is more about infrastructure than remote app servers: you tell
   the proxy how to reach external services.
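To make case 3 concrete, here's a hypothetical sketch of what such a config
could look like (the upstream-proxy-target keyword is the one proposed in
the patch, not an existing one; names and addresses are made up):

```
backend out_via_proxies
    mode tcp
    # pass the target host on to the CONNECT request (proposed keyword)
    tcp-request content upstream-proxy-target %[req.ssl_sni,lower]
    # the servers are the forward proxies themselves, so standard load
    # balancing and health checks apply to the proxies, not the targets
    balance roundrobin
    server proxy1 192.0.2.10:3128 check
    server proxy2 192.0.2.11:3128 check
```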
I've seen this requested a few times by people saying "I would like to use
the proxy protocol from my servers, which would connect to haproxy, which
would then connect to the forward proxy to reach the destination". The
choice might be debatable of course, but I think it does cover a number of
situations where haproxy and the forward proxy farm are there to provide
access to a subset of the internet for a bunch of local servers.

Note that in all cases the server's address is used and designates
something different. What's important is to stop thinking about the
single-node end-to-end chain, because that's the best way to make a
mistake. We need to think that we're on a load balancer and ask what we
want to load-balance: remote servers or local proxies. I tend to think the
latter. Similarly, in my mind it makes sense that if resolvers are enabled
on the server, they resolve to the IP of the forward proxy, so that you
can have, for example, forward proxies as part of your cloud deployment
and learn about them dynamically. I think we really need to know what use
case we're trying to address; it's very important to know it.

> > - you should definitely get rid of all these DPRINTF() debug traces.
> >   The simple fact that I no longer know how to enable them should ring
> >   a bell about their relevance ;-)
>
> :-)
>
> What's the suggested alternative?
> I thought "*TRACE*" but haven't seen any good documentation on how to
> use it. I'm happy to switch to any better debugging setup.

The traces are not trivial to set up, but not terribly difficult either.
They work by subsystem, which we call a "trace source". That's what you
see in the CLI's "trace" command (e.g.: h1, h2, stream, quic etc). This
requires the definition of a TRACE_SOURCE macro pointing to a local
definition in the file that is touched. This is defined once for an entire
source file, so that traces are easy to use.
From there, TRACE_ENTER() and TRACE_LEAVE() in each function are
sufficient to know where you're entering/leaving. TRACE_PRINTF() allows
performing printf-like traces at the devel level (they're a bit expensive
since they use snprintf, but that's OK there). Then there are other levels
depending on the level of info wanted: e.g. TRACE_USER for traces likely
to be of interest to the end user, such as "connected to X", or
TRACE_STATE for anything related to internal state changes (switching to
connection pending, then connected, then closed), etc.

We don't have traces at the connection level since this is mostly utility
code. Maybe we could consider having specific traces for handshakes in
general (CONNECT, PP, etc), which could possibly help. But then it would
mean moving most of the handshake-specific code to a new file (e.g.
conn-handshake.c). You can have a look at mux-h1.c or mux-h2.c to see an
example of how these are used.

> > - we need to make sure this handshake is sent before SSL and not after.
> >   This should permit doing what users normally expect, i.e., encrypt a
> >   TCP connection and encapsulate it into a CONNECT request.
>
> I thought that's how I added it; looks like I have overlooked something.
> From my understanding, the call in "xprt_handshake_io_cb(...)" happens
> before the TLS/SSL handshake, is this wrong?

At first glance, seeing xprt_handshake_io_cb(), I think this is OK. But
it's important to test it; we could discover really nasty surprises.

> > - please don't realign the srv_kw_list at the end of the patch, it makes
> >   it impossible to spot the addition(s) there. We can tolerate a few
> >   unaligned entries (even though often it's a sign that a shorter name
> >   would be welcome), and if you really need to realign, please do that
> >   in a separate patch.
>
> Okay. I'm happy to discuss a better name.

I don't have any for now (I'm not good at giving names; I think the
project itself shows it).
However, I think we should make "http1" appear in the name so that there's
no ambiguity about the compatibility with other solutions (e.g.
"use-http1-proxy" or "use-http1-connect" etc. might be explicit enough,
though maybe looking at what other utilities call this could help).

> > - and BTW there's no need for this "upt" entry in the proxy struct; if
> >   a string is needed you can just use an ist or even a char*, which are
> >   simpler. But I do think that it should be extracted from the request
> >   (something we need to figure out how to do by default).
>
> In general, is the "struct proxy { { ... } tcp_req; }" the right place to
> add this feature?
> Maybe it's better to add it into the l4 or l5 ruleset?

I still don't understand how it's going to be used at the proxy level. I
thought it would be a list of rules, but actually that's not the case
since the rules are in tcp-request. I just don't understand where it is
configured.

> I haven't seen any good way to define a server type like "upstream",
> "origin", "any-other-type" for the server line; maybe such a type keyword
> could help to get the separation between an upstream proxy server and an
> origin/target/destination server.

Maybe. I was thinking that a way to distinguish between cases 1 and 3
above is actually the transparent mode: you want to route your connection
to the origin via the proxy. But there's still something that feels odd in
this, and that's also why I think we'd all see things more clearly once
the real-world use cases are enumerated (regardless of the config; the
config will come out of these, trying to look as intuitive as possible to
cover those cases).

Willy