Le 19/09/2024 à 13:04, Aleksandar Lazic a écrit :
> Hi Willy.
>
> any chance to look into the Answer?
>

Hi Alex,

I've started the review. First of all I agree with Willy. I guess it is not what
many people have in mind when they are talking about supporting configuration of
an HTTP proxy to connect to a server. But, by explaining it as a "sock4" like
mode, it makes sense. So we must just take care to be clear on this point to
avoid any confusion. And be sure it will still be possible to implement the
other mode without conflict. At first glance, it seems good this way. We must
find the right name for the server line keyword. But it is not a big deal. We
can work on the implementation in meantime.

On the form, I know it is still a work in-progress, but a big patch is still
harder to read than a series of smaller modifications. In addition, there are
still plenty of DPRINTF that make the review more difficult. And some parts of
the code are commented and must not be included in the patch, still for
readability. If this concerns something you want to implement but you need some
help to understand how to do it, it is better to add a comment to explain what
you want to achieve. It is more understandable for the reviewer and it is a
better way to expose your idea.

I suggest to split your patch this way:

  * .If some existing fields must be renamed to support this feature, this may
be done in dedicated patches.

  * Then, a patch must contain the basic feature and its documentation. So the
server line keyword to configure the upstream proxy and the handshake part. It
is the main part of the feature

  * The rules to customize the feature must be move in another patch.

  * Another one with additional ressources, if any (exemples, diagrams...)

  * A final patch with a reg-test if possible. I don't know if the feature can
be tested with VTEST. I guess it is possible to do basic tests.

  * If some traces must be added on handshakes, it must be done in a dedicated
patch.

Of course you can ask some help. It is not trivial. But the idea is to cut as
far as possible the feature to smaller and consistant part. If it is too huge
and it cannot be split, there is probably (but not necessarily) an issue with
the feature.

Now, on the implementation itself. First, from my point of view, the feature
should work without any rules. So, setting the right keyword on the server line
should be good enough to make it works. If the server has a hostname, it should
be used indeed. But otherwise its address must be used. There is not reason to
make it fails in that case. I guess "sock4_addr" and
"upstream_proxy_tunnel_addr", in the server structure, can be merged into one
field. It seems reasonable to make these both handshakes mutually-exclusive. An
error can even be triggered during the configuration parsing if both keyword are
used on the same server line.

You must also take care to send a valid request, with a Host header. Its value
must be set accordingly to the authority. The response parsing must also be
reviewed to handle any 2xx response and to read all headers. I guess that for
now, we can consume everything till the end of headers (2 x CRLF). Otherwise,
any headers will be considered as part of the response to the client. Here at
most one buffer of data must be allowed and you must have the same construct
than the other handshakes: peeking data waiting for a full response and really
consuming them at the end.

The actions parsing must not be inline. A dedicated parsing function must be
used. And there is no need to add new actions in "act_name" enum. You must set
it to ACT_CUSTOM and use dedicated functions. Arguments for the action must be
stored in "act_rule.arg" union. "act_rule.arg.http may be used or eventually
"act_rule.arg.act". A more annoying point is that you must not see these actions
as part of a dedicated ruleset. These actions must be evaluated exactly where
there are defined, respecting the rules order. It is the expected behavior.
Especially if log-format string or expression are used. It is not implemented
yet but it is expected to be supported. In that case the following may be 
written:

   tcp-request content set-var(txn.dummy) int(1)
   tcp-request content upstream-proxy-header My-header "%[var(txn.dummy)]"
   tcp-request content set-var(txn.dummy) int(2)

Here "My-header: 1" must be sent to the proxy. I guess the "updt_rules" list and
the buffers must be removed from the proxy structure. The result of the rules
evaluation must however be stored somewhere in the stream. Here we must think a
bit about how to do so. It may be a dedicated structure with a target name and a
list of headers. For me, if we do so, this struct must be allocated on demand.
There is no reason to use too many memory for a marginal feature. It can even be
an HTX buffer. Some functions are probably missing in the current HTX api. But
it may be handy. Especially if we consider to add more actions in futures. And
it could be also useful to be able to customize the Host header by hand.

This makes me think to something. Everything in your documentation points to the
fact that TCP frontends and backends must be used. And you've restricted the
dynamic configuration to the "tcp-request content" only. Is there any reason for
such limitation ? If yes, could you explain why It cannot be use with HTTP
frontends/backends.

If there is no special reason, I guess the actions must also be supported in the
http-request ruleset. Otherwise, a specific check must be added during the
startup to trigger an error in HTTP mode.

In any case, we must be careful about the connexion reuse. In TCP, there is no
reuse. Finally, it is the easy case. In HTTP, it is trickier because tunnels are
not designed to be reusable. This point must clearly be evaluated if the HTTP
mode must be supported too.

In the documentation, you've added the error "44" to "fc_err_string" sample
fetch function. It is the error returned during the reverse connect. It must be
moved in a dedicated patch. And the error "46" (about "Upstream http proxy read
error") is missing.

We must think about the health-check too. It is not implemented yet, but it is
probably mandatory. At least a "check-via-..." keyword to use the http proxy for
the healthchecks. And then we must think about how to customize the CONNECT
request in the healthchecks context.

Sorry for this long email. I hope this will help you to move on. Don't hesitate
to ask some help on specific point if it is necessary. It can be discouraging,
but you must not be. It is the expected process. It is not a small feature.
several iterations are totally normal.

About naming, I can suggest "http1-proxy-tunnel" for the server keyword. And
set-http1-proxy-tunnel-header/set-http1-proxy-tunnel-target.  IMHO, It is
important to specify the action performed (set here), to be able to add other
kind of actions (add, del...), even if it is unlikely for now.

I guess, it's all for now 😉

Regards,
--
Christopher Faulet



Reply via email to