Re: [grpc-io] Re: gRFC A1: HTTP CONNECT proxy support

'Mark D. Roth' via grpc.io Thu, 26 Jan 2017 08:43:09 -0800

Replies inline.  Please let me know if you want to chat about any of this
in person.


On Wed, Jan 25, 2017 at 4:13 PM, 'Eric Anderson' via grpc.io <
grpc-io@googlegroups.com> wrote:

> On Wed, Jan 25, 2017 at 2:00 PM, Mark D. Roth <r...@google.com> wrote:
>
>> On Wed, Jan 25, 2017 at 12:59 PM, 'Eric Anderson' via grpc.io <
>> grpc-io@googlegroups.com> wrote:
>>
>>> The Java implementation is going to have hurdles. I sort of expect
>>> issues adhering to the design as precisely as it is defined. I've got to
>>> figure out where ProxySelector
>>> <http://docs.oracle.com/javase/8/docs/api/java/net/ProxySelector.html>
>>> fits into all of this.
>>>
>>
>> Is this just the concern you've mentioned about how authentication fits
>> in, or is there something more here?
>>
>
> It wasn't an auth issue. It's more of an issue of needing to work with
> pre-existing APIs and expectations. We will, for example, be able to
> support a mixed CONNECT usage in case 1. I wouldn't be surprised if you
> need to eventually as well. That is what wpad.dat
> <https://en.wikipedia.org/wiki/Web_Proxy_Auto-Discovery_Protocol> solves,
> after all.
>

I think that if/when we need to do that, it should be possible to add the
logic in the same place where we currently look at the environment
variable, so it's not clear to me that there are any design changes needed
to leave room for this possibility.  Do you agree?  If not, can you point
out what parts of the design may cause problems for this, and possibly
suggest alternatives?
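
To make the "same place" argument concrete, here is a minimal sketch in Python (not gRPC code). It assumes a single decision point that today reads only the http_proxy environment variable; the names `proxy_from_env`, `choose_proxy`, and `ProxyLookup` are hypothetical and do not appear in the gRFC.

```python
import os
from typing import Callable, Mapping, Optional
from urllib.parse import urlsplit

# Hypothetical hook type: target server name -> "host:port" of a proxy, or None.
ProxyLookup = Callable[[str], Optional[str]]


def proxy_from_env(target: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """If http_proxy is set, use that proxy for every target (internal or external)."""
    value = environ.get("http_proxy", "")
    if not value:
        return None
    # Accept both "http://host:port" and a bare "host:port".
    parts = urlsplit(value if "://" in value else "//" + value)
    if parts.hostname is None:
        return None
    return f"{parts.hostname}:{parts.port}" if parts.port else parts.hostname


def choose_proxy(target: str, lookup: ProxyLookup = proxy_from_env) -> Optional[str]:
    """Single decision point; a per-target lookup could be swapped in here later."""
    return lookup(target)
```

Under this assumption, mixed CONNECT usage would only need a different `lookup` (for example, one driven by a wpad.dat-style rule set); callers of `choose_proxy` would not change.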


>
>>> Any references to "load balancing" should say "client-side load balancing".
>>>
>>
>> I think that "client-side load balancing" is misleading when talking
>> about grpclb, since the actual load balancing code happens on the balancers
>> instead of on the client.  But I do take your point here.  I've changed it
>> to use the term "per-call load balancing".
>>
>
> Moving discussion to PR.
>
>>>>    - *All* requests must go through the proxy, both for internal and
>>>> external servers.
>>>
>>>
>>> This is not true. It only applies to external servers. It directly
>>> contradicts the earlier "outbound to the internet." I could maybe agree
>>> with it if it said "may" instead of "must."
>>>
>>
>> My understanding is that if the http_proxy environment variable is set,
>> then the proxy is used unconditionally for all servers, so I think this is
>> accurate.  I've updated the wording in the description of this case to make
>> it clear that this is not just for outbound traffic.
>>
>
> That's conflating two things: the environment and the configuration. Your
> description of the environment is not true. When in this environment we
> expect the http_proxy environment variable as the form of configuration,
> but that has no impact on how the environment actually behaves.
>

What we actually care about here is the configuration, which is that all
connections go through the proxy.  It may be the case that this
configuration was created as the simplest way to address a policy that said
that only outbound connections must go through the proxy, but our code
doesn't actually care about that; it just cares about what configuration we
need to support.


>
>>>> In this case, there will be an external name service record for the server
>>>> name that points to the IP address of the proxy and has the `is_balancer`
>>>> bit set.  (Note: We have not yet designed how that bit will be encoded in
>>>> DNS, but that will be the subject of a separate gRFC.) The proxy mapper
>>>> implementation will then have to detect two types of addresses:
>>>>
>>>> - When it sees the proxy address, it will set the HTTP CONNECT argument
>>>> to the original server name.
>>>
>>>
>>> Eww... So it actually changes the end-host. Note that ability is not
>>> described earlier in the doc when proxy mapper is introduced. Nor is it
>>> made clear here it is an additional feature.
>>>
>>
>> I'm not sure that I completely understand this comment.
>>
>> It's definitely required that we be able to set the argument of the HTTP
>> CONNECT request.  Even without case 3, we needed that anyway, because in
>> case 1 we want to use the server name and have the proxy do the name
>> resolution for us, but in case 2 we want to use the IP addresses of the
>> individual backends.
>>
>
> I'm not certain there's a fundamental need for special behavior between
> case 1 and 2 concerning the CONNECT string, but in any case, I don't see
> why the *proxy mapper* must do it.
>

Can you say more about why you think we could use the same CONNECT argument
in both cases 1 and 2?  I don't see how this could work.  Using the IP
address won't work in case 1, because the client can't do name resolution,
so we don't actually know the IP address to use.  And using the hostname
won't work in case 2, because the client *has* done the name resolution in
this case, and it needs to create a separate subchannel for each server
address in order for LB policies like round_robin to work.  If we use the
hostname in the CONNECT request, then we have no guarantee that each
subchannel will wind up at the right backend.  That's why I think we need a
mechanism to specify what the CONNECT argument is in different cases.

Case 2 is triggered from the proxy mapper, which is why the proxy mapper
needs to use this mechanism.  In addition, this allows the proxy mapper to
set the CONNECT argument differently for the different situations in case
3.  And more generally, I also think it's a more flexible approach that may
allow users to write proxy mappers in the future to do things that we're
not thinking of right now.
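
The difference between the two cases can be sketched as follows. This is an illustrative Python model, not the gRPC implementation; `Subchannel`, `case1_subchannels`, `case2_subchannels`, and `connect_request` are hypothetical names, and the addresses in the usage note are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Subchannel:
    tcp_peer: str     # address the client opens its TCP connection to (the proxy)
    connect_arg: str  # argument sent in the HTTP CONNECT request


def case1_subchannels(server_name: str, proxy: str) -> List[Subchannel]:
    # Case 1: the client cannot resolve names, so the proxy resolves
    # the server name for it. One subchannel, hostname in CONNECT.
    return [Subchannel(proxy, server_name)]


def case2_subchannels(backend_addrs: List[str], proxy: str) -> List[Subchannel]:
    # Case 2: the client has already resolved the name. Each backend address
    # gets its own subchannel so a policy like round_robin can spread calls.
    return [Subchannel(proxy, addr) for addr in backend_addrs]


def connect_request(sub: Subchannel) -> str:
    # Request line and Host header of an HTTP/1.1 CONNECT request.
    return f"CONNECT {sub.connect_arg} HTTP/1.1\r\nHost: {sub.connect_arg}\r\n\r\n"
```

For example, `case2_subchannels(["10.0.0.1:443", "10.0.0.2:443"], "proxy:3128")` yields two subchannels that both dial the proxy but carry different CONNECT arguments. If both carried the hostname instead, nothing would pin each subchannel to a distinct backend.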


>
> I'd expect the proxy mapper to return one of two things:
>  - no proxy needed
>  - use CONNECT with proxy IP x.x.x.x
>
> That gives the mapper the control it needs without opening the ability to
> do outrageous things.
>
> I think "when it sees the proxy address" also has fundamental issues, like
> requiring the proxy to have a hard-coded stable IP. That means you couldn't
> add a new proxy to the rotation if experiencing too much load.
>
> More likely, in your scheme, I'd expect the "proxy address" to become 100%
> fake. "Oh! It's 1.1.1.1! That's our secret code for proxy address."
>

We discussed the possibility of using a sentinel address value like this,
but I think that's really ugly.  Using the proxy address seems cleaner,
especially since the client needs to know what proxy address to use anyway
in order to return that value from the proxy mapper.
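
A minimal sketch of that check, in Python and under stated assumptions: the mapper is configured with the proxy address(es) it may return, and it covers only the "sees the proxy address" branch quoted above. `ProxyDecision` and `map_address` are hypothetical names, and accepting a set of proxy addresses rather than a single one is an assumption of this sketch, not something the gRFC specifies.

```python
from typing import NamedTuple, Optional, Set


class ProxyDecision(NamedTuple):
    proxy_addr: str   # where to open the TCP connection
    connect_arg: str  # what to send in the HTTP CONNECT request


def map_address(resolved_addr: str, server_name: str,
                known_proxy_addrs: Set[str]) -> Optional[ProxyDecision]:
    # If name resolution handed back an address the mapper already knows to be
    # the proxy, connect to it and ask it to CONNECT to the original server name.
    if resolved_addr in known_proxy_addrs:
        return ProxyDecision(resolved_addr, server_name)
    # Other address types are deliberately not handled in this sketch.
    return None
```

No sentinel value is involved: the comparison uses the same address the mapper would otherwise return as the proxy.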


>
>>> Why isn't the LB just made public? It can be behind some other type of
>>> load balancer. That's what I had expected when discussing earlier. Yes,
>>> that means there are more auth hurdles, but it seems more sound.
>>>
>>
>> If I'm understanding you right, that is essentially what is being
>> proposed here.  The idea is that the grpclb balancer is accessed via the
>> HTTP CONNECT proxy.
>>
>
> No. I'm proposing that case 3 is the same as case 2, but with a different
> server configuration.
>
> Case 1 would use the hostname in CONNECT. Case 2 would use IP in CONNECT.
>
> If I want Case 2, but don't want to expose internal IP addresses to
> unauthenticated clients, I'd just make GRPCLB public and connect to it
> directly, without CONNECT. DNS returns public IPs, and the GRPCLB
> communication can be authenticated.
>

I think Julien addressed this in his reply.  One of the requirements of
case 2 is that the grpclb balancers are inside the protected environment.


> --
> You received this message because you are subscribed to the Google Groups "
> grpc.io" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to grpc-io+unsubscr...@googlegroups.com.
> To post to this group, send email to grpc-io@googlegroups.com.
> Visit this group at https://groups.google.com/group/grpc-io.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/grpc-io/CA%2B4M1oN%3DqjqQx9sXWEmwRYjTyw1Z3ZLMfLa%
> 2B_NFRLFof8Jgnxw%40mail.gmail.com
> <https://groups.google.com/d/msgid/grpc-io/CA%2B4M1oN%3DqjqQx9sXWEmwRYjTyw1Z3ZLMfLa%2B_NFRLFof8Jgnxw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Mark D. Roth <r...@google.com>
Software Engineer
Google, Inc.

