I'm going to keep the proposal as-is then. It can be extended if this use case 
comes up.

I'll start work on candidate implementations now.
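
For illustration, here is a minimal sketch of the client-side behavior
discussed in this thread: scan the locations left to right, and treat a
location with the fallback scheme as "reuse the connection to the
originating service". It is only a sketch under assumptions, not a candidate
implementation. The scheme name is the one used in this thread and may not
be final, and fetch_from_endpoint, connect, and fetch are hypothetical names
standing in for whatever a real Flight client provides.

```python
from urllib.parse import urlparse

# Scheme name as discussed in this thread; an assumption, not necessarily final.
FALLBACK_SCHEME = "arrow-flight-fallback"


def fetch_from_endpoint(locations, existing_conn, connect, fetch):
    """Try each location left to right and return the first successful result.

    `connect(uri)` and `fetch(conn)` are hypothetical callables supplied by
    the caller; they are not part of any real Flight API.
    """
    # No URIs at all already means "fetch from the Flight RPC service itself".
    if not locations:
        return fetch(existing_conn)

    errors = []
    for uri in locations:
        try:
            if urlparse(uri).scheme == FALLBACK_SCHEME:
                # Reuse the existing connection, so protocol and address
                # stay implicit and the service need not know its own address.
                conn = existing_conn
            else:
                conn = connect(uri)
            return fetch(conn)
        except Exception as exc:  # remember the failure, try the next location
            errors.append((uri, exc))
    raise ConnectionError(f"all locations failed: {errors}")
```

Since the order of locations carries no specified precedence, a client could
just as well sort or filter the list first (for example, skipping schemes it
does not support) before running a loop like this one.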

On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> I think the original proposal is sufficient.
>
> Also, it is not obvious to me how one would switch from e.g. grpc+tls to 
> http without an explicit server location (unless both Flight servers are 
> hosted under the same port?). So the "+" proposal seems a bit weird.
>
>
> On 12/02/2024 at 23:39, David Li wrote:
>> The idea is that the client would reuse the existing connection, in which 
>> case the protocol and such are implicit. (If the client doesn't have a 
>> connection anymore, it can't use the fallback anyways.)
>> 
>> I suppose this has the advantage that you could "fall back" to a known 
>> hostname with a different protocol, but I'm not sure that always applies 
>> anyways. (Correct me if I'm wrong Matt, but as I recall, UCX addresses 
>> aren't hostnames but rather opaque byte blobs, for instance.)
>> 
>> If we do prefer this, to avoid overloading the hostname, there's also the 
>> informal convention of using + in the scheme, so it could be 
>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>> 
>> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>>> Thanks for clarifying.
>>>
>>> Given the relationship between these two proposals, would it also be
>>> necessary to distinguish the scheme (or schemes) supported by the
>>> originating Flight RPC service?
>>>
>>> If that is the case, it may be preferred to use the "host" portion of the
>>> URI rather than the "scheme" to denote the location of the data. In this
>>> scenario, the host "0.0.0.0" could be used. This IP address is defined in
>>> IETF RFC1122 [1] as "This host on this network", which seems most
>>> consistent with the intended use-case. There are some caveats to this usage
>>> but in my experience it's not uncommon for protocols to extend the
>>> definition of this address in their own usage.
>>>
>>> A benefit of this convention is that the scheme remains available in the
>>> URI to specify the transport available. For example, the following list of
>>> locations may be included in the response:
>>>
>>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
>>>
>>> This would indicate that grpc and ucx transport is available from the
>>> current service, grpc is available at 1.2.3.4, and possibly more
>>> combinations of scheme/host.
>>>
>>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>>>
>>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
>>>
>>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>>>> specifying it that way.  Better ideas for names would be appreciated.
>>>>
>>>> The actual precedence has never been specified. All endpoints are
>>>> equivalent, so clients may use what is "best". For instance, with Matt
>>>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>>>> UCX endpoints while other clients may choose to ignore them entirely (e.g.
>>>> because they don't have UCX installed).
>>>>
>>>> In practice the ADBC/JDBC drivers just scan the list left to right and try
>>>> each endpoint in turn for lack of a better heuristic.
>>>>
>>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>>>>> Thanks for proposing this David.
>>>>>
>>>>> I think the ability to include the Flight RPC service itself in the list of
>>>>> endpoints from which data can be fetched is a helpful addition.
>>>>>
>>>>> The current choice of name for the URI (arrow-flight-fallback://) seems to
>>>>> imply that there is an order of precedence that should be considered in the
>>>>> list of URIs. Specifically, as a developer receiving the list of locations,
>>>>> I might assume that I should try fetching from other locations first. If
>>>>> those do not succeed, I may try the original service as a fallback.
>>>>>
>>>>> Are these the intended semantics? If so, is there a way to include the
>>>>> original service in the list of locations without the implied precedence?
>>>>>
>>>>> Thanks,
>>>>> Joel
>>>>>
>>>>> On Mon, Feb 12, 2024 at 11:52 James Duong <james.du...@improving.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> This seems like a good idea, and also improves consistency with clients
>>>>>> that erroneously assumed that the service endpoint was always in the list
>>>>>> of endpoints.
>>>>>>
>>>>>> From: Antoine Pitrou <anto...@python.org>
>>>>>> Date: Monday, February 12, 2024 at 6:05 AM
>>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This looks fine to me.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>> On 12/02/2024 at 14:46, David Li wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL
>>>>>>> work better in different deployment scenarios.  Comments on the doc would
>>>>>>> be appreciated:
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>>>>>>>
>>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of
>>>>>>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from the
>>>>>>> Flight RPC service itself; but it would be useful to combine both behaviors
>>>>>>> (try these concrete URIs and fall back to the Flight RPC service itself)
>>>>>>> without requiring the service to know its own public address.
>>>>>>>
>>>>>>> Best,
>>>>>>> David
>>>>>>
>>>>
