The server would have to report these as multiple endpoints in all your 
examples. (There's nothing saying a particular location can only appear once, 
or that "Endpoint 2" has to come after "Endpoint 1" for the DESC example.)

The flag tells the client if it can fetch data in parallel without regard to 
order or if it should make sure to preserve the sorting of the data. (The ADBC 
Flight SQL clients in Go, C++, etc. already had to deal with this.) For 
instance, Acero may care because certain plan nodes require some sort of 
ordering to be present; knowing a Flight datasource has this ordering could 
then save having to insert a sort operation into the plan.
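
To make the client-side decision concrete, here is a minimal sketch in plain Python. It does not use any real Flight client API: Endpoint, Info, fetch, and read_all are hypothetical stand-ins, and the "ordered" field only mimics the proposed flag. It assumes the client may still fetch endpoints in parallel either way, and that the flag only decides whether results must be handed on in endpoint order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Endpoint:
    # Hypothetical stand-in for a Flight endpoint: a name plus the rows it serves.
    name: str
    rows: List[str]


@dataclass
class Info:
    # Hypothetical stand-in for FlightInfo carrying the proposed flag.
    endpoints: List[Endpoint]
    ordered: bool = False


def fetch(endpoint: Endpoint) -> List[str]:
    # Placeholder for a real network fetch (assumption: returns all rows at once).
    return list(endpoint.rows)


def read_all(info: Info, max_workers: int = 4) -> Iterator[List[str]]:
    """Yield one batch of rows per endpoint, honoring the ordered flag."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, ep) for ep in info.endpoints]
        if info.ordered:
            # Fetches run concurrently, but batches are yielded in endpoint order.
            for future in futures:
                yield future.result()
        else:
            # No ordering promised: yield whichever endpoint finishes first.
            for future in as_completed(futures):
                yield future.result()


# The DESC case from the quoted mail: the server lists "Endpoint 2" first.
desc = Info(
    endpoints=[
        Endpoint("Endpoint 2", ["F", "E", "D"]),
        Endpoint("Endpoint 1", ["C", "B", "A"]),
    ],
    ordered=True,
)
rows = [row for batch in read_all(desc) for row in batch]
print(rows)  # ['F', 'E', 'D', 'C', 'B', 'A']
```

With ordered=False the same call may yield the two batches in either order, which is why a consumer that needs sorted input would have to sort again.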

"Implementation defined" I think would basically devolve to clients always 
making the conservative/inefficient choice, like the Go ADBC driver always 
preserving order out of concern for compatibility and Acero always sorting data 
to use order-dependent nodes.

On Thu, Apr 27, 2023, at 23:55, Andrew Lamb wrote:
> I wonder if we have considered simply removing the statement "There is no
> ordering defined on endpoints. Hence, if the returned data has an ordering,
> it should be returned in a single endpoint." and replacing it with
> something that says "the relative ordering of data from different endpoints
> is implementation defined"
>
> I am struggling to come up with a concrete usecase for the "ordered" flag.
>
> The ticket references "distributed sort" but most distributed sort
> algorithms I know of would produce multiple sorted streams that need to be
> merged together. For example
>
> Endpoint 1: (B, C, D)
> Endpoint 2: (A, E, F)
>
> It is not clear how the "ordered" flag would help here
>
> If the intent is somehow to signal the client it doesn't have to merge
> (e.g. with data like)
>
> Endpoint 1: (A, B, C)
> Endpoint 2: (D, E, F)
>
> This seems of very limited value: if, for example, the user desired DESC
> order, then the endpoint would return
>
> Endpoint 1: (C, B, A)
> Endpoint 2: (F, E, D)
>
> Which doesn't seem to conform to the updated definition
>
> Andrew
>
>
> On Tue, Apr 25, 2023 at 8:56 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
>> Hi,
>>
>> I would like to propose adding support for ordered data to
>> Apache Arrow Flight. If anyone has comments for this
>> proposal, please share them here or on the issue for this
>> proposal: https://github.com/apache/arrow/issues/34852
>>
>> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
>> SQL/ADBC enhancements":
>>
>>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
>>
>> See also the "Flight RPC: Ordered Data" section in the
>> design document for the proposals:
>>
>>
>> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
>>
>> Background:
>>
>> Currently, the endpoints within a FlightInfo explicitly have
>> no ordering.
>>
>> This is unnecessarily limiting. Systems can and do implement
>> distributed sorts, but they can't reflect this in the
>> current specification.
>>
>> Proposal:
>>
>> Add a flag to FlightInfo. If the flag is set, the client may
>> assume that the data is sorted in the same order as the
>> endpoints. Otherwise, the client cannot make any assumptions
>> (as before).
>>
>> This is a compatible change because the client can just
>> ignore the flag.
>>
>> Implementation:
>>
>> https://github.com/apache/arrow/pull/35178 is an
>> implementation of this proposal. The pull request has the
>> following:
>>
>> 1. Format changes:
>>    https://github.com/apache/arrow/pull/35178/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
>>    * format/Flight.proto
>>
>> 2. Documentation changes:
>>    https://github.com/apache/arrow/pull/35178/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
>>    * docs/source/format/Flight.rst
>>
>> 3. The C++ implementation and an integration test:
>>    * cpp/src/arrow/flight/
>>
>> 4. The Java implementation and an integration test (thanks to David Li!):
>>    * java/flight/
>>
>> 5. The Go implementation and an integration test:
>>    * go/arrow/flight/
>>    * go/arrow/internal/flight_integration/
>>
>> Next:
>>
>> I'll start a vote for this proposal after we reach a consensus
>> on this proposal.
>>
>> It's the standard process for format changes.
>> See also:
>>
>> * [VOTE] Formalize how to change format
>>   https://lists.apache.org/thread/jlc4wtt09rfszlzqdl55vrc4dxzscr4c
>> * GH-35084: [Docs][Format] Add how to change format specification
>>   https://github.com/apache/arrow/pull/35174
>>
>>
>> Thanks,
>> --
>> kou
>>
