I wonder if we have considered simply removing the statement "There is no
ordering defined on endpoints. Hence, if the returned data has an ordering,
it should be returned in a single endpoint." and replacing it with
something that says "the relative ordering of data from different endpoints
is implementation defined".

I am struggling to come up with a concrete use case for the "ordered" flag.

The ticket references "distributed sort", but most distributed sort
algorithms I know of would produce multiple sorted streams that need to be
merged together. For example:

Endpoint 1: (B, C, D)
Endpoint 2: (A, E, F)

It is not clear how the "ordered" flag would help here.
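
To make the merge step concrete, here is a minimal Python sketch. It uses
only the example values above and the standard library's heapq.merge; it
does not call any Flight API, and the variable names are illustrative.

```python
import heapq

# Example values from the message above; each endpoint is sorted on its own.
endpoint_1 = ["B", "C", "D"]
endpoint_2 = ["A", "E", "F"]

# Reading the endpoints one after another does not give a sorted result.
concatenated = endpoint_1 + endpoint_2

# A k-way merge of the already-sorted streams does.
merged = list(heapq.merge(endpoint_1, endpoint_2))

print(concatenated)
print(merged)
```

In this situation the client has to merge no matter what the endpoint
order is, so a flag about endpoint order gives it nothing to act on.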

If the intent is somehow to signal to the client that it doesn't have to
merge (e.g. with data like this):

Endpoint 1: (A, B, C)
Endpoint 2: (D, E, F)

This seems of very limited value. For example, if the user desired DESC
order, then the endpoints would return:

Endpoint 1: (C, B, A)
Endpoint 2: (F, E, D)

That doesn't seem to conform to the updated definition.
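
A small Python sketch of that check, assuming "sorted in the same order as
the endpoints" means that concatenating the endpoints in their listed order
yields the sorted result. That reading is an assumption, not a quote from
the spec, and the helper names are made up for illustration.

```python
def concat(endpoints):
    """Read the endpoints one after another, in their listed order."""
    return [value for endpoint in endpoints for value in endpoint]


def is_sorted_desc(values):
    """True if every value is >= the one that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# The DESC example above: each endpoint is descending on its own.
desc_endpoints = [["C", "B", "A"], ["F", "E", "D"]]
as_listed = is_sorted_desc(concat(desc_endpoints))

# Listing the same endpoints in the opposite order.
reordered = list(reversed(desc_endpoints))
after_reorder = is_sorted_desc(concat(reordered))

print(as_listed, after_reorder)
```

Under that reading, the server would have to list the endpoints in the
opposite order for the DESC result to qualify as ordered.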

Andrew


On Tue, Apr 25, 2023 at 8:56 PM Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
>
> I would like to propose adding support for ordered data to
> Apache Arrow Flight. If anyone has comments on this
> proposal, please share them here or on the issue for this
> proposal: https://github.com/apache/arrow/issues/34852
>
> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
> SQL/ADBC enhancements":
>
>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
>
> See also the "Flight RPC: Ordered Data" section in the
> design document for the proposals:
>
>
> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
>
> Background:
>
> Currently, the endpoints within a FlightInfo explicitly have
> no ordering.
>
> This is unnecessarily limiting. Systems can and do implement
> distributed sorts, but they can't reflect this in the
> current specification.
>
> Proposal:
>
> Add a flag to FlightInfo. If the flag is set, the client may
> assume that the data is sorted in the same order as the
> endpoints. Otherwise, the client cannot make any assumptions
> (as before).
>
> This is a compatible change because the client can just
> ignore the flag.
>
> Implementation:
>
> https://github.com/apache/arrow/pull/35178 is an
> implementation of this proposal. The pull request has the
> following:
>
> 1. Format changes:
>
> https://github.com/apache/arrow/pull/35178/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
>    * format/Flight.proto
>
> 2. Documentation changes:
>
> https://github.com/apache/arrow/pull/35178/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
>    * docs/source/format/Flight.rst
>
> 3. The C++ implementation and an integration test:
>    * cpp/src/arrow/flight/
>
> 4. The Java implementation and an integration test (thanks to David Li!):
>    * java/flight/
>
> 5. The Go implementation and an integration test:
>    * go/arrow/flight/
>    * go/arrow/internal/flight_integration/
>
> Next:
>
> I'll start a vote for this proposal after we reach a consensus
> on this proposal.
>
> It's the standard process for format changes.
> See also:
>
> * [VOTE] Formalize how to change format
>   https://lists.apache.org/thread/jlc4wtt09rfszlzqdl55vrc4dxzscr4c
> * GH-35084: [Docs][Format] Add how to change format specification
>   https://github.com/apache/arrow/pull/35174
>
>
> Thanks,
> --
> kou
>
