I wonder if we have considered simply removing the statement "There is no ordering defined on endpoints. Hence, if the returned data has an ordering, it should be returned in a single endpoint." and replacing it with something that says "the relative ordering of data from different endpoints is implementation-defined".

I am struggling to come up with a concrete use case for the "ordered" flag. The ticket references "distributed sort", but most distributed sort algorithms I know of would produce multiple sorted streams that need to be merged together. For example:

Endpoint 1: (B, C, D)
Endpoint 2: (A, E, F)

It is not clear how the "ordered" flag would help here.

If the intent is to signal to the client that it doesn't have to merge (e.g. with data like)

Endpoint 1: (A, B, C)
Endpoint 2: (D, E, F)

then this seems of very limited value. For example, if the user desired DESC order, the endpoints would return

Endpoint 1: (C, B, A)
Endpoint 2: (F, E, D)

which doesn't seem to conform to the updated definition.

Andrew

On Tue, Apr 25, 2023 at 8:56 PM Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
>
> I would like to propose adding support for ordered data to
> Apache Arrow Flight. If anyone has comments for this
> proposal, please share them here or on the issue for this
> proposal: https://github.com/apache/arrow/issues/34852
>
> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
> SQL/ADBC enhancements":
>
> https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
>
> See also the "Flight RPC: Ordered Data" section in the
> design document for the proposals:
>
> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
>
> Background:
>
> Currently, the endpoints within a FlightInfo explicitly have
> no ordering.
>
> This is unnecessarily limiting. Systems can and do implement
> distributed sorts, but they can't reflect this in the
> current specification.
>
> Proposal:
>
> Add a flag to FlightInfo. If the flag is set, the client may
> assume that the data is sorted in the same order as the
> endpoints. Otherwise, the client cannot make any assumptions
> (as before).
>
> This is a compatible change because the client can just
> ignore the flag.
>
> Implementation:
>
> https://github.com/apache/arrow/pull/35178 is an
> implementation of this proposal. The pull request has the
> following:
>
> 1. Format changes:
>
> https://github.com/apache/arrow/pull/35178/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
> * format/Flight.proto
>
> 2. Documentation changes:
>
> https://github.com/apache/arrow/pull/35178/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
> * docs/source/format/Flight.rst
>
> 3. The C++ implementation and an integration test:
> * cpp/src/arrow/flight/
>
> 4. The Java implementation and an integration test (thanks to David Li!):
> * java/flight/
>
> 5. The Go implementation and an integration test:
> * go/arrow/flight/
> * go/arrow/internal/flight_integration/
>
> Next:
>
> I'll start a vote for this proposal after we reach a consensus
> on this proposal.
>
> It's the standard process for format changes.
> See also:
>
> * [VOTE] Formalize how to change format
> https://lists.apache.org/thread/jlc4wtt09rfszlzqdl55vrc4dxzscr4c
> * GH-35084: [Docs][Format] Add how to change format specification
> https://github.com/apache/arrow/pull/35174
>
> Thanks,
> --
> kou
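To make the concatenate-versus-merge distinction in Andrew's endpoint examples concrete, here is a minimal Python sketch. It assumes each endpoint's stream can be modeled as a plain list of sort keys that is already sorted on its own; `concat_endpoints`, `merge_endpoints`, and `is_sorted` are hypothetical helpers for illustration, not part of any Arrow Flight API. Reading endpoints back to back in the listed order is roughly what a client could do if it relied on the proposed flag. A k-way merge (here Python's standard `heapq.merge`) is what overlapping distributed-sort output would still require.

```python
import heapq


def concat_endpoints(endpoints):
    """Hypothetical: read endpoints one after another, in listed order."""
    return [row for ep in endpoints for row in ep]


def merge_endpoints(endpoints, reverse=False):
    """Hypothetical: k-way merge of individually sorted endpoint streams.

    With reverse=True, each input must already be sorted in descending order.
    """
    return list(heapq.merge(*endpoints, reverse=reverse))


def is_sorted(rows, reverse=False):
    """Check whether rows are in ascending (or, with reverse=True, descending) order."""
    pairs = zip(rows, rows[1:])
    if reverse:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


if __name__ == "__main__":
    # Distributed-sort output: each endpoint is sorted, but the key ranges overlap.
    overlapping = [["B", "C", "D"], ["A", "E", "F"]]
    print(is_sorted(concat_endpoints(overlapping)))  # concatenation is not globally sorted
    print(merge_endpoints(overlapping))              # a merge restores global order

    # Non-overlapping ranges: concatenation alone is already globally sorted.
    ranged = [["A", "B", "C"], ["D", "E", "F"]]
    print(is_sorted(concat_endpoints(ranged)))

    # DESC case: each endpoint is sorted descending, endpoints listed as before.
    desc = [["C", "B", "A"], ["F", "E", "D"]]
    print(is_sorted(concat_endpoints(desc), reverse=True))
    print(merge_endpoints(desc, reverse=True))
```

In this model, the merge path produces a correctly ordered result for both the overlapping and DESC cases, while plain concatenation is correct only when the endpoints' key ranges do not overlap and are listed in the requested direction.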