So this would be a case where multiple "endpoints" are acting as a single "stream of batches"? Or am I misunderstanding?
What are some scenarios where that would be done? When would it be preferable for the client to merge the endpoints instead of the client's user?

On Thu, Apr 27, 2023, 3:22 PM David Li <lidav...@apache.org> wrote:

> The server would have to report these as multiple endpoints in all your
> examples. (There's nothing saying a particular location can only appear
> once, or that "Endpoint 2" has to come after "Endpoint 1" for the DESC
> example.)
>
> The flag tells the client if it can fetch data in parallel without regard
> to order or if it should make sure to preserve the sorting of the data.
> (The ADBC Flight SQL clients in Go, C++, etc. already had to deal with
> this.) For instance Acero may care because certain plan nodes require some
> sort of ordering to be present; knowing a Flight datasource has this
> ordering could then save having to insert a sort operation into the plan.
>
> "Implementation defined" I think would basically devolve to clients always
> making the conservative/inefficient choice, like the Go ADBC driver always
> preserving order out of concern for compatibility and Acero always sorting
> data to use order-dependent nodes.
>
> On Thu, Apr 27, 2023, at 23:55, Andrew Lamb wrote:
> > I wonder if we have considered simply removing the statement "There is no
> > ordering defined on endpoints. Hence, if the returned data has an ordering,
> > it should be returned in a single endpoint." and replacing it with
> > something that says "the relative ordering of data from different endpoints
> > is implementation defined"
> >
> > I am struggling to come up with a concrete use case for the "ordered" flag.
> >
> > The ticket references "distributed sort" but most distributed sort
> > algorithms I know of would produce multiple sorted streams that need to be
> > merged together.
> > For example
> >
> > Endpoint 1: (B, C, D)
> > Endpoint 2: (A, E, F)
> >
> > It is not clear how the "ordered" flag would help here
> >
> > If the intent is somehow to signal the client it doesn't have to merge
> > (e.g. with data like)
> >
> > Endpoint 1: (A, B, C)
> > Endpoint 2: (D, E, F)
> >
> > This seems of very limited value: if, for example, the user desired DESC
> > order, then the endpoints would return
> >
> > Endpoint 1: (C, B, A)
> > Endpoint 2: (F, E, D)
> >
> > Which doesn't seem to conform to the updated definition
> >
> > Andrew
> >
> >
> > On Tue, Apr 25, 2023 at 8:56 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> >> Hi,
> >>
> >> I would like to propose adding support for ordered data to
> >> Apache Arrow Flight. If anyone has comments on this
> >> proposal, please share them here or on the issue for this
> >> proposal: https://github.com/apache/arrow/issues/34852
> >>
> >> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
> >> SQL/ADBC enhancements":
> >>
> >> https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
> >>
> >> See also the "Flight RPC: Ordered Data" section in the
> >> design document for the proposals:
> >>
> >> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
> >>
> >> Background:
> >>
> >> Currently, the endpoints within a FlightInfo explicitly have
> >> no ordering.
> >>
> >> This is unnecessarily limiting. Systems can and do implement
> >> distributed sorts, but they can't reflect this in the
> >> current specification.
> >>
> >> Proposal:
> >>
> >> Add a flag to FlightInfo. If the flag is set, the client may
> >> assume that the data is sorted in the same order as the
> >> endpoints. Otherwise, the client cannot make any assumptions
> >> (as before).
> >>
> >> This is a compatible change because the client can just
> >> ignore the flag.
> >>
> >> Implementation:
> >>
> >> https://github.com/apache/arrow/pull/35178 is an
> >> implementation of this proposal.
> >> The pull request has the following:
> >>
> >> 1. Format changes:
> >>
> >>    https://github.com/apache/arrow/pull/35178/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
> >>    * format/Flight.proto
> >>
> >> 2. Documentation changes:
> >>
> >>    https://github.com/apache/arrow/pull/35178/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
> >>    * docs/source/format/Flight.rst
> >>
> >> 3. The C++ implementation and an integration test:
> >>    * cpp/src/arrow/flight/
> >>
> >> 4. The Java implementation and an integration test (thanks to David Li!):
> >>    * java/flight/
> >>
> >> 5. The Go implementation and an integration test:
> >>    * go/arrow/flight/
> >>    * go/arrow/internal/flight_integration/
> >>
> >> Next:
> >>
> >> I'll start a vote for this proposal after we reach a consensus
> >> on this proposal.
> >>
> >> It's the standard process for format changes.
> >> See also:
> >>
> >> * [VOTE] Formalize how to change format
> >>   https://lists.apache.org/thread/jlc4wtt09rfszlzqdl55vrc4dxzscr4c
> >> * GH-35084: [Docs][Format] Add how to change format specification
> >>   https://github.com/apache/arrow/pull/35174
> >>
> >>
> >> Thanks,
> >> --
> >> kou
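
To make the two client strategies discussed in this thread concrete, here is a rough sketch in plain Python. It does not use any Flight or ADBC API: endpoints are modelled as plain lists, and `fetch`, `read_ordered`, and `read_merged` are hypothetical names invented for illustration. The first strategy assumes the proposed flag is set, so the client may fetch in parallel but must concatenate results in endpoint order. The second assumes each endpoint is only sorted internally, as in Andrew's (B, C, D) / (A, E, F) example, so a k-way merge is needed.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fetch(endpoint):
    """Hypothetical stand-in for reading one endpoint's stream of rows.

    A real client would issue a DoGet call here; this sketch just copies a list.
    """
    return list(endpoint)


def read_ordered(endpoints):
    """Flag set: fetch in parallel, then concatenate in endpoint order."""
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, not completion order,
        # so the order of the endpoints is preserved.
        parts = list(pool.map(fetch, endpoints))
    return [row for part in parts for row in part]


def read_merged(endpoints, reverse=False):
    """No cross-endpoint ordering: k-way merge of individually sorted streams."""
    streams = [fetch(e) for e in endpoints]
    return list(heapq.merge(*streams, reverse=reverse))


# Ascending data split across two endpoints, listed in sorted order.
asc = read_ordered([["A", "B", "C"], ["D", "E", "F"]])

# DESC: the server lists the (F, E, D) endpoint first, as David describes.
desc = read_ordered([["F", "E", "D"], ["C", "B", "A"]])

# Interleaved sorted streams: concatenation would be wrong, so merge instead.
merged = read_merged([["B", "C", "D"], ["A", "E", "F"]])

print(asc, desc, merged)
```

In this sketch the ordered path never compares rows, which is the saving David mentions for consumers such as Acero, while the merge path has to compare rows from every stream.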