So this would be a case where multiple "endpoints" are acting as a single
"stream of batches"?  Or am I misunderstanding?

What're some scenarios where that would be done?  When would it be
preferred for the client to merge the endpoints instead of the client's
user?
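
For concreteness, here is a rough Python sketch of the cases discussed in
the quoted messages below. It does not use the real Flight API: `fetch`,
`read_ordered`, and `read_by_merging` are hypothetical names, and each
endpoint is simulated as an in-memory list. The assumption is that with the
flag set a client can still fetch in parallel as long as it concatenates the
results in the order the endpoints were listed, while overlapping sorted
streams still need a merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def fetch(endpoint):
    # Hypothetical stand-in for a Flight DoGet call on one endpoint.
    return list(endpoint)


def read_ordered(endpoints):
    """Flag set: fetch in parallel, concatenate in the listed order."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, even if the
        # individual fetches finish in a different order.
        parts = pool.map(fetch, endpoints)
        return list(chain.from_iterable(parts))


def read_by_merging(endpoints, reverse=False):
    """Each endpoint is sorted but ranges overlap: do a k-way merge."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(fetch, endpoints))
    return list(heapq.merge(*parts, reverse=reverse))


# Non-overlapping ranges, listed in ascending order.
asc = read_ordered([["A", "B", "C"], ["D", "E", "F"]])

# DESC: the server lists the (F, E, D) endpoint first.
desc = read_ordered([["F", "E", "D"], ["C", "B", "A"]])

# Overlapping sorted streams: concatenation is not enough.
merged = read_by_merging([["B", "C", "D"], ["A", "E", "F"]])

print(asc, desc, merged)
```

In this sketch only the last case needs the client to do real work on the
data; the first two only need the client to respect the endpoint order.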

On Thu, Apr 27, 2023, 3:22 PM David Li <lidav...@apache.org> wrote:

> The server would have to report these as multiple endpoints in all your
> examples. (There's nothing saying a particular location can only appear
> once, or that "Endpoint 2" has to come after "Endpoint 1" for the DESC
> example.)
>
> The flag tells the client if it can fetch data in parallel without regard
> to order or if it should make sure to preserve the sorting of the data.
> (The ADBC Flight SQL clients in Go, C++, etc. already had to deal with
> this.) For instance Acero may care because certain plan nodes require some
> sort of ordering to be present; knowing a Flight datasource has this
> ordering could then save having to insert a sort operation into the plan.
>
> "Implementation defined" I think would basically devolve to clients always
> making the conservative/inefficient choice, like the Go ADBC driver always
> preserving order out of concern for compatibility and Acero always sorting
> data to use order-dependent nodes.
>
> On Thu, Apr 27, 2023, at 23:55, Andrew Lamb wrote:
> > I wonder if we have considered simply removing the statement "There is no
> > ordering defined on endpoints. Hence, if the returned data has an ordering,
> > it should be returned in a single endpoint." and replacing it with
> > something that says "the relative ordering of data from different endpoints
> > is implementation defined"
> >
> > I am struggling to come up with a concrete use case for the "ordered" flag.
> >
> > The ticket references "distributed sort" but most distributed sort
> > algorithms I know of would produce multiple sorted streams that need to be
> > merged together. For example:
> >
> > Endpoint 1: (B, C, D)
> > Endpoint 2: (A, E, F)
> >
> > It is not clear how the "ordered" flag would help here.
> >
> > If the intent is somehow to signal the client it doesn't have to merge
> > (e.g. with data like)
> >
> > Endpoint 1: (A, B, C)
> > Endpoint 2: (D, E, F)
> >
> > This seems of very limited value: if, for example, the user desired DESC
> > order, then the endpoints would return
> >
> > Endpoint 1: (C, B, A)
> > Endpoint 2: (F, E, D)
> >
> > Which doesn't seem to conform to the updated definition.
> >
> > Andrew
> >
> >
> > On Tue, Apr 25, 2023 at 8:56 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> >> Hi,
> >>
> >> I would like to propose adding support for ordered data to
> >> Apache Arrow Flight. If anyone has comments on this
> >> proposal, please share them here or on the issue for this
> >> proposal: https://github.com/apache/arrow/issues/34852
> >>
> >> This is one of the proposals in "[DISCUSS] Flight RPC/Flight
> >> SQL/ADBC enhancements":
> >>
> >>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
> >>
> >> See also the "Flight RPC: Ordered Data" section in the
> >> design document for the proposals:
> >>
> >>   https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
> >>
> >> Background:
> >>
> >> Currently, the endpoints within a FlightInfo explicitly have
> >> no ordering.
> >>
> >> This is unnecessarily limiting. Systems can and do implement
> >> distributed sorts, but they can't reflect this in the
> >> current specification.
> >>
> >> Proposal:
> >>
> >> Add a flag to FlightInfo. If the flag is set, the client may
> >> assume that the data is sorted in the same order as the
> >> endpoints. Otherwise, the client cannot make any assumptions
> >> (as before).
> >>
> >> This is a compatible change because the client can just
> >> ignore the flag.
> >>
> >> Implementation:
> >>
> >> https://github.com/apache/arrow/pull/35178 is an
> >> implementation of this proposal. The pull request contains
> >> the following:
> >>
> >> 1. Format changes:
> >>    * format/Flight.proto
> >>      https://github.com/apache/arrow/pull/35178/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
> >>
> >> 2. Documentation changes:
> >>    * docs/source/format/Flight.rst
> >>      https://github.com/apache/arrow/pull/35178/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
> >>
> >> 3. The C++ implementation and an integration test:
> >>    * cpp/src/arrow/flight/
> >>
> >> 4. The Java implementation and an integration test (thanks to David Li!):
> >>    * java/flight/
> >>
> >> 5. The Go implementation and an integration test:
> >>    * go/arrow/flight/
> >>    * go/arrow/internal/flight_integration/
> >>
> >> Next:
> >>
> >> I'll start a vote on this proposal once we reach a
> >> consensus.
> >>
> >> This is the standard process for format changes.
> >> See also:
> >>
> >> * [VOTE] Formalize how to change format
> >>   https://lists.apache.org/thread/jlc4wtt09rfszlzqdl55vrc4dxzscr4c
> >> * GH-35084: [Docs][Format] Add how to change format specification
> >>   https://github.com/apache/arrow/pull/35174
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
>
