I see, thanks. That sounds reasonable, and indeed if you are subscribing to lots of data sources (instead of just trying to maximize throughput as I think most people have done before) async may be helpful.

There are actually two features being described here though:

1. The C++ Flight client needs some way to wrap an existing gRPC client, much like it can in Java. This may be tricky in C++ (and will require applications to carefully manage gRPC versions) and will probably not be possible for Python.
2. The C++ Flight client should offer async APIs (perhaps callback-based APIs or perhaps completion-queue based or something else).

https://issues.apache.org/jira/browse/ARROW-1009 may be relevant here.

I think gRPC C++ has experimental callback-based async APIs now but it may be hard for us to actually take advantage of unless we can draw a hard line on minimum required gRPC version.

The main question will probably be finding someone to do the work.
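To make the completion-queue option in item 2 a bit more concrete, here is a rough sketch using only the C++ standard library. It is not gRPC or Flight code: `ToyCompletionQueue`, `PendingOp`, `DemoResult` and `RunDemo` are hypothetical names made up for illustration, and the toy queue only imitates the "post a tag, poll for it later" shape of a completion queue. It shows one polling thread servicing a heartbeat and a long-lived subscription at the same time, which is roughly the pattern being asked for.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Toy stand-in for a completion queue (hypothetical; NOT grpc::CompletionQueue).
// Producers post (tag, ok) events; a single consumer polls them with Next().
class ToyCompletionQueue {
 public:
  // Called by a producer when one of its operations finishes.
  void Post(void* tag, bool ok) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(!shutdown_ && "Post() called after Shutdown()");
      events_.emplace_back(tag, ok);
    }
    cv_.notify_one();
  }

  // Stop accepting events; Next() returns false once the queue has drained.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until an event is available, or until shutdown with nothing left.
  bool Next(void** tag, bool* ok) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !events_.empty() || shutdown_; });
    if (events_.empty()) return false;
    *tag = events_.front().first;
    *ok = events_.front().second;
    events_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::pair<void*, bool>> events_;
  bool shutdown_ = false;
};

// Per-operation state; its address doubles as the tag.
struct PendingOp {
  int completions;
};

struct DemoResult {
  int heartbeats;
  int updates;
};

// Simulates a heartbeat and a subscription sharing one queue and one poller.
DemoResult RunDemo(int num_heartbeats, int num_updates) {
  ToyCompletionQueue cq;
  PendingOp heartbeat{0};
  PendingOp subscription{0};

  // The only thread that touches per-operation state: the event loop.
  std::thread poller([&cq] {
    void* tag = nullptr;
    bool ok = false;
    while (cq.Next(&tag, &ok)) {
      PendingOp* op = static_cast<PendingOp*>(tag);
      if (ok) ++op->completions;
    }
  });

  // Two independent event sources, standing in for outstanding RPCs.
  std::thread heartbeat_source([&] {
    for (int i = 0; i < num_heartbeats; ++i) cq.Post(&heartbeat, true);
  });
  std::thread subscription_source([&] {
    for (int i = 0; i < num_updates; ++i) cq.Post(&subscription, true);
  });

  heartbeat_source.join();
  subscription_source.join();
  cq.Shutdown();  // Already-posted events are still delivered before exit.
  poller.join();

  return DemoResult{heartbeat.completions, subscription.completions};
}
```

Calling `RunDemo(3, 5)` should report 3 heartbeats and 5 updates. A callback-based API would replace the tag dispatch in the polling loop with a function invoked per event, and the application would no longer own the loop.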
Best,
David

On Thu, Jun 3, 2021, at 12:11, Nate Bauernfeind wrote:
> In addition to Arrow Flight we have other gRPC APIs that work together as a
> whole. For example, the API client establishes a session with the server.
> Then the client tells the server to create a derivative data stream by
> filtering/joining/computing/etc several source streams. The server will
> keep this derivative stream available as long as the session exists. This
> session is kept alive with a heartbeat. So a small part of this system
> requires a regular (but not overwhelming) heartbeat. If this derivative
> stream has any data source that is live (aka regularly updating) then the
> derivative is also live. The API client would likely want to subscribe to
> the derivative stream, but still needs to heartbeat. The subscription
> (communicated via Flight's DoExchange) should be able to last forever.
> Naturally, this API pattern is simply better suited to the async pattern.
>
> Ideally, we could re-use the same http2 connections, task queue and/or
> thread-pool between the arrow flight api and the other grpc apis (talking
> to the same endpoint).
>
> The gRPC callback pattern in Java is nice; it's a bit of a shame that gRPC
> hasn't settled on a c++ callback pattern.
>
> These are the kinds of ideas that it enables:
> - Subscribe to multiple ticking sources simultaneously.
> - Can have multiple gRPC requests outstanding at the same time
>   (particularly useful if server is in a remote data center with high RTT).
> - Can communicate with multiple remote hosts simultaneously.
> - In general, enables the client to be event driven.
>
> Note that our clients tend to be light; they typically listen to derived
> data and then act based on the information without further computation
> locally.
>
> The gRPC c++ async library does indeed look a bit under-documented. There
> are a few blog posts that highlight some of the surprises, but async
> behavior is a requirement for what we're working on. (for async-cpp-grpc
> surprises see:
> https://www.gresearch.co.uk/article/lessons-learnt-from-writing-asynchronous-streaming-grpc-services-in-c/
> ).
>
> Nate
>
> On Wed, Jun 2, 2021 at 8:44 PM David Li <lidav...@apache.org
> <mailto:lidavidm%40apache.org>> wrote:
>
> > Hey Nate,
> >
> > I think there's an open JIRA for something like this. I'd love to have
> > something that plays nicely with asyncio/trio in Python and is hopefully
> > more efficient. (I think it would also let us finally have per-message
> > timeouts instead of only a per-call deadline.) There are some challenges
> > though, e.g. we wouldn't expose gRPC's event loop directly so that we could
> > support other transports, but then that leaves more things to design. I
> > also recall the async C++ APIs being very underdocumented, I get the sense
> > that they aren't actually used except to improve some benchmarks. I'll note
> > for instance gRPC in Python, which offers async support, uses the "core"
> > APIs directly and doesn't use anything C++ offers.
> >
> > But long story short, if you're interested in this I think it would be a
> > useful addition. What sorts of things would it enable for you?
> >
> > -David
> >
> > On Wed, Jun 2, 2021, at 16:20, Nate Bauernfeind wrote:
> > > It seems to me that the c++ arrow flight implementation uses only the
> > > synchronous version of the gRPC API. gRPC supports asynchronous message
> > > delivery in C++ via a CompletionQueue that must be polled. Has there been
> > > any desire to standardize on a solution for asynchronous use cases, perhaps
> > > delivered via a provided CompletionQueue?
> > >
> > > For a simple async grpc c++ example you can look here:
> > > https://github.com/grpc/grpc/blob/master/examples/cpp/helloworld/greeter_async_client.cc
> > >
> > > Thanks,
> > > Nate