Ahh got it, perfectly clear now -- thank you!

On Mon, Mar 7, 2022 at 11:20 AM David Li <lidav...@apache.org> wrote:
> So "Flight" and "Flight SQL" are distinct projects. Flight defines RPC
> methods, and "Flight SQL" defines higher-level methods on top of the Flight
> methods. The optimization proposed is for Flight. Once/if that gets
> accepted and implemented, Flight SQL servers could then use it to optimize
> GetCatalogs: they would return a FlightInfo that has the data embedded. So
> yes, all the methods should get support for this once things get worked out.
>
> On Mon, Mar 7, 2022, at 10:57, Gavin Ray wrote:
> > Sure, will use that JIRA issue for whatever thoughts/feedback =)
> >
> > On that note, filed the above bug here:
> > https://issues.apache.org/jira/browse/ARROW-15861
> >
> > About the "two-step" thing, I guess what I mean is code like this
> > where you make the initial op, then get the stream:
> >
> > val catalogs: FlightInfo = client.getCatalogs()
> > val stream: FlightStream =
> >     client.getStream(catalogs.endpoints[0].ticket)
> > while (stream.next()) {
> >     stream.root.use { root -> println(root.contentToTSVString()) }
> > }
> >
> > You override two methods, "getFlightInfoCatalogs" and then
> > "getStreamCatalogs"
> > Maybe I misunderstood -- can you just return the data directly from IE
> > "getFlightInfoCatalogs"
> >
> > Ideally I'd love to be able to do something like:
> >
> > val catalogs = client.getCatalogs()
> > for (catalog in catalogs.rows) {}
> >
> > But maybe this is not really feasible/practical with how Arrow works as a
> > format or the architecture of Flight
> >
> > And RE: the JS implementation, TypeScript is my primary language so I'd
> > love to be useful there if I could =)
> > It's also much faster to prototype stuff in JS/TS due to lack of
> > compilation.
> >
> > On Mon, Mar 7, 2022 at 10:46 AM David Li <lidav...@apache.org> wrote:
> >
> >> (responses inline)
> >>
> >> On Mon, Mar 7, 2022, at 10:37, Gavin Ray wrote:
> >> >>
> >> >> Another contributor is currently working on some Java
> >> >> tutorials/documentation so any feedback would be helpful.
> >> >
> >> >
> >> > Ah, yeah this would be incredibly useful. Will compile some thoughts, where
> >> > should I share them?
> >> > Didn't know about the Cookbook, definitely going to be tonight's reading!
> >> >
> >>
> >> Would you mind putting them on the overall Jira?
> >> https://issues.apache.org/jira/browse/ARROW-15156
> >>
> >> If there's questions about the cookbook, or tasks where it's not clear how
> >> to accomplish them, you can file issues directly on the cookbook repo too.
> >>
> >> >
> >> >> Ah, I suppose having the small-value optimization would mostly cover your
> >> >> needs then? And then grpc-web or a similar bridge should suffice for you.
> >> >
> >> >
> >> > Yeah 100%
> >> > Wanted to ask a question on this -- is there a possibility to add the
> >> > "one-shot" single message RPCs for all operations?
> >> >
> >> > In my case it's mostly extra-overhead to send the first ticket, get a
> >> > statement handle, and then make a second call which streams the results
> >> > Would be awesome to have the ability to opt-in to one-shot messages for
> >> > both Metadata and Query operations
> >>
> >> Hmm, which other operations are you looking at? For instance, GetSchema
> >> takes a FlightDescriptor directly. It's really just DoGet that has that
> >> two-step structure.
> >>
> >> >
> >> >> If you have details about the dependency issue, do you mind filing a Jira
> >> >> issue?
> >> >> Seems something might have changed and we should be prepared to fix it.
> >> >> (Flight/Java does a lot of poking at internal APIs to try to avoid copies.)
> >> >
> >> >
> >> > Absolutely, no problem.
> >> > I'll revert my dep override and file an issue with
> >> > the stacktrace.
> >>
> >> Thanks!
> >>
> >> > ---
> >> > On a side note, I've started work on a Node.js implementation of Flight +
> >> > FlightSQL in the Arrow repo.
> >> > Never worked with gRPC but hopefully I can get the majority of the work
> >> > finished and file a draft PR =)
> >>
> >> That will be interesting to see. I believe the Arrow JS implementation
> >> could use some more attention in general.
> >>
> >> >
> >> > https://gist.github.com/GavinRay97/876c8e8476b18c8eb01cb6e8f807bf28
> >> >
> >> > On Mon, Mar 7, 2022 at 9:55 AM David Li <lidav...@apache.org> wrote:
> >> >
> >> >> Cool - if you have API questions, feel free to send them here or
> >> >> u...@arrow.apache.org. Another contributor is currently working on some
> >> >> Java tutorials/documentation so any feedback would be helpful. There's also
> >> >> some basic recipes here: https://github.com/apache/arrow-cookbook/
> >> >>
> >> >> Ah, I suppose having the small-value optimization would mostly cover your
> >> >> needs then? And then grpc-web or a similar bridge should suffice for you.
> >> >>
> >> >> If you have details about the dependency issue, do you mind filing a Jira
> >> >> issue? Seems something might have changed and we should be prepared to fix
> >> >> it. (Flight/Java does a lot of poking at internal APIs to try to avoid
> >> >> copies.)
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >> On Mon, Mar 7, 2022, at 09:48, Gavin Ray wrote:
> >> >> > Ah brilliant! Yeah, Websockets (or anything that's a basic transport and
> >> >> > doesn't require a language-specific SDK) would be fantastic.
> >> >> >
> >> >> > In my case, streaming wouldn't be a requirement, at least not for some time
> >> >> > (more of a nice-to-have).
> >> >> > It'd be mostly OLTP-style workloads, with small response sizes (10-1,000kB).
> >> >> >
> >> >> > By the way -- wanted to thank yourself and the others from the mailing list
> >> >> > for all the help.
> >> >> > Last night I was able to get a basic FlightSQL server implementation
> >> >> > working based on the feedback I'd got here.
> >> >> >
> >> >> > Now the only challenge is not being familiar with the Arrow format +
> >> >> > APIs/working with vector-based data
> >> >> > Majority of the time was in trying to figure out how to translate JVM
> >> >> > arrays/objects into Arrow values.
> >> >> >
> >> >> > The one thing I did have to do is override dependencies due to a problem in
> >> >> > netty/grpc with an
> >> >> > incompatible constructor signature for "PooledByteBufAllocator"
> >> >> >
> >> >> > // workaround for bug with PooledByteBufAllocator
> >> >> > implementation("io.grpc", "grpc-netty").version {
> >> >> >     strictly("1.44.1")
> >> >> > }
> >> >> > implementation("io.netty", "netty-all").version {
> >> >> >     strictly("4.1.74.Final")
> >> >> > }
> >> >> > implementation("io.netty", "netty-codec").version {
> >> >> >     strictly("4.1.74.Final")
> >> >> > }
> >> >> >
> >> >> > On Mon, Mar 7, 2022 at 9:39 AM David Li <lidav...@apache.org> wrote:
> >> >> >
> >> >> >> No worries about questions, it's always good to see how people are using
> >> >> >> Arrow.
> >> >> >>
> >> >> >> For tunneling Flight/gRPC over HTTP: this has been a long-standing
> >> >> >> question. I believe some people have had success with one of the various
> >> >> >> gRPC-HTTP proxies. In particular, I recall Deephaven has done this
> >> >> >> successfully (with some workaround for the lack of streaming methods). If
> >> >> >> Nate is around, maybe he can describe what they've done.
> >> >> >>
> >> >> >> There's also an ongoing effort to enable alternative transports in Flight
> >> >> >> [1], which would let us implement (say) a native WebSocket transport.
> >> >> >>
> >> >> >> For these methods specifically: they basically wrap Protobuf
> >> >> >> SerializeToString/ParseFromString so you could use them to try to implement
> >> >> >> your own protocol using HTTP, yes.
> >> >> >>
> >> >> >> [1]: https://github.com/apache/arrow/pull/12465
> >> >> >>
> >> >> >> -David
> >> >> >>
> >> >> >> On Mon, Mar 7, 2022, at 09:24, Gavin Ray wrote:
> >> >> >> > Due to the current implementation status of FlightSQL (C++/Rust/JVM only)
> >> >> >> >
> >> >> >> > I am trying to see whether it's possible to allow FlightSQL over something
> >> >> >> > like HTTP/REST so that arbitrary languages can be used.
> >> >> >> >
> >> >> >> > In the codebase, I saw these (and their deserialize counterparts):
> >> >> >> >
> >> >> >> > /// \brief Get the wire-format representation of this type.
> >> >> >> > /// Useful when interoperating with non-Flight systems (e.g. REST
> >> >> >> > /// services) that may want to return Flight types.
> >> >> >> > arrow::Result<std::string> SerializeToString() const;
> >> >> >> >
> >> >> >> > /**
> >> >> >> >  * Get the serialized form of this protocol message.
> >> >> >> >  * <p>Intended to help interoperability by allowing non-Flight services
> >> >> >> >  * to still return Flight types.
> >> >> >> >  */
> >> >> >> > public ByteBuffer serialize() {
> >> >> >> >     return ByteBuffer.wrap(toProtocol().toByteArray());
> >> >> >> > }
> >> >> >> >
> >> >> >> > I know this is probably very low-priority at the moment, but just wanted to
> >> >> >> > ask about whether it's even possible.
> >> >> >> > Thank you, and sorry for spamming the mailing list with so many questions
> >> >> >> > lately =)
> >> >> >>
> >> >> >
> >> >
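
For anyone reading this thread later: the "two-step" fetch discussed above (ask for a FlightInfo, then redeem its ticket for a stream) and the proposed shortcut of embedding small results in the FlightInfo can be sketched as a toy model. This is only an illustration in plain Java. `TwoStepSketch`, `Info`, `Server`, `fetch`, and the `embedSmallResults` flag are hypothetical names made up for this sketch, not part of Arrow Flight or Flight SQL, and the actual proposal may end up looking quite different.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model of a two-step fetch versus an embedded-data shortcut. NOT the Arrow Flight API. */
final class TwoStepSketch {

    /** Hypothetical stand-in for FlightInfo: a ticket plus optional inline rows. */
    static final class Info {
        final String ticket;
        final Optional<List<String>> inlineRows;

        Info(String ticket, Optional<List<String>> inlineRows) {
            this.ticket = ticket;
            this.inlineRows = inlineRows;
        }
    }

    /** Hypothetical server that counts calls so the difference is visible. */
    static final class Server {
        private final Map<String, List<String>> results =
                Map.of("catalogs", List.of("main", "analytics"));
        private final boolean embedSmallResults;
        int roundTrips = 0;

        Server(boolean embedSmallResults) {
            this.embedSmallResults = embedSmallResults;
        }

        /** Step 1: describe where the data lives and, if enabled, inline it. */
        Info getInfo(String query) {
            roundTrips++;
            List<String> rows = results.getOrDefault(query, List.of());
            return new Info(query, embedSmallResults ? Optional.of(rows) : Optional.empty());
        }

        /** Step 2: redeem the ticket for the actual rows. */
        List<String> getStream(String ticket) {
            roundTrips++;
            return results.getOrDefault(ticket, List.of());
        }
    }

    /** Client helper: use inline rows when present, otherwise make the second call. */
    static List<String> fetch(Server server, String query) {
        Info info = server.getInfo(query);
        return info.inlineRows.orElseGet(() -> server.getStream(info.ticket));
    }

    public static void main(String[] args) {
        Server twoStep = new Server(false);
        Server embedded = new Server(true);
        System.out.println(fetch(twoStep, "catalogs") + " after " + twoStep.roundTrips + " call(s)");
        System.out.println(fetch(embedded, "catalogs") + " after " + embedded.roundTrips + " call(s)");
    }
}
```

The point of the sketch is that the client code stays the same either way: it always asks for the info first and only makes the second call when nothing was inlined, so a server could opt in to embedding without breaking existing clients.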