So this is slightly different from what I was doing and spoke about. As far as I can tell from your links, you are evaluating the GraphQL query with that GraphQL server and then converting the JSON response into Arrow format (correct me if I'm wrong, please).

What I did was to hook into a GraphQL parser and make my own evaluator which was Arrow-native the whole way through, using the GraphQL request to define the resulting Arrow schema based on the shape of the requested data. I had a planner and an executor, with the executor using the plan to set up a pipeline to stream the record batches through. Just something to think about :)

--Matt
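A minimal pyarrow sketch of that idea, for anyone who wants to picture it (the SDL-derived type table, the sample selection, and the row source below are invented for illustration and are not Matt's actual implementation):

    import pyarrow as pa

    # Hypothetical lookup derived from the GraphQL SDL: field name -> Arrow type.
    SDL_TYPES = {
        "id": pa.int64(),
        "name": pa.string(),
        "age": pa.int32(),
    }

    def schema_for_selection(selected_fields):
        # The response schema is shaped by what the query asked for, not by the full SDL.
        return pa.schema([(name, SDL_TYPES[name]) for name in selected_fields])

    # A query like "{ person { id name } }" selects only these fields...
    schema = schema_for_selection(["id", "name"])

    # ...and the executor streams record batches conforming to that schema
    # (e.g. as the body of a Flight DoGet response).
    def execute(rows, schema, batch_size=1024):
        for start in range(0, len(rows), batch_size):
            chunk = rows[start:start + batch_size]
            yield from pa.Table.from_pylist(chunk, schema=schema).to_batches()

    rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    for batch in execute(rows, schema):
        print(batch.schema)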
On Wed, Jul 27, 2022, 7:19 PM Lee, David <david....@blackrock.com.invalid> wrote:

> I'm working on something similar for Ariadne, which is a Python GraphQL server package.
>
> https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_arrow_flight_server.py
> https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_asgi_arrow_client.py
>
> I'm basically calling pa.Table.from_pylist, which infers the schema from the first JSON record, but that record could be incomplete, so passing a schema is preferable.
>
> arrow_data = pa.Table.from_pylist([result])
>
> Basically I need to look at the GraphQL query and then go into the GraphQL SDL (Schema Definition Language) and generate an equivalent Arrow schema based on the subset of data points requested.
>
> -----Original Message-----
> From: Gavin Ray <ray.gavi...@gmail.com>
> Sent: Wednesday, July 20, 2022 11:15 AM
> To: dev@arrow.apache.org
> Subject: Re: Arrow Flight usage with graph databases
>
> > We considered the option to analyze data to build a schema on the fly; however, it would be quite an expensive operation and would not allow us to get the performance benefits of using Arrow Flight.
>
> I'm not sure you'll be able to avoid generating a schema on the fly if it's anything like SQL or GraphQL queries, since each query would have a unique shape based on the user's selection.
>
> Have you benchmarked this, out of curiosity? (It's not an uncommon use case from what I've seen.)
>
> For example, Matt Topol does this to dynamically generate response schemas in his implementation of GraphQL-via-Flight, and he says the overhead is negligible.
>
> On Tue, Jul 19, 2022 at 11:52 PM Valentyn Kahamlyk <valent...@bitquilltech.com.invalid> wrote:
>
> > Hi David,
> >
> > We are planning to use Flight for the prototype. We are also planning to use Flight SQL as a reference; however, we wanted to explore whether Arrow Flight Graph can be implemented on top of Arrow Flight (similar to Arrow Flight SQL).
> >
> > Graph databases generally do not expose or enforce a schema, which indeed makes it challenging. While we do have ideas on building extensions for graph databases to add a schema, and we do see some other ideas related to this, we will not be able to rely on this as part of the initial prototype.
> > We considered the option to analyze data to build a schema on the fly; however, it would be quite an expensive operation and would not allow us to get the performance benefits of using Arrow Flight.
> >
> > > What type/size metadata are you referring to?
> >
> > Metadata usually includes information about data type, size and type-specific properties. Some complex types are made up of 10 or more parts. Each vertex or edge of the graph can have its own distinct set of properties, but the total number of types is several dozen, and this can serve as a basis for constructing a schema.
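As a rough pyarrow illustration of how a bounded set of value types like that could be folded into one Arrow schema (this mirrors the Map<String, Union<...>> idea David Li raises further down the thread; the type subset and field names here are invented):

    import pyarrow as pa

    # Invented subset of the "several dozen" property value types.
    property_value = pa.dense_union([
        pa.field("int64", pa.int64()),
        pa.field("float64", pa.float64()),
        pa.field("string", pa.string()),
        pa.field("bool", pa.bool_()),
    ])

    # One row per vertex or edge: arbitrary per-element properties go into a map
    # column, while well-known properties could still be promoted to typed columns.
    schema = pa.schema([
        pa.field("id", pa.int64()),
        pa.field("label", pa.string()),
        pa.field("properties", pa.map_(pa.string(), property_value)),
    ])

    print(schema)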
> > The total size of the metadata can be quite big, as we wanted to support cases where the graph database can be very large (e.g. hundreds of GBs, with vertices and edges possibly containing different properties).
> > More information about the serialization format we are using right now can be found at https://tinkerpop.apache.org/docs/3.5.4/dev/io/#graphbinary.
> >
> > > So effectively, the internal format is being carried in a string/binary column?
> >
> > Yes, I am considering this option for the first stage of implementation.
> >
> > David, thank you again for your reply, and please let me know your thoughts or whether you might have any suggestions around adopting Arrow Flight for schema-less databases.
> >
> > Regards, Valentyn.
> >
> > On Mon, Jul 18, 2022 at 5:23 PM David Li <lidav...@apache.org> wrote:
> >
> > > Hi Valentyn,
> > >
> > > Just to make sure, is this Flight or Flight SQL? I ask since Flight itself does not have a notion of transactions in the first place. I'm also curious what the intended target client application is.
> > >
> > > Not being familiar with graph databases myself, I'll try to give some comments…
> > >
> > > Lack of a schema does make things hard. There were some prior discussions about schema evolution during a (Flight) data stream, which would let you add/remove fields as the query progresses. And unions would let you accommodate inconsistent types. But if the changes are frequent, you'd negate many of the benefits of Arrow/Flight. And both of these could make client-side usage inconvenient.
> > >
> > > What type/size metadata are you referring to? Presumably, this would instead end up in the schema, once using Arrow?
> > >
> > > Is there any possibility to (say) unify (chunks of) the result to a consistent schema at least? Or possibly, encoding (some) properties as a Map<String, Union<...>> instead of as columns. (This negates the benefits of columnar data, of course, if you are interested in a particular property, but if you know those properties up front, the server could pull those out into (consistently typed) columns.)
> > >
> > > > We are currently working on a prototype in which we are trying to use Arrow Flight as a transport for transmitting requests and data to Gremlin Server. Serialization is still based on an internal format due to schema creation complexity.
> > >
> > > So effectively, the internal format is being carried in a string/binary column?
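For reference, that first-stage layout being discussed could look roughly like the following in pyarrow (the column names and the payload bytes are placeholders, not the actual prototype; the GraphBinary blobs would be produced and decoded by a separate serializer):

    import pyarrow as pa

    # Hypothetical first-stage layout: each element travels as an opaque GraphBinary
    # blob, with only a couple of cheap, consistently typed columns alongside it.
    schema = pa.schema([
        pa.field("element_id", pa.int64()),
        pa.field("label", pa.string()),
        pa.field("graphbinary", pa.binary()),  # decoded client-side with a GraphBinary reader
    ])

    batch = pa.RecordBatch.from_arrays(
        [
            pa.array([1, 2], type=pa.int64()),
            pa.array(["person", "software"], type=pa.string()),
            pa.array([b"placeholder", b"placeholder"], type=pa.binary()),
        ],
        names=schema.names,
    )
    assert batch.schema.equals(schema)
    print(batch.num_rows, batch.schema)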
> > > On Mon, Jul 18, 2022, at 19:55, Valentyn Kahamlyk wrote:
> > > > Hi All,
> > > >
> > > > I'm investigating the possibility of using Arrow Flight with graph databases, and exploring how to enable an Arrow Flight endpoint in the Apache TinkerPop Gremlin server.
> > > >
> > > > Graph databases currently use several incompatible protocols, which makes it difficult to use and spread the technology.
> > > > Common features of graph databases are:
> > > > 1. Lack of a schema. Each vertex of the graph can have its own set of properties, including properties with the same name but different types. Metadata such as type and size are also passed with each value, which increases the amount of data transferred. Some data types are not supported by all languages.
> > > > 2. The internal representation of data is different for all implementations. For data exchange we used a set of formats like customized JSON and custom binary, but we would like to get a performance gain from using Arrow Flight.
> > > > 3. Differences in concepts like transactions, sessions, etc. Conceptually this may differ from the implementation in SQL. Gremlin Server does not natively support transactions, so we use the Neo4j plugin.
> > > >
> > > > We are currently working on a prototype in which we are trying to use Arrow Flight as a transport for transmitting requests and data to Gremlin Server. Serialization is still based on an internal format due to schema creation complexity.
> > > >
> > > > Ideas are welcome.
> > > >
> > > > Regards, Valentyn