Re: BigQuery Storage API now supports Arrow

2019-07-30 Thread Micah Kornfield
> > If the current gRPC stub definitions are reasonably stable (in your > opinion), I might try implementing support. I would guess that is relatively stable, but I don't think I can make any guarantees (as far as I know there are no guarantees made between beta and GA API versions). So while I w

Re: BigQuery Storage API now supports Arrow

2019-07-30 Thread Micah Kornfield
> > This is nice. Reading the original ML thread [1], does this mean that > high-speed Avro-to-Arrow parsing has become less important now? I think this is still important from an Arrow perspective. Avro is still a very popular serialization format and probably the most popular one Arrow doesn

Re: Further Flight optimizations (was Re: BigQuery Storage API now supports Arrow)

2019-07-29 Thread Antoine Pitrou
Le 29/07/2019 à 16:16, David Li a écrit : > This is getting rather off the original topic, so I changed the subject. > > This is the code in gRPC-Python, where incoming message data is copied > into a Python bytearray: > https://github.com/grpc/grpc/blob/b8b6df08ae6d9f60e1b282a659d26b8c340de5c9/

Further Flight optimizations (was Re: BigQuery Storage API now supports Arrow)

2019-07-29 Thread David Li
This is getting rather off the original topic, so I changed the subject. This is the code in gRPC-Python, where incoming message data is copied into a Python bytearray: https://github.com/grpc/grpc/blob/b8b6df08ae6d9f60e1b282a659d26b8c340de5c9/src/python/grpcio/grpc/_cython/_cygrpc/operation.pyx.p

Re: BigQuery Storage API now supports Arrow

2019-07-29 Thread Antoine Pitrou
Le 29/07/2019 à 15:13, David Li a écrit : > Ah, sorry, I was unclear - the performance issue is not with Flight at > all, but with putting Arrow over gRPC naively. > > At some point, we benchmarked gRPC-Python carrying Arrow data, and > found that it only achieved ~half the throughput of Flight-

Re: BigQuery Storage API now supports Arrow

2019-07-29 Thread David Li
Ah, sorry, I was unclear - the performance issue is not with Flight at all, but with putting Arrow over gRPC naively. At some point, we benchmarked gRPC-Python carrying Arrow data, and found that it only achieved ~half the throughput of Flight-Python. So implementing BigQuery-Flight would also avo

Re: BigQuery Storage API now supports Arrow

2019-07-29 Thread Antoine Pitrou
Hi David, On Mon, 29 Jul 2019 09:06:52 -0400 David Li wrote: > > If the current gRPC stub definitions are reasonably stable (in your > opinion), I might try implementing support. That might get reasonable > performance still, especially in Python (where I've found that a lot > of performance i

Re: BigQuery Storage API now supports Arrow

2019-07-29 Thread David Li
Hey Micah, There haven't really been formal discussions of Flight "backends", but there has been some talk about supporting protocols besides gRPC (which is why the implementation tries to abstract away from gRPC). So it might be interesting to treat this as another "protocol" in Flight clients tha

Re: BigQuery Storage API now supports Arrow

2019-07-29 Thread Antoine Pitrou
Hi Micah, Le 27/07/2019 à 05:43, Micah Kornfield a écrit : > Hi Arrow Dev, > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I > just wanted to share some work that Brian Hulette and I helped out with. > > I'm happy to announce there is now preliminary support for readin

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread Micah Kornfield
> > That’s awesome!! It’s pretty surreal to request a feature from google and > have it built out. I hope this is beneficial to customers in general. Thank you for filing the request. If I'm reading the code correctly looks like you are transporting the > IPC payload in the protobuf format of the

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread Micah Kornfield
Hi David, > I see the original thread mentioned Flight support, do you think it'd > be possible to support Flight natively? Or conversely, maybe this > could be a candidate for a new Flight "backend" as has been discussed. Right now our main priority is addressing the caveats I mentioned above. A

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread Wes McKinney
Very nice! If I'm reading the code correctly, it looks like you are transporting the IPC payload in the protobuf format of the BigQuery Storage API https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/p

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread Jonathan Chiang
Hi Micah, That’s awesome!! It’s pretty surreal to request a feature from google and have it built out. Thanks, Jonathan > On Jul 26, 2019, at 8:43 PM, Micah Kornfield wrote: > > Hi Arrow Dev, > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I > just wanted to share

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread Fan Liya
@Micah Kornfield Awesome work! Big congratulations! Best, Liya Fan On Sat, Jul 27, 2019 at 9:17 PM David Li wrote: > This is super awesome, thanks for sharing! > > I see the original thread mentioned Flight support, do you think it'd > be possible to support Flight natively? Or conversely, may

Re: BigQuery Storage API now supports Arrow

2019-07-27 Thread David Li
This is super awesome, thanks for sharing! I see the original thread mentioned Flight support, do you think it'd be possible to support Flight natively? Or conversely, maybe this could be a candidate for a new Flight "backend" as has been discussed. Best, David On 7/26/19, Micah Kornfield wrote