On 29/07/2019 at 15:13, David Li wrote:
> Ah, sorry, I was unclear - the performance issue is not with Flight at
> all, but with putting Arrow over gRPC naively.
>
> At some point, we benchmarked gRPC-Python carrying Arrow data, and
> found that it only achieved ~half the throughput of Flight-Python. So
> implementing BigQuery-Flight would also avoid that performance
> pitfall, assuming the client library for BigQuery-Arrow uses
> gRPC-Python.
>
> The reason we found is that since gRPC technically does not require
> Protobuf, it copies message payloads into a CPython bytestring, and
> then the Python code then turns around and hands that to Protobuf,
> which then copies data into its data structures and gives it back to
> Python
gRPC shouldn't need to copy the payload into a CPython bytestring.
Instead, it could instantiate a buffer-like Python object pointing to
the original data. This is "easily" done in Cython, and gRPC-python
already uses Cython:
https://cython.readthedocs.io/en/latest/src/userguide/buffer.html
https://docs.python.org/3/c-api/buffer.html

Regards

Antoine.
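For readers less familiar with the buffer protocol, here is a minimal pure-Python sketch of the difference between copying into a bytestring and exposing a buffer-like view. It assumes a `bytearray` stands in for the memory gRPC has received a message into; the names `received`, `copied` and `view` are hypothetical, and nothing here uses gRPC's or Cython's actual APIs. `bytes(...)` allocates and copies, while `memoryview(...)` exposes the same memory through the buffer protocol without copying:

```python
# Hypothetical stand-in for a message payload gRPC has already received
# into its own memory (not a real gRPC object).
received = bytearray(b"arrow-ipc-payload")

# Copying approach: build a new bytes object (allocates and copies every byte).
copied = bytes(received)

# Zero-copy approach: a memoryview exposes the same memory via the buffer protocol.
view = memoryview(received)

# Overwrite the first 5 bytes in place (same length, so no resize is needed).
received[0:5] = b"ARROW"

print(copied[:5])          # b'arrow' -> the copy is independent of the original
print(view[:5].tobytes())  # b'ARROW' -> the view shares the original memory
```

A Cython version would do the equivalent at the C level, for example by having an extension type implement `__getbuffer__` over the received memory, so that consumers which accept buffer-protocol objects could read the payload without an intermediate bytestring.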