On 29/07/2019 16:16, David Li wrote:
> This is getting rather off the original topic, so I changed the subject.
>
> This is the code in gRPC-Python, where incoming message data is copied
> into a Python bytearray:
> https://github.com/grpc/grpc/blob/b8b6df08ae6d9f60e1b282a659d26b8c340de5c9/src/python/grpcio/grpc/_cython/_cygrpc/operation.pyx.pxi#L165-L173
>
> In fact, I think the `bytes(bytearray)` call at the end is an additional copy.
Right.  This is definitely not optimal.  Ideally they would do a single
b''.join(...) on a list of memoryviews (or, if there is a single slice,
return a memoryview of that slice).

Since this is off-topic for Flight, I'll leave it here though :-)

Regards

Antoine.


> We do something similar in Flight-C++:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/serialization-internal.cc#L105-L118
>
> It's an open question whether we can get gRPC to avoid these copies.
>
> Somewhat related, Flight-Java performance is hindered by this gRPC
> issue: https://github.com/grpc/grpc-java/issues/5433
>
> Essentially, the backpressure signal in gRPC-Java is currently not
> related to actual network conditions at all. Alluxio implemented their
> own flow control for a 30% throughput improvement:
> https://github.com/Alluxio/alluxio/commit/6f02b41ea529b9f59c0c42de216f402b3b4c9882
>
> Best,
> David
>
> On 7/29/19, Antoine Pitrou <anto...@python.org> wrote:
>>
>> On 29/07/2019 15:13, David Li wrote:
>>> Ah, sorry, I was unclear - the performance issue is not with Flight at
>>> all, but with putting Arrow over gRPC naively.
>>>
>>> At some point, we benchmarked gRPC-Python carrying Arrow data, and
>>> found that it only achieved ~half the throughput of Flight-Python. So
>>> implementing BigQuery-Flight would also avoid that performance
>>> pitfall, assuming the client library for BigQuery-Arrow uses
>>> gRPC-Python.
>>>
>>> The reason we found is that since gRPC technically does not require
>>> Protobuf, it copies message payloads into a CPython bytestring, and
>>> the Python code then turns around and hands that to Protobuf, which
>>> then copies the data into its own data structures and gives it back
>>> to Python.
>>
>> gRPC shouldn't need to copy the payload into a CPython bytestring.
>> Instead, it could instantiate a buffer-like Python object pointing to
>> the original data.
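
To illustrate the buffer-like-object point: in plain Python, a memoryview already behaves this way.  This is only a sketch of the idea, not gRPC code (the real change would live in gRPC's Cython layer):

```python
# Instead of copying the payload into a new bytestring, hand back a
# memoryview that points at the original buffer.  `payload` here just
# stands in for gRPC's receive buffer.
payload = bytes(range(64))
view = memoryview(payload)[16:32]     # zero-copy "slice" of the payload

assert view.obj is payload            # still points at the original data
assert bytes(view) == payload[16:32]  # only an explicit bytes() copies
```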
>> This is "easily" done in Cython, and gRPC-Python
>> already uses Cython:
>> https://cython.readthedocs.io/en/latest/src/userguide/buffer.html
>> https://docs.python.org/3/c-api/buffer.html
>>
>> Regards
>>
>> Antoine.
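
For what it's worth, the single-join approach suggested above can be sketched in plain Python.  The function name `assemble` is made up for illustration and is not part of any gRPC API:

```python
def assemble(slices):
    """Combine incoming buffer slices with at most one copy.

    `slices` is any sequence of bytes-like objects (e.g. the chunks of
    a received message).
    """
    views = [memoryview(s) for s in slices]
    if len(views) == 1:
        # Single slice: return a zero-copy view of the original data.
        return views[0]
    # Several slices: b''.join copies each slice exactly once, instead
    # of a copy into a bytearray plus a final bytes(bytearray) copy.
    return b"".join(views)
```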