Re: Python vs Java SDK Performance

2019-10-30 Thread Robert Bradshaw
Yep. What we don't (yet?) have is a Cython interface for writing DoFns that allows us to avoid calling the process method using Python calling semantics. But Cython is used by Beam and installed on the workers, ready to go to work for user code. On Wed, Oct 30, 2019 at 2:33 PM Shannon Duncan wrot

Re: Python vs Java SDK Performance

2019-10-30 Thread Shannon Duncan
I was about to ask if Cython would work with the Beam SDK. I just started building the pipes to support Cython in modules. On Wed, Oct 30, 2019 at 2:53 PM Robert Bradshaw wrote: > Python does not allow as much customization of serialization as is > available in Java, in part due to often not exp

Re: Python vs Java SDK Performance

2019-10-30 Thread Robert Bradshaw
Python does not allow as much customization of serialization as is available in Java, in part due to often not explicitly knowing the types at each point in the pipeline (though Udi is working on making this better, and there's ongoing work for adding explicit schema support as well). Somewhat to c

Re: Python vs Java SDK Performance

2019-10-30 Thread Luke Cwik
To my knowledge, we haven't compared the cost of the "dill/pickle/..." coder to Java's SerializableCoder, but even then you always have the power to write your own coders if you don't believe the default coders perform well in Python. Note that a lot of the Beam Python coders use Cython to go fast s
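
For readers unfamiliar with what "write your own coders" looks like in the Python SDK, here is a minimal sketch, not taken from the thread. It assumes elements are `(int, float)` tuples; the class name `PointCoder` and the fixed 16-byte layout are hypothetical choices made for illustration. The `encode`, `decode`, and `is_deterministic` methods follow the `apache_beam.coders.Coder` interface, and the import falls back to a plain base class so the sketch also runs where Beam is not installed.

```python
import struct

try:
    from apache_beam.coders import Coder
except ImportError:
    # Fallback so the sketch runs without Beam installed (illustration only).
    Coder = object


class PointCoder(Coder):
    """Hypothetical coder for (int, float) tuples with a fixed binary layout."""

    # Big-endian signed 64-bit integer followed by a 64-bit float: 16 bytes.
    _LAYOUT = struct.Struct(">qd")

    def encode(self, value):
        # Pack both fields directly instead of pickling the tuple.
        return self._LAYOUT.pack(value[0], value[1])

    def decode(self, encoded):
        # Returns a tuple (int, float) matching what encode() received.
        return self._LAYOUT.unpack(encoded)

    def is_deterministic(self):
        # Equal values always produce equal bytes with this layout.
        return True


coder = PointCoder()
payload = coder.encode((7, 1.5))
restored = coder.decode(payload)
```

In a pipeline, a coder like this would typically be associated with a type through `beam.coders.registry.register_coder`. Whether it actually beats the default pickling-based coder for a given workload would have to be measured.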

Python vs Java SDK Performance

2019-10-14 Thread Shannon Duncan
Has anyone done any testing around the performance difference of the Python SDK vs the Java SDK on Google Dataflow? We recently dropped our requirement for sequence files in our pipeline, which opens the door to using the Python SDK vs the Java SDK. But my concern is loss of performance. In Java we contro