Yep.
What we don't (yet?) have is a Cython interface for writing DoFns that
would let us avoid invoking the process method with Python calling
semantics. But Cython is used by Beam and is installed on the workers,
ready for use in user code.
On Wed, Oct 30, 2019 at 2:33 PM Shannon Duncan wrote:
I was about to ask whether Cython would work with the Beam SDK. I just started
building the pipes to support Cython in modules.
On Wed, Oct 30, 2019 at 2:53 PM Robert Bradshaw wrote:
Python does not allow as much customization of serialization as Java
does, in part because we often don't explicitly know the types at each
point in the pipeline (though Udi is working on improving this, and
there's ongoing work to add explicit schema support as well).
Somewhat to c
To my knowledge, we haven't compared the cost of the "dill/pickle/..." coder
with Java's SerializableCoder, but either way you can always write your own
coders if you don't think the default coders perform well enough in Python.
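As a rough illustration of what "writing your own coder" can look like, here is a minimal sketch, assuming a hypothetical `WordCount` record of (str, int). It uses a fixed binary layout via `struct` instead of pickling the whole object. When the Beam SDK is installed it subclasses `apache_beam.coders.Coder` (overriding `encode`, `decode`, and `is_deterministic`); otherwise it falls back to a plain class so the sketch still runs on its own. The record type, field layout, and coder name are all illustrative assumptions, not an existing Beam coder.

```python
import struct
from typing import NamedTuple

try:
    # Use Beam's coder base class when the SDK is available.
    from apache_beam.coders import Coder as _CoderBase
except ImportError:
    # Fallback so this sketch runs without Beam installed.
    _CoderBase = object


class WordCount(NamedTuple):
    """Hypothetical record type, for illustration only."""
    word: str
    count: int


class WordCountCoder(_CoderBase):
    """Hypothetical coder: 4-byte big-endian length, UTF-8 word, 8-byte signed count."""

    def encode(self, value):
        word_bytes = value.word.encode("utf-8")
        # Length prefix lets decode() know where the word ends.
        return (struct.pack(">I", len(word_bytes))
                + word_bytes
                + struct.pack(">q", value.count))

    def decode(self, encoded):
        (n,) = struct.unpack_from(">I", encoded, 0)
        word = encoded[4:4 + n].decode("utf-8")
        (count,) = struct.unpack_from(">q", encoded, 4 + n)
        return WordCount(word, count)

    def is_deterministic(self):
        # Equal values always encode to identical bytes, so this is safe for keys.
        return True
```

In a pipeline, a coder like this would typically be registered for its type with `apache_beam.coders.registry.register_coder(WordCount, WordCountCoder)`. Whether it actually beats the default pickling coder for a given workload is something to measure, not assume.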
Note that a lot of the Beam Python coders use Cython to go fast s
Has anyone done any testing of the performance difference between the Python SDK
and the Java SDK on Google Dataflow?
We recently dropped our requirement for sequence files in our pipeline,
which opens the door to using the Python SDK instead of the Java SDK. My
concern, though, is a loss of performance.
In Java we contro