Hi,
I'm trying to use [PubsubIO](https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html) to read [PubsubMessage](https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.html).
I've noticed that
Hi Steve!
I have several pipelines that successfully use both numpy and scikit models
without any problems. I don't think I use Pandas atm but I'm sure that is
fine too.
However, you might have to do some special stuff if you encounter
serializability problems. I also have tensorflow models in us
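The serializability point above comes down to this: Beam pickles a DoFn in order to ship it to workers, so anything the DoFn holds (including NumPy arrays) must survive a pickle round trip. A minimal sketch of that idea, using a plain class as a stand-in for a real `beam.DoFn` (the class name and values here are hypothetical, not from the thread):

```python
import pickle

import numpy as np


class NormalizeFn:
    """Stand-in for a Beam DoFn; a real one would subclass beam.DoFn."""

    def __init__(self, mean, std):
        # NumPy arrays held as instance state; these get pickled with the fn.
        self.mean = np.asarray(mean)
        self.std = np.asarray(std)

    def process(self, element):
        return ((np.asarray(element) - self.mean) / self.std).tolist()


fn = NormalizeFn([1.0, 2.0], [2.0, 4.0])

# Beam serializes DoFns to distribute them; NumPy arrays pickle cleanly,
# so a round trip preserves the state and behavior.
restored = pickle.loads(pickle.dumps(fn))
print(restored.process([3.0, 6.0]))  # [1.0, 1.0]
```

If a DoFn holds something that does not pickle (an open file handle, a loaded TensorFlow session, etc.), the usual workaround is to create it lazily on the worker (e.g. in `start_bundle`) rather than in `__init__`.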
Hi all,
I'm running a Beam pipeline on Flink and sending metrics via the Graphite
reporter. I get repeated exceptions on the slaves, which try to register the
same metric multiple times. These duplicates all concern task and operator
metrics. I have given all pipeline nodes unique names.
I am
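For context on the Graphite setup, a typical reporter configuration in `flink-conf.yaml` looks like the following (host and port are placeholder values; the scope formats shown are Flink's defaults). Duplicate-registration warnings generally mean two metrics resolved to the same fully-scoped name, so it is worth checking that the scope format keeps a distinguishing variable such as `<subtask_index>`:

```yaml
# Hypothetical example values; point host/port at your Graphite endpoint.
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: graphite.example.com
metrics.reporter.grph.port: 2003
metrics.reporter.grph.protocol: TCP

# Flink's default scope formats; a customized format that drops
# <operator_name> or <subtask_index> can make distinct metrics
# collide under one fully-scoped name.
metrics.scope.task: <host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>
metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>
```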
Hi everyone,
Came across this interesting project recently. Read through some of the docs
and still had a question: is it possible to use NumPy/Pandas in the DoFn of a
Beam pipeline? Or does the requirement of a serializable function preclude this
possibility?
Thanks,
Steve
That's correct, but generally speaking I would avoid doing this.
IMHO, it's better to use the default Spark context created by the runner.
Regards
JB
On 09/27/2017 01:02 PM, Romain Manni-Bucau wrote:
You have the SparkPipelineOptions you can set
(org.apache.beam.runners.spark.SparkPipelineOpti