How to get the PublishTime of PubsubMessages?

2017-09-29 Thread Derek Hao Hu
Hi, I'm trying to use [PubsubIO](https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html) to read [PubsubMessage](https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessage.html). I've noticed that

Re: Beam and Python API: Pandas/Numpy?

2017-09-29 Thread Vilhelm von Ehrenheim
Hi Steve! I have several pipelines that successfully use both NumPy and scikit-learn models without any problems. I don't think I use Pandas atm but I'm sure that is fine too. However, you might have to do some special handling if you run into serializability problems. I also have tensorflow models in us
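[Editor's note] On the serializability point: Beam ships a DoFn to workers by pickling it along with anything it closes over, and NumPy arrays (and fitted scikit-learn models) generally pickle cleanly. A minimal sketch of the mechanism; the names are illustrative:

```python
import pickle
import numpy as np

# State a DoFn might close over; NumPy objects pickle cleanly,
# which is what Beam relies on when serializing a DoFn to workers.
WEIGHTS = np.array([0.2, 0.3, 0.5])

def score(features):
    """Toy scoring function using the closed-over NumPy array."""
    return float(np.dot(WEIGHTS, features))

# Round-trip through pickle, roughly what Beam does under the hood.
restored = pickle.loads(pickle.dumps(score))
```

Trouble usually starts with objects holding unpicklable state (open file handles, locks, some TensorFlow sessions); a common workaround is to construct those lazily on the worker, e.g. in the DoFn's `start_bundle`, rather than closing over them.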

Duplicate metric names when using Flink runner + Graphite reporter

2017-09-29 Thread Reinier Kip
Hi all, I'm running a Beam pipeline on Flink and sending metrics via the Graphite reporter. I get repeated exceptions on the slaves, which try to register the same metric multiple times. These duplicates all concern task and operator metrics. I have given all pipeline nodes unique names. I am

Beam and Python API: Pandas/Numpy?

2017-09-29 Thread Steven DeLaurentis
Hi everyone, I came across this interesting project recently. I read through some of the docs and still had a question: is it possible to use NumPy/Pandas in the DoFn of a Beam pipeline? Or does the requirement of a serializable function preclude this possibility? Thanks, Steve

Re: Spark and Beam

2017-09-29 Thread Jean-Baptiste Onofré
That's correct, but I would avoid doing this generally speaking. IMHO, it's better to use the default Spark context created by the runner. Regards JB On 09/27/2017 01:02 PM, Romain Manni-Bucau wrote: You have the SparkPipelineOptions you can set (org.apache.beam.runners.spark.SparkPipelineOpti