Occasional RPC DEADLINE_EXCEEDED with PubSub source on Python/DirectRunner

2020-06-06 Thread Pradip Thachile
Hi, I'm testing a pipeline using Python Beam and the DirectRunner that reads from a Pub/Sub subscription. However, after "some time" (i.e. I don't quite see a predictable pattern yet), I get a bundle failure exception that flags a failure with a gRPC call to Pub/Sub: ERROR:apache_beam.runners.dir

Beam/Python/Flink: Unable to deserialize UnboundedSource for PubSub source

2020-06-08 Thread Pradip Thachile
Hey folks, I posted this on the Flink user mailing list but didn't get any traction there (potentially since this is Beam related?). I've got a Beam/Python pipeline that works on the DirectRunner, and I'm now trying to run it on a local dev Flink cluster. Running this yields an error out the g

Re: Beam/Python/Flink: Unable to deserialize UnboundedSource for PubSub source

2020-06-09 Thread Pradip Thachile
Quick update: this test code works just fine on Dataflow as well as the DirectRunner. Looks like the FlinkRunner is problematic for some reason here.
On 2020/06/08 20:11:13, Pradip Thachile wrote:
> Hey folks,
>
> I posted this on the Flink user mailing list but didn't ge

Python/Beam Windowing + Triggering Recommendations

2020-06-11 Thread Pradip Thachile
Hi, I'm writing a pipeline in Python/Beam 2.19 and wanted to get the opinions of the folks on the list here on the best way to implement the following logic, specifically which window + trigger combinations to use. HourlySource := a PCollection that receives an event an hour from another

Unit and Integration Testing Recommendations on Beam/Python

2020-06-17 Thread Pradip Thachile
Hey folks, can anyone offer any recommendations or guidelines on how to unit test DoFns and CombineFns, as well as build integration tests for PTransforms, in Beam/Python? Thanks! -Pradip