Re: Checking a Pcoll is empty in Apache Beam

2021-07-21 Thread Rajnil Guha
Hi, Thanks for your suggestions, I will surely check them out. My exact use-case is to check if the Pcoll is empty, and if it is, publish a message into a Pub/Sub topic. This message will then be further used downstream by some other processes. Thanks & Regards Rajnil Guha On Wed, Jul 21, 2021 a

Re: Checking a Pcoll is empty in Apache Beam

2021-07-21 Thread Rajnil Guha
Yes I am just thinking how to modify/rewrite this piece of code if I want to run my pipeline on Dataflow runner. Thanks & Regards Rajnil Guha On Wed, Jul 21, 2021 at 1:12 AM Robert Bradshaw wrote: > On Tue, Jul 20, 2021 at 12:33 PM Rajnil Guha > wrote: > > > > Hi, > > > > Thank you so much fo

Re: Checking a Pcoll is empty in Apache Beam

2021-07-20 Thread Rajnil Guha
Hi, Thank you so much for your help, the collect() function works. I tried below and it prints correctly.

    is_empty_check = (dupe_records
        | "CountGloballyDupes" >> Count.Globally()
        #| "IsEmptyCheck" >> beam.Map(lambda x: x == 0)
    )
    is_empty_df = ib.collect(is_empty_check)
    if(is_empty_df.iloc[0,0]

Re: Checking a Pcoll is empty in Apache Beam

2021-07-19 Thread Rajnil Guha
Hi, I tried running some experiments on the Interactive Runner before actually running the pipeline on Dataflow. Below is what I did, though I am not sure if this is the correct way to do it: Let's say I want to check whether the Pcoll named is_empty_check is empty or not and based on which I take a decis

Re: Checking a Pcoll is empty in Apache Beam

2021-07-18 Thread Rajnil Guha
Hi Reuven, Yes, for now this is a bounded PCollection. Thanks & Regards Rajnil Guha On Mon, Jul 19, 2021 at 12:02 AM Reuven Lax wrote: > Is this a bounded collection? > > On Sun, Jul 18, 2021, 11:17 AM Rajnil Guha > wrote: > >> Hi Beam Users, >> >> I have a use-case where I need to check whet

Checking a Pcoll is empty in Apache Beam

2021-07-18 Thread Rajnil Guha
Hi Beam Users, I have a use-case where I need to check whether a Pcollection is empty or not. If it's not empty I need to write a message to a Pub/Sub topic. I am using the Python SDK and Dataflow to write and run my pipelines respectively. I searched but could not come across any concrete way on