Hello, I am a developer trying to use Apache Beam, and I am running into an issue where my WaitOn step is not working as expected. I want my pipeline to read all the data from an S3 bucket using ParquetIO before moving on to the rest of the steps in my pipeline. However, I see in my DAG that even though there is a collect step after all the data is being read in, my pipeline still reads from S3 in the subsequent steps. It appears that the Wait.on is not actually happening. Is it even possible to wait on a read step? This is what my code looks like:
PCollection<GenericRecord> records = pipeline.apply("Read parquet file in as Generic Records", ParquetIO.read(finalSchema).from(beamReadPath).withConfiguration(configuration)); PCollection<GenericRecord> recordsWaited = records .apply("Waiting on Read Parquet File", Wait.on(records)).setCoder(AvroCoder.of(GenericRecord.class, finalSchema)); {Processing of rest of data subsequently} Any help would be greatly appreciated, thanks! Sincerely, Ramya ______________________________________________________________________ The information contained in this e-mail may be confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.