WordCount Java Example in Portable Runner

2021-04-22 Thread Ke Wu
Hello everyone, Is there any document on how to run a java pipeline (e.g. org.apache.beam.examples.WordCount.java) in portable mode? Best, Ke

Re: Question on late data handling in Beam streaming mode

2021-04-22 Thread Kenneth Knowles
Hello! In a streaming app, you have two choices: wait forever and never have any output OR use some method to decide that aggregation is "done". In Beam, the way you decide that aggregation is "done" is the watermark. When the watermark predicts no more data for an aggregation, then the aggregati

Question on late data handling in Beam streaming mode

2021-04-22 Thread Tao Li
Hi Beam community, I am wondering if there is a risk of losing late data from a Beam stream app due to watermarking? I just went through this design doc and noticed the “droppable” definition there: https://docs.google.com/document/d/12r7frmxNickxB5tbpuEh_n35_IJeVZn1peOrBrhhP6Y/edit# Can you

Re: Any easy way to extract values from PCollection?

2021-04-22 Thread Tao Li
Thanks everyone for your suggestions! From: Ning Kang Reply-To: "user@beam.apache.org" Date: Thursday, April 22, 2021 at 10:51 AM To: "user@beam.apache.org" Cc: Yuan Feng Subject: Re: Any easy way to extract values from PCollection? +1 to Brian's answer. In Java, you can singleValuedPcollec

Re: Any easy way to extract values from PCollection?

2021-04-22 Thread Ning Kang
+1 to Brian's answer. In Java, you can singleValuedPcollection .apply("Write single value", TextIO.write(). to(options.getSomeGcsPath()) as the last step of your pipeline. Then in your program, after executing the pipeline (wait until finish), use the Cloud Storage Java client library

Re: Any easy way to extract values from PCollection?

2021-04-22 Thread Brian Hulette
I don't think there's an easy answer to this question, in general all you can do with a PCollection is indicate you'd like to write it out to an IO. There has been some work in the Python SDK on "Interactive Beam" which is designed for using the Python SDK interactively in a notebook environment. I