Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Luke Cwik
The author for Apache Beam A Complete Guide does not have good reviews on Amazon for their other books and as you mentioned no reviews for this one. I would second the Streaming Systems book as the authors directly worked on Apache Beam. On Sun, Jun 28, 2020 at 6:46 PM Wesley Peng wrote: > Hi R

Re: DoFn with SideInput

2020-06-29 Thread Luke Cwik
The UpdateFn won't be invoked till the side input is ready which requires either the watermark to pass the end of the global window + allowed lateness (to show that the side input is empty) or at least one firing to populate it with data. See this general section on side inputs[1] and some useful p

KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
Hi, I am using Dataflow. When I update my DF job with a new topic or update partition count on the same topic. I can not use DF's update function. Could you help me to understand why I can not use the update function for that ? I checked the Beam source code but I could not find the right place t

Re: Can SpannerIO read data from different GCP project?

2020-06-29 Thread Luke Cwik
The intent is that you grant permissions to the account that is running the Dataflow job to the resources you want it to access in project B before you start the pipeline. This allows for much finer grain access control and the ability to revoke permissions without having to disable an entire accou

Re: KafkaIO does not support add or remove topics

2020-06-29 Thread Alexey Romanenko
Yes, it’s a known limitation [1] mostly due to the fact that KafkaIO.Read is based on UnboundedSource API and it fetches all information about topic and partitions only once during a “split" phase [2]. There is on-going work to make KafkaIO.Read based on Splittable DoFn [3] which should allow to

Re: Concurrency issue with KafkaIO

2020-06-29 Thread Alexey Romanenko
I don’t think it’s a known issue. Could you tell with version of Beam you use? > On 28 Jun 2020, at 14:43, wang Wu wrote: > > Hi, > We run Beam pipeline on Spark in the streaming mode. We subscribe to multiple > Kafka topics. Our job run fine until it is under heavy load: millions of > Kafka m

Re: KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
Thanks Alexey for sharing tickets and code. I found one workaround to use the update function. If I generate different KafkaIO step name for each submission and provide --transformNameMapping="{"Kafka_IO_3242/Read(KafkaUnboundedSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds":""}" My upd

Re: Concurrency issue with KafkaIO

2020-06-29 Thread wang Wu
Hi, We are using version 2.16.0. More about our dependencies: +- org.apache.beam:beam-sdks-java-core:jar:2.16.0:compile [INFO] | +- org.apache.beam:beam-model-job-management:jar:2.16.0:compile [INFO] | +- org.apache.beam:beam-vendor-bytebuddy-1_9_3:jar:0.1:compile [INFO] | \- org.tukaani:xz:ja

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
Hm it looks like the error is from trying to call the zero-arg constructor for the ArticleEnvelope proto class. Do you have a schema registered for ArticleEnvelope? I think maybe what's happening is Beam finds there's no schema registered for ArticleEnvelope, so it just recursively applies JavaFie

Re: DoFn with SideInput

2020-06-29 Thread Praveen K Viswanathan
Thank you Luke. I changed DefaultTrigger.of() to AfterProcessingTime. pastFirstElementInPane() and it worked. On Mon, Jun 29, 2020 at 9:09 AM Luke Cwik wrote: > The UpdateFn won't be invoked till the side input is ready which requires > either the watermark to pass the end of the global window +

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
It just occurred to me that BEAM-10265 [1] could be the cause of the stack overflow. Does ArticleEnvelope refer to itself recursively? Beam schemas are not allowed to be recursive, and it looks like we don't fail gracefully for recursive proto definitions. Brian [1] https://issues.apache.org/jira

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Wesley Peng
Hi Luke Luke Cwik wrote: The author for Apache Beam A Complete Guide does not have good reviews on Amazon for their other books and as you mentioned no reviews for this one. I would second the Streaming Systems book as the authors directly worked on Apache Beam. So, can Apache beam team i

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Rion Williams
Hi Wesley, In essence, yes, that’s what they did. The three authors of Streaming Systems are folks that work on Google’s Dataflow Project, which for all intents and purposes is essentially an implementation of the Beam Model. Two of them are members of the Beam PMC (essentially a steering comm

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Wesley Peng
Rion Williams wrote: The three authors of Streaming Systems are folks that work on Google’s Dataflow Project, which for all intents and purposes is essentially an implementation of the Beam Model. Two of them are members of the Beam PMC (essentially a steering committee for the project) and

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Luke Cwik
Can you give context as to whether schemas will ever allow recursive types since this is pretty common in lots of languages? On Mon, Jun 29, 2020 at 5:13 PM Brian Hulette wrote: > It just occurred to me that BEAM-10265 [1] could be the cause of the stack > overflow. Does ArticleEnvelope refer to

Re:Re: Can SpannerIO read data from different GCP project?

2020-06-29 Thread Sheng Yang
Thanks Austin, Luke replying my message: I did some experiments, these are my code snippets. Manen: 2.21.0 com.google.cloud google-cloud-spanner-jdbc 1.15.0 com.google.cloud google-cloud-spanner 1.56.0 Java code: public class SpannerJdbcToCsvText { privat