Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-30 Thread Kaymak, Tobias
I want to make my example as simple as possible while also not leaving out the details that might be the reason for the error. I don't think there is any recursiveness. I can also share the ArticleEnvelope Protobuf file If that helps. I've tried to register the ArticleEnvelope schema like this:

Re: Apache Beam a Complete Guide - Review?

2020-06-30 Thread Maximilian Michels
"Streaming Systems" is a great book to understand the concepts behind Apache Beam and other streaming systems. However, it is not a book about how to write, deploy, or monitor Beam pipelines. Such a book is yet to come out. As for the "Apache Beam" book you linked, that one seems to be a fraud

DoFn with OnTimer and watermark

2020-06-30 Thread Peter Benedikovic
Hi, I would like to ask you for advice. I am trying to join and deduplicate events from Kafka. Let’s say we have event of three types A, B and C. If B or C arrive first, they have to wait for A until they are emitted downstream. If A does not arrive until some predefined duration (gap), B/C are

Re: Re: Can SpannerIO read data from different GCP project?

2020-06-30 Thread Luke Cwik
Apache Beam pipelines have two parts two them. There is code that describes the pipeline shape and what transforms it contains (block 1 and results.apply(...)) and then there is the code that represents those transforms (MapFn in your case) and is executed remotely. I would take a look at the Beam

Re: Concurrency issue with KafkaIO

2020-06-30 Thread wang Wu
We encountered similar exception with KafkaUnboundedReader. By similarity I mean it start from org.apache.spark.rdd.RDD.computeOrReadCheckpoint And it ends at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance Just another type of concurrency bug. I am sorry for the long stack trace My qu

Understanding combiner's distribution logic

2020-06-30 Thread Julien Phalip
Hi, I had a question about how combiners work, particularly on how the combined PCollection's subsets are initially formed. I understand that, according to the documentation , a combiner allows parallelizing the computation to mult

Re: Understanding combiner's distribution logic

2020-06-30 Thread Luke Cwik
Your reasoning is correct around the withHotkeyFanout hint and it is to help runners know that there is likely one or more keys that will have significantly more data then the others but the logic around how it is broken up is runner dependent and whether they rely on the hint or not is also runner

Re: Understanding combiner's distribution logic

2020-06-30 Thread Julien Phalip
Thanks Luke! One part I'm still a bit unclear about is how exactly the PreCombine stage works. In particular, I'm wondering how it can perform the combination before the GBK. Is it because it can already compute the combination on adjacent elements that happen to share the same key? Could you als