Re: Concurrency issue with KafkaIO

2020-07-03 Thread Alexey Romanenko
Tbh, I don’t see how throughput would improve if these two pipelines use the same amount of slots and resources.

> On 3 Jul 2020, at 11:49, wang Wu wrote:
>
> One question: We have 5 topics and we adding them all to KafkaIO. Will it
> help to improve the throughput of pipeline if:
> We add onl

Re: Concurrency issue with KafkaIO

2020-07-03 Thread wang Wu
One question: We have 5 topics and we are adding them all to KafkaIO. Would it help to improve the throughput of the pipeline if we add only 1 topic to KafkaIO and create 5 (unbounded) PCollections? Each collection would go through the same transform and write to the same final sink.

Regards
Dinh

> On 3

Re: Concurrency issue with KafkaIO

2020-07-02 Thread wang Wu
Thank you for the information. Here is our Kafka client version:

[INFO] +- org.apache.kafka:kafka-clients:jar:2.3.0:compile
[INFO] |  +- com.github.luben:zstd-jni:jar:1.4.0-1:compile
[INFO] |  +- org.lz4:lz4-java:jar:1.6.0:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.7.3:compile

I a

Re: Concurrency issue with KafkaIO

2020-07-02 Thread Alexey Romanenko
KafkaUnboundedReader is not thread-safe and, though I may be wrong, I don’t think it’s supposed to be, since every KafkaUnboundedReader is supposed to read from its own split, represented by a KafkaUnboundedSource, independently. Though, in the KafkaIO case, if the total number of splits is less than numb
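
To make the idea of independent splits concrete, here is a minimal sketch in plain Java. It assumes that each partition belongs to exactly one split and that partitions are dealt out round-robin; the class and method names are hypothetical and this is not KafkaIO's actual code. Under that assumption, one reader per split never shares partitions with another reader, so the reader itself would not need to be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAssignmentSketch {

    // Hypothetical helper: deals partition indices 0..partitionCount-1
    // round-robin over at most desiredSplits splits.
    // Assumption: a split never holds a partition owned by another split.
    static List<List<Integer>> assign(int partitionCount, int desiredSplits) {
        if (partitionCount < 0 || desiredSplits < 1) {
            throw new IllegalArgumentException("invalid partition or split count");
        }
        // There is no point in creating more splits than partitions.
        int numSplits = Math.min(desiredSplits, partitionCount);
        List<List<Integer>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            splits.get(p % numSplits).add(p);
        }
        return splits;
    }

    public static void main(String[] args) {
        // 5 partitions, 8 requested splits: only 5 splits are created.
        System.out.println(assign(5, 8));
        // 5 partitions, 2 requested splits: each split owns several partitions.
        System.out.println(assign(5, 2));
    }
}
```

In this sketch the number of splits is capped by the number of partitions, so asking for more parallelism than there are partitions does not create extra readers.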

Re: Concurrency issue with KafkaIO

2020-06-30 Thread wang Wu
We encountered a similar exception with KafkaUnboundedReader. By similar I mean that it starts from org.apache.spark.rdd.RDD.computeOrReadCheckpoint and ends at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance. It is just another type of concurrency bug. I am sorry for the long stack trace. My qu

Re: Concurrency issue with KafkaIO

2020-06-29 Thread wang Wu
Hi,

We are using version 2.16.0. More about our dependencies:

+- org.apache.beam:beam-sdks-java-core:jar:2.16.0:compile
[INFO] |  +- org.apache.beam:beam-model-job-management:jar:2.16.0:compile
[INFO] |  +- org.apache.beam:beam-vendor-bytebuddy-1_9_3:jar:0.1:compile
[INFO] |  \- org.tukaani:xz:ja

Re: Concurrency issue with KafkaIO

2020-06-29 Thread Alexey Romanenko
I don’t think it’s a known issue. Could you tell which version of Beam you use?

> On 28 Jun 2020, at 14:43, wang Wu wrote:
>
> Hi,
> We run Beam pipeline on Spark in the streaming mode. We subscribe to multiple
> Kafka topics. Our job run fine until it is under heavy load: millions of
> Kafka m

Concurrency issue with KafkaIO

2020-06-28 Thread wang Wu
Hi,

We run a Beam pipeline on Spark in streaming mode. We subscribe to multiple Kafka topics. Our job runs fine until it is under heavy load: millions of Kafka messages coming in per second. The exception looks like a concurrency issue. Is it a known bug in Beam or some Spark configuration we could