Tbh, I don’t see how it could be improved, since these two pipelines would use the
same number of slots and resources.
> On 3 Jul 2020, at 11:49, wang Wu wrote:
>
> One question: We have 5 topics and we are adding them all to KafkaIO. Will it
> help to improve the throughput of the pipeline if:
> We add only 1 topic to KafkaIO and create 5 PCollections (unbounded)? Each
> collection will go through the same transform and write to the same final sink.
One question: We have 5 topics and we are adding them all to KafkaIO. Will it help
to improve the throughput of the pipeline if:
We add only 1 topic to KafkaIO and create 5 PCollections (unbounded)? Each
collection will go through the same transform and write to the same final sink.
Regards
Dinh
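For illustration, a minimal sketch of the two variants being compared. The broker address, topic names, and String deserializers are placeholders, not details from this thread:

import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicReadVariants {
  private static final List<String> TOPICS =
      Arrays.asList("t1", "t2", "t3", "t4", "t5"); // placeholder topic names

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Variant A: a single KafkaIO source subscribed to all 5 topics.
    PCollection<KafkaRecord<String, String>> allTopics =
        p.apply("ReadAllTopics", read().withTopics(TOPICS));

    // Variant B: one KafkaIO source per topic; the resulting unbounded
    // PCollections are flattened so they share the same downstream
    // transform and final sink.
    PCollectionList<KafkaRecord<String, String>> perTopic = PCollectionList.empty(p);
    for (String topic : TOPICS) {
      perTopic = perTopic.and(p.apply("Read-" + topic, read().withTopic(topic)));
    }
    PCollection<KafkaRecord<String, String>> flattened =
        perTopic.apply("FlattenTopics", Flatten.pCollections());

    // ... same transform and sink applied to either collection ...
    p.run().waitUntilFinish();
  }

  private static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092") // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class);
  }
}

Both variants end up reading the same set of topic partitions overall, which is why, with equal parallelism, throughput should be comparable.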
Thank you for the information. Here is our Kafka client version:
[INFO] +- org.apache.kafka:kafka-clients:jar:2.3.0:compile
[INFO] | +- com.github.luben:zstd-jni:jar:1.4.0-1:compile
[INFO] | +- org.lz4:lz4-java:jar:1.6.0:compile
[INFO] | \- org.xerial.snappy:snappy-java:jar:1.1.7.3:compile
KafkaUnboundedReader is not thread-safe and, I may be wrong, but I don’t
think it’s supposed to be, since every KafkaUnboundedReader is supposed to
read from its own split, represented by a KafkaUnboundedSource, independently.
Though, in KafkaIO case, if total number of splits is less than numb
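A rough sketch of that contract against the raw UnboundedSource API (hand-rolled threading for illustration only; this is not how any runner actually schedules work): each split gets its own reader, and only the thread that owns a reader ever calls start()/advance() on it.

import java.io.IOException;
import java.util.List;

import org.apache.beam.sdk.io.UnboundedSource;
import org.apache.beam.sdk.options.PipelineOptions;

public class OneReaderPerSplit {
  static <T, C extends UnboundedSource.CheckpointMark> void readAll(
      UnboundedSource<T, C> source, int desiredNumSplits, PipelineOptions options)
      throws Exception {
    // Each sub-source represents one split (for KafkaIO, a set of topic partitions).
    List<? extends UnboundedSource<T, C>> splits = source.split(desiredNumSplits, options);

    for (UnboundedSource<T, C> split : splits) {
      // One dedicated thread per split: the reader is never shared across
      // threads, so its lack of thread-safety is not a problem.
      new Thread(() -> {
        try (UnboundedSource.UnboundedReader<T> reader =
            split.createReader(options, null /* no checkpoint, start fresh */)) {
          boolean hasRecord = reader.start();
          while (!Thread.currentThread().isInterrupted()) {
            if (hasRecord) {
              T element = reader.getCurrent();
              // ... process element ...
            }
            hasRecord = reader.advance(); // only ever called from this thread
          }
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }).start();
    }
  }
}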
We encountered a similar exception with KafkaUnboundedReader. By similar I
mean the stack trace starts from
org.apache.spark.rdd.RDD.computeOrReadCheckpoint
and ends at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance.
Just another type of concurrency bug.
I am sorry for the long stack trace.
Hi,
We are using version 2.16.0. More about our dependencies:
[INFO] +- org.apache.beam:beam-sdks-java-core:jar:2.16.0:compile
[INFO] | +- org.apache.beam:beam-model-job-management:jar:2.16.0:compile
[INFO] | +- org.apache.beam:beam-vendor-bytebuddy-1_9_3:jar:0.1:compile
[INFO] | \- org.tukaani:xz:ja
I don’t think it’s a known issue. Could you tell which version of Beam you use?
> On 28 Jun 2020, at 14:43, wang Wu wrote:
>
> Hi,
> We run a Beam pipeline on Spark in streaming mode. We subscribe to multiple
> Kafka topics. Our job runs fine until it is under heavy load: millions of
> Kafka messages coming per second.
Hi,
We run a Beam pipeline on Spark in streaming mode. We subscribe to multiple
Kafka topics. Our job runs fine until it is under heavy load: millions of Kafka
messages coming per second. The exception looks like a concurrency issue. Is it a
known bug in Beam or some Spark configuration we could tune?
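For reference, the setup described above reduces to roughly the following sketch. Bootstrap servers and topic names are placeholders, and the runner itself would be selected with --runner=SparkRunner:

import java.util.Arrays;

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StreamingSetup {
  public static void main(String[] args) {
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setStreaming(true); // run the Spark runner in streaming mode

    Pipeline p = Pipeline.create(options);
    p.apply("ReadKafka",
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")               // placeholder
            .withTopics(Arrays.asList("topicA", "topicB"))     // placeholders
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class));
    // ... downstream transforms and sink elided ...
    p.run().waitUntilFinish();
  }
}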