PyFlink and parallelism

2022-07-15 Thread John Tipper
Hi all, I have a processing topology using PyFlink and SQL where there is data skew: I'm splitting a stream of heterogenous data into separate streams based on the type of data that's in it and some of these substreams have very many more events than others and this is causing issues when check

Re: Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-15 Thread David Anderson
What did change was the default starting position when not starting from a checkpoint. With FlinkKafkaConsumer, it starts from the committed offsets by default. With KafkaSource, it starts from the earliest offset. David On Fri, Jul 15, 2022 at 5:57 AM Chesnay Schepler wrote: > I'm not sure abo

Re: Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-15 Thread Chesnay Schepler
I'm not sure about the previous behavior, but at the very least according to the documentation the behavior is identical. 1.12: https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kafka.html#kafka-consumers-start-position-configuration /|setStartFromEarliest()|/// //|set