Hi all,
I have a processing topology using PyFlink and SQL where there is data skew:
I'm splitting a stream of heterogeneous data into separate streams based on the
type of data in it, and some of these substreams have far more
events than others, which is causing issues when checkpointing.
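
Roughly, the split looks like this (table, column, and type names here are
simplified stand-ins, and the real sources and sinks are left out):

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Stand-in for the real heterogeneous source; 'datagen' is only used here so
# the sketch is self-contained.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE events (
        event_type STRING,
        payload    STRING
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10'
    )
""")

# One filtered view per event type; each view feeds its own downstream
# pipeline/sink. A few types produce far more rows than the rest, which is
# where the skew comes from.
t_env.execute_sql(
    "CREATE TEMPORARY VIEW type_a AS SELECT * FROM events WHERE event_type = 'A'"
)
t_env.execute_sql(
    "CREATE TEMPORARY VIEW type_b AS SELECT * FROM events WHERE event_type = 'B'"
)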
What did change was the default starting position when not starting from a
checkpoint. With FlinkKafkaConsumer, it starts from the committed offsets
by default. With KafkaSource, it starts from the earliest offset.
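
If you want the new source to keep the old behavior, you can set the starting
offsets to the committed offsets explicitly. Roughly, in PyFlink, something like
the sketch below (untested; topic, broker, and group names are made up, and it
assumes a release that exposes KafkaSource in the Python API plus the Kafka
connector jar on the classpath):

from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import (
    KafkaOffsetResetStrategy,
    KafkaOffsetsInitializer,
    KafkaSource,
)

env = StreamExecutionEnvironment.get_execution_environment()

source = (
    KafkaSource.builder()
    .set_bootstrap_servers("broker:9092")   # made-up address
    .set_topics("my-topic")                 # made-up topic
    .set_group_id("my-group")               # made-up group id
    # Start from the committed offsets, falling back to earliest when none
    # exist -- the same default FlinkKafkaConsumer had.
    .set_starting_offsets(
        KafkaOffsetsInitializer.committed_offsets(KafkaOffsetResetStrategy.EARLIEST)
    )
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

stream = env.from_source(source, WatermarkStrategy.no_watermarks(), "kafka-source")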
David
On Fri, Jul 15, 2022 at 5:57 AM Chesnay Schepler wrote:
I'm not sure about the previous behavior, but at the very least
according to the documentation the behavior is identical.
1.12:
https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kafka.html#kafka-consumers-start-position-configuration
setStartFromEarliest() / set