I had that identical problem. Here’s what I came up with: https://github.com/ubiquibit-inc/sensor-failure
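
For reference, here is a minimal sketch of one defensive approach. It is not taken from the linked repository, and it is written in plain Python without Spark so the logic can be read on its own. The idea is to avoid relying on the order in which a key's records reach the state update function: sort each batch by Kafka offset first, and keep the last applied offset in the state. The names `Record`, `KeyState`, and `update_state` are hypothetical. The sketch also assumes that all records for a key come from a single Kafka partition, since offsets are only comparable within one partition.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Record:
    """Hypothetical stand-in for one Kafka record after the group-by-key step."""
    key: str
    offset: int
    value: str


@dataclass
class KeyState:
    """Hypothetical per-key state: last applied offset plus the values seen."""
    last_offset: int = -1
    values: List[str] = field(default_factory=list)


def update_state(records: Iterable[Record], state: KeyState) -> KeyState:
    """Apply one batch of records for a single key in offset order."""
    # Do not rely on arrival order within the group; sort by offset first.
    for rec in sorted(records, key=lambda r: r.offset):
        if rec.offset <= state.last_offset:
            # Already applied (for example after a replay), so skip it.
            continue
        state.values.append(rec.value)
        state.last_offset = rec.offset
    return state


# Records arrive shuffled within the batch but are applied in offset order.
batch = [Record("k", 2, "c"), Record("k", 0, "a"), Record("k", 1, "b")]
state = update_state(batch, KeyState())
print(state.values)  # ['a', 'b', 'c']
```

In a real pipeline the same sort-then-apply step would sit at the top of the state update function, with the offset column carried through from the Kafka source.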

On Tue, Apr 9, 2019 at 04:37 Akila Wajirasena <akila.wajiras...@gmail.com> wrote:

> Hi
>
> I have a Kafka topic which is already loaded with data. I use a stateful
> structured streaming pipeline using flatMapGroupWithState to consume the
> data in Kafka in a streaming manner.
>
> However, when I set the shuffle partition count > 1, I get some
> out-of-order messages in each of my GroupState. Is this the expected
> behavior, or is the message ordering guaranteed when using
> flatMapGroupWithState with a Kafka source?
>
> This is my pipeline:
>
> Kafka => GroupByKey(key from Kafka schema) => flatMapGroupWithState =>
> parquet
>
> When I printed out the Kafka offset for each key inside my state update
> function, they were not in order. I am using Spark 2.3.3.
>
> Thanks & Regards,
> Akila

--
Thanks,
Jason