Hi Arvid,

Thanks for your response. I think I did not word my question properly. I wanted to confirm that if the data is distributed across more than one partition, then ordering cannot be maintained (which is documented). From your response I understand that if I set the parallelism to the number of partitions, then each consumer will consume from one partition and ordering can be maintained.
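Just to make sure I am reading that correctly, the sketch below is roughly what I have in mind. The topic name, bootstrap servers, group id, deserialization schema and the parallelism of 4 are placeholders for my setup; I have not actually run this.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PerPartitionOrderingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder
        props.setProperty("group.id", "metrics-consumer");     // placeholder

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("customer-topic", new SimpleStringSchema(), props);

        // Source parallelism equal to the partition count (4 here), so each source
        // subtask reads exactly one partition and per-partition order is preserved.
        DataStream<String> events = env.addSource(consumer).setParallelism(4);

        events.print();
        env.execute("per-partition-ordering");
    }
}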
However, I have a question here: in case my parallelism is less than the number of partitions, I still believe that if I create a keyed stream, ordering will be maintained at the operator level for that key. Correct me if I am wrong (I have put a rough sketch of what I mean at the bottom of this mail, below the quoted thread).

Second, one issue/challenge which I see with this model: if one source pushes data at a very high frequency, that one partition is overloaded, and hence the task which processes it will be overloaded too. However, for maintaining ordering I do not have any option other than keeping the data in one partition.

Thanks,
Hemant

On Wed, Feb 19, 2020 at 5:54 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Hemant,
>
> Flink passes your configurations to the Kafka consumer, so you could check
> if you can subscribe to only one partition there.
>
> However, I would discourage that approach. I don't see the benefit over just
> subscribing to the topic entirely and having dedicated processing for the
> different devices.
>
> If you are concerned about the order, you shouldn't be. Since all events of a
> specific device-id reside in the same source partition, events are in order
> in Kafka (the producer's responsibility, but I'm assuming that because of your
> mail) and thus they are also in order in non-keyed streams in Flink. Any
> keyBy on device-id, or on a composite key involving device-id, would also
> retain the order.
>
> If you have exactly one partition per device-id, you could even go with
> `DataStreamUtils#reinterpretAsKeyedStream` to avoid any shuffling.
>
> Let me know if I misunderstood your use case or if you have further
> questions.
>
> Best,
>
> Arvid
>
> On Wed, Feb 19, 2020 at 8:39 AM hemant singh <hemant2...@gmail.com> wrote:
>
>> Hello Flink Users,
>>
>> I have a use case where I am processing metrics from different types of
>> sources (one source will have multiple devices), and for aggregations as
>> well as for building alerts the order of messages is important. To maintain
>> customer data segregation I plan to have a single topic for each customer,
>> with each source streaming data to one Kafka partition.
>> To maintain ordering I am planning to push data for a single source type
>> to a single partition. Then I can create a keyed stream so that for each
>> device-id I have a single stream which has ordered data for that device-id.
>>
>> However, with the flink-kafka consumer I don't see that I can read from a
>> specific partition, so the Flink consumer reads from multiple Kafka
>> partitions. So even if I try to create a keyed stream on source type (and
>> then write to a partition for further processing, like a keyed stream on
>> device-id), I think ordering will not be maintained per source type.
>>
>> The only other option I feel I am left with is to have a single partition
>> per topic so that Flink can subscribe to the topic and ordering is
>> maintained; the challenge is too many topics (as I have this configuration
>> for multiple customers), which is not advisable for a Kafka cluster.
>>
>> Can anyone shed some light on how to handle this use case?
>>
>> Thanks,
>> Hemant
>>
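PS: Here is the sketch I referred to above, covering both the plain keyBy on device-id and the `DataStreamUtils#reinterpretAsKeyedStream` variant you mentioned. I am using Tuple2<String, Double> as a stand-in for my metric events (device-id in f0); this is untested.

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class DeviceKeyingSketch {

    // Key selector picking the device-id out of a (deviceId, value) event.
    private static final KeySelector<Tuple2<String, Double>, String> BY_DEVICE =
            new KeySelector<Tuple2<String, Double>, String>() {
                @Override
                public String getKey(Tuple2<String, Double> event) {
                    return event.f0;
                }
            };

    // Plain keyBy: all events with the same device-id go to the same subtask,
    // so the per-partition order from Kafka is kept for that key even when the
    // operator parallelism is lower than the number of partitions.
    static KeyedStream<Tuple2<String, Double>, String> keyByDevice(
            DataStream<Tuple2<String, Double>> events) {
        return events.keyBy(BY_DEVICE);
    }

    // With exactly one partition per device-id and source parallelism equal to
    // the partition count, the shuffle can be skipped entirely.
    static KeyedStream<Tuple2<String, Double>, String> withoutShuffle(
            DataStream<Tuple2<String, Double>> events) {
        return DataStreamUtils.reinterpretAsKeyedStream(events, BY_DEVICE);
    }
}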