Hi,

The basic point is that you only get the messages in a guaranteed order
if that order is maintained at every step from creation to use.
Kafka guarantees order only for messages within the same partition.
So if you need them in order per account, the producing system must use
the account id as the message key, which forces all messages for a given
account into the same Kafka partition.
The Flink Kafka source will then read them sequentially in the right order,
but in order to KEEP them in that order you should really do a keyBy on the
account id immediately after reading and use only keyed streams from that
point onwards.
As soon as you shuffle or key by a different key, the ordering
within an account is no longer guaranteed.
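
To make the partitioning argument concrete, here is a toy sketch in plain
Java (no Kafka or Flink involved). The class name, the method names and the
use of String.hashCode() are all my own stand-ins for illustration; Kafka's
default partitioner hashes the serialized key bytes instead, as far as I
know. The only point is that the same key always lands in the same
partition, so the relative order per account survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedPartitionSketch {

    // Hypothetical stand-in for a partitioner: the same key always
    // maps to the same partition number.
    public static int partitionFor(String accountId, int numPartitions) {
        return Math.floorMod(accountId.hashCode(), numPartitions);
    }

    // Appends each event {accountId, payload} to the partition chosen by
    // its account id. Within one partition, arrival order is preserved.
    public static Map<Integer, List<String>> route(List<String[]> events,
                                                   int numPartitions) {
        Map<Integer, List<String>> partitions = new HashMap<>();
        for (String[] event : events) {
            int p = partitionFor(event[0], numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>())
                      .add(event[0] + ":" + event[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[] {"acct-1", "position-v1"},
            new String[] {"acct-2", "position-v1"},
            new String[] {"acct-1", "position-v2"});
        // Both acct-1 events end up in the same list, v1 before v2.
        System.out.println(route(events, 4));
    }
}
```

Keying by anything other than the account id in this sketch would spread
one account over several lists, and then nothing ties v1 and v2 together
any more.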

In general I always put a very accurate timestamp in all of my events
(epoch milliseconds, in some cases even epoch microseconds) so I can always
check if an order problem occurred.
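
A minimal sketch of such a check, again in plain Java. The class and method
names are hypothetical, and it assumes each event carries an account id and
an epoch-millisecond timestamp set by the producer. It simply remembers the
last timestamp seen per account and flags anything older.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderCheck {

    // Last timestamp (epoch milliseconds) accepted per account id.
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Returns false if this event is older than one already seen for the
    // same account, i.e. an ordering problem occurred somewhere upstream.
    public boolean inOrder(String accountId, long epochMillis) {
        Long previous = lastSeen.get(accountId);
        if (previous != null && epochMillis < previous) {
            return false;
        }
        lastSeen.put(accountId, epochMillis);
        return true;
    }

    public static void main(String[] args) {
        OrderCheck check = new OrderCheck();
        System.out.println(check.inOrder("acct-1", 1000L)); // true
        System.out.println(check.inOrder("acct-1", 1005L)); // true
        System.out.println(check.inOrder("acct-1", 1002L)); // false
    }
}
```

Events with equal timestamps are treated as in order here, which is one
reason a finer resolution than milliseconds can be worth having.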

Niels Basjes

On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu <qcx978132...@gmail.com>
wrote:

> Hi,
> Maybe the messages of the same key should be in the *same partition* of
> Kafka topic
>
> 2018-07-29 11:01 GMT+08:00 Hequn Cheng <chenghe...@gmail.com>:
>
>> Hi harshvardhan,
>> If 1. the messages are on the same topic, 2. there is no rebalance,
>> and 3. you keyBy on the same field with the same value, the answer is yes.
>>
>> Best, Hequn
>>
>> On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <
>> harshvardhan.ag...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> The messages will exist on the same topic. I intend to keyBy on the same
>>> field. The question is whether the two messages will be mapped to the same
>>> task manager and the same slot. Also, will they be processed in the correct
>>> order given that they have the same key?
>>>
>>> On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <chenghe...@gmail.com> wrote:
>>>
>>>> Hi Harshvardhan,
>>>>
>>>> There are a number of factors to consider.
>>>> 1. the consecutive Kafka messages must be in the same Kafka topic.
>>>> 2. the data should not be rebalanced. For example, operators should
>>>> be chained in order to avoid rebalancing.
>>>> 3. if you perform keyBy(), you should keyBy on a field for which the
>>>> two consecutive messages share the same value.
>>>>
>>>> Best, Hequn
>>>>
>>>> On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <
>>>> harshvardhan.ag...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> We are currently using Flink to process financial data. We are getting
>>>>> position data from Kafka and we enrich the positions with account and
>>>>> product information. We are using Ingestion time while processing events.
>>>>> The question I have is: say I key the position datastream by account
>>>>> number. If I have two consecutive Kafka messages with the same account and
>>>>> product info where the second one is an updated position of the first one,
>>>>> does Flink guarantee that the messages will be processed on the same slot
>>>>> in the same worker? We want to ensure that we don’t process them out of
>>>>> order.
>>>>>
>>>>> Thank you!
>>>>> --
>>>>> Regards,
>>>>> Harshvardhan
>>>>>
>>>>
>>>> --
>>> Regards,
>>> Harshvardhan
>>>
>>
>>
>
>
> --
> Blog:http://www.klion26.com
> GTalk:qcx978132955
> 一切随心
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
