I don't think ss now support "partitioned" watermark. and why different
partition's consumption rate vary? If the handling logic is quite
different, using different topic is a better way.

On Fri, Sep 1, 2017 at 4:59 PM, 张万新 <kevinzwx1...@gmail.com> wrote:

> Thanks, it's true that looser watermark can guarantee more data not be
> dropped, but at the same time more state need to be kept.   I just consider
> if there is sth like kafka-partition-aware watermark in flink in SS may be
> a better solution.
>
> Tathagata Das <tathagata.das1...@gmail.com>于2017年8月31日周四 上午9:13写道:
>
>> Why not set the watermark to be looser, one that works across all
>> partitions? The main usage of watermark is to drop state. If you loosen the
>> watermark threshold (e.g. from 1 hour to 10 hours), then you will keep more
>> state with older data, but you are guaranteed that you will not drop
>> important data.
>>
>> On Wed, Aug 30, 2017 at 7:41 AM, KevinZwx <kevinzwx1...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm working with Structured Streaming to process logs from kafka and use
>>> watermark to handle late events. Currently the watermark is computed by
>>> (max
>>> event time seen by the engine - late threshold), and the same watermark
>>> is
>>> used for all partitions.
>>>
>>> But in production environment it happens frequently that different
>>> partition
>>> is consumed at different speed, the consumption of some partitions may be
>>> left behind, so the newest event time in these partitions may be much
>>> smaller than than the others'. In this case using the same watermark for
>>> all
>>> partitions may cause heavy data loss.
>>>
>>> So is there any way to achieve different watermark for different kafka
>>> partition or any plan to work on this?
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>

Reply via email to