Thanks for the update! Since we are still in the planning stage, I will try to 
find another way to achieve what we are trying to do in the meantime, and I'll 
keep an eye on that Jira. I thought of two workarounds: either match the 
parallelism of the source to the partition count, so that each subtask owns 
exactly one partition, or, since this is a batch process that will be 
triggered, pass in a timestamp to start consuming from, so that any lingering 
end-of-stream message from a previous run is skipped. A rough sketch of both 
is below.
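
A minimal sketch of both workarounds, assuming a FlinkKafkaConsumer source; 
the topic name, connection properties, the partition count of 4, and the use 
of SimpleStringSchema in place of our custom end-of-stream schema are all 
placeholders:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class TriggeredBatchRun {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", "triggered-batch");         // placeholder

            // In the real job this would be the custom schema whose
            // isEndOfStream() ends the run; SimpleStringSchema just keeps
            // the sketch self-contained.
            FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
                    "my-topic", new SimpleStringSchema(), props);

            // Workaround 2: each triggered run passes in its start timestamp,
            // so end-of-stream markers written before it are never consumed.
            consumer.setStartFromTimestamp(Long.parseLong(args[0]));

            env.addSource(consumer)
               // Workaround 1: one source subtask per partition, so a single
               // end-of-stream record only ends the subtask that owns that
               // partition (partition count of 4 is a placeholder).
               .setParallelism(4)
               .print();

            env.execute("triggered-batch-run");
        }
    }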

Bill Wicker

-----Original Message-----
From: Tzu-Li Tai <tzuli...@gmail.com> 
Sent: Monday, February 17, 2020 1:42 AM
To: user@flink.apache.org
Subject: Re: KafkaFetcher closed before end of stream is received for all 
partitions.

Hi,

Your observations are correct.
By design, a record for which `KafkaDeserializationSchema#isEndOfStream` 
returns true causes the subtask that read it to escape its fetch loop. 
Therefore, if a subtask is assigned multiple partitions, the subtask ends as 
soon as any one record signals end of stream, regardless of which partition 
that record came from.
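
For concreteness, here is a minimal sketch of a schema with these semantics, 
assuming a String-valued topic and a hypothetical "END" marker record; a 
single "END" read from any assigned partition ends the whole subtask:

    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    public class EndOfStreamSchema implements KafkaDeserializationSchema<String> {

        @Override
        public String deserialize(ConsumerRecord<byte[], byte[]> record) {
            return new String(record.value(), StandardCharsets.UTF_8);
        }

        // Returning true ends the fetch loop of the whole subtask, even if
        // other partitions assigned to it have not seen "END" yet.
        @Override
        public boolean isEndOfStream(String nextElement) {
            return "END".equals(nextElement);
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return Types.STRING;
        }
    }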

I'm afraid there is probably no good solution to this, given the ill-defined 
semantics of the `isEndOfStream` method. All reasonable approaches that come 
to mind require some sort of external trigger to manually shut down the job.

For now, I've filed a JIRA to propose a possible solution to the semantics of 
the method: https://issues.apache.org/jira/browse/FLINK-16112



