Hi Seye,

Thanks for digging into the problem.

As Vino and Jörn suggested, this looks like a bug, so please file a JIRA 
issue. It would also be nice if you could post the issue link here so that 
we can follow the related discussion.

Cheers,
Kostas

> On Oct 14, 2018, at 9:46 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> You have to file an issue. One workaround, to see if this really fixes 
> your problem, could be to use reflection to make this method accessible 
> and then call it (of course, that is nothing for production code). You 
> can also try a newer Flink version.
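> 
> Something along these lines (untested sketch; the class and method names 
> are guesses based on your description of the internals, so verify them 
> against the 1.4 sources first):
> 
>     import java.lang.reflect.Method;
>     import org.apache.flink.api.common.JobID;
> 
>     public final class ForceUpdateHack {
>         // Calls the (assumed) private lookup method on
>         // KvStateClientProxyHandler with forceUpdate = true, so the
>         // cached KvStateLocation is refreshed instead of reused.
>         public static Object lookup(Object proxyHandler, JobID jobId,
>                 String stateName) throws Exception {
>             Method m = proxyHandler.getClass().getDeclaredMethod(
>                     "getKvStateLookupInfo",
>                     JobID.class, String.class, boolean.class);
>             m.setAccessible(true); // bypass the private modifier
>             return m.invoke(proxyHandler, jobId, stateName, true);
>         }
>     }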
> 
>> On Oct 13, 2018, at 6:02 PM, Seye Jin <seyej...@gmail.com> wrote:
>> 
>> I recently upgraded to Flink 1.4 from 1.3 and leverage the Queryable 
>> State client in my application. I have 1 JobManager and 5 TaskManagers, 
>> all serviced behind Kubernetes. A large state is built and distributed 
>> evenly across the task managers, and the client can query state for a 
>> specified key.
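>> 
>> For reference, the query side of my client looks roughly like this 
>> (hostname, port, job id, and state names are placeholders):
>> 
>>     import java.util.concurrent.CompletableFuture;
>>     import org.apache.flink.api.common.JobID;
>>     import org.apache.flink.api.common.state.ValueState;
>>     import org.apache.flink.api.common.state.ValueStateDescriptor;
>>     import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
>>     import org.apache.flink.queryablestate.client.QueryableStateClient;
>> 
>>     public final class QsQuery {
>>         public static void main(String[] args) throws Exception {
>>             // talk to the queryable state proxy on one of the TMs
>>             QueryableStateClient client =
>>                     new QueryableStateClient("tm-host", 9069);
>> 
>>             ValueStateDescriptor<Long> descriptor =
>>                     new ValueStateDescriptor<>("myState",
>>                             BasicTypeInfo.LONG_TYPE_INFO);
>> 
>>             // async lookup of the value registered for one key
>>             CompletableFuture<ValueState<Long>> result = client.getKvState(
>>                     JobID.fromHexString(args[0]), // job id from the web UI
>>                     "myQueryableState",
>>                     "someKey",
>>                     BasicTypeInfo.STRING_TYPE_INFO,
>>                     descriptor);
>> 
>>             System.out.println(result.get().value());
>>             client.shutdown();
>>         }
>>     }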
>> 
>> Issue: if a task manager dies and a new one gets spun up (automatically), 
>> the QS state successfully recovers on the new nodes/task slots, but I 
>> start to get timeout exceptions when the client queries for a key, even 
>> if I reset or re-deploy the client jobs.
>> 
>> I have been trying to triage this and figure out a way to remediate the 
>> issue. I found that KvStateClientProxyHandler, which is not exposed in 
>> code, has a forceUpdate flag that can reset the KvStateLocations (plus 
>> inetAddresses), but it defaults to false and cannot be overridden.
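>> 
>> To illustrate what I think is happening (this is a toy paraphrase of my 
>> reading of the internals, not the actual Flink source):
>> 
>>     import java.util.HashMap;
>>     import java.util.Map;
>> 
>>     // Toy model of the proxy's location cache: entries are only
>>     // refreshed when forceUpdate is true, so after a TM is replaced
>>     // the cached address keeps pointing at the dead TM and queries
>>     // time out.
>>     public final class LocationCacheSketch {
>>         private final Map<String, String> lookupCache = new HashMap<>();
>> 
>>         String lookup(String stateName, boolean forceUpdate) {
>>             if (!forceUpdate && lookupCache.containsKey(stateName)) {
>>                 return lookupCache.get(stateName); // possibly stale
>>             }
>>             String fresh = askJobManager(stateName); // fresh location
>>             lookupCache.put(stateName, fresh);
>>             return fresh;
>>         }
>> 
>>         private String askJobManager(String stateName) {
>>             return "new-tm:9069"; // stand-in for the real JM lookup
>>         }
>>     }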
>> 
>> I was wondering if anyone knows how to remediate this kind of issue, or 
>> if there is a way to make the JobManager aware that the task manager 
>> location in its cache is no longer valid.
>> 
>> Any tips to resolve this would be appreciated (I can't downgrade back to 
>> 1.3 or upgrade beyond 1.4).
>> 
