Hi! Just created the JIRA (https://issues.apache.org/jira/browse/FLINK-10225).
Thanks for your reply,
Pierre

On Thu, Aug 23, 2018 at 14:31, Kostas Kloudas <k.klou...@data-artisans.com> wrote:

> Hi Pierre,
>
> You are right that this should not happen.
> It seems like a bug.
> Could you open a JIRA and post it here?
>
> Thanks,
> Kostas
>
> On Aug 21, 2018, at 9:35 PM, Pierre Zemb <pierre.zemb.i...@gmail.com> wrote:
>
> Hi!
>
> I’ve started to deploy a small Flink cluster (4 TMs and 1 JM for now, on
> 1.6.0), and deployed a small job on it. Because of the current load, the job
> is handled entirely by a single TM. I’ve created a small proxy that uses
> QueryableStateClient
> <https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/queryablestate/client/QueryableStateClient.html>
> to access the current state. It works nicely, except under certain
> circumstances. It seems that I can only access the state through a node
> that is holding a part of the job. Here’s an example:
>
> - job on tm1. Pointing QueryableStateClient to tm1. State accessible
> - job still on tm1. Pointing QueryableStateClient to tm2 (for example).
>   State inaccessible
> - killing tm1, job is now on tm2. State accessible
> - job still on tm2. Pointing QueryableStateClient to tm3. State
>   inaccessible
> - adding some parallelism to spread the job across tm1 and tm2. Pointing
>   QueryableStateClient to either tm1 or tm2 works
> - job still on tm1 and tm2. Pointing QueryableStateClient to tm3. State
>   inaccessible
>
> When the state is inaccessible, I can see this (generated here
> <https://github.com/apache/flink/blob/release-1.6/flink-queryable-state/flink-queryable-state-runtime/src/main/java/org/apache/flink/queryablestate/client/proxy/KvStateClientProxyHandler.java#L228>):
>
> java.lang.RuntimeException: Failed request 0.
> Caused by: org.apache.flink.queryablestate.exceptions.UnknownLocationException: Could not retrieve location of state=repo-status of job=3ac3bc00b2d5bc0752917186a288d40a. Potential reasons are: i) the state is not ready, or ii) the job does not exist.
>     at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getKvStateLookupInfo(KvStateClientProxyHandler.java:228)
>     at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getState(KvStateClientProxyHandler.java:162)
>     at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.executeActionAsync(KvStateClientProxyHandler.java:129)
>     at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:119)
>     at org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.handleRequest(KvStateClientProxyHandler.java:63)
>     at org.apache.flink.queryablestate.network.AbstractServerHandler$AsyncRequestTask.run(AbstractServerHandler.java:236)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> From the documentation, I can see that:
>
> The client connects to a Client Proxy running on a given Task Manager. The
> proxy is the entry point of the client to the Flink cluster. It forwards
> the requests of the client to the Job Manager and the required Task
> Manager, and forwards the final response back to the client.
>
> Did I miss something? Is the QueryableStateClientProxy only fetching info
> for a job that is running on its local TM? If so, is there a way to
> retrieve the job graph? Or maybe another solution?
>
> Thanks!
> Pierre Zemb
>
> --
> Regards,
> Pierre Zemb
> pierrezemb.fr
> Software Engineer, Metrics Data Platform @OVH

--
Regards,
Pierre Zemb
pierrezemb.fr
Software Engineer, Metrics Data Platform @OVH
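
PS: until the bug is fixed, one possible stopgap on the proxy side is to try each task manager in turn and keep the first answer, since only the TMs holding part of the job seem able to answer. This is only a sketch in plain Java with no Flink dependency: `ProxyFallback`, `queryFirstAvailable` and the `query` function are hypothetical names of mine, and `query` stands in for whatever call the proxy makes through QueryableStateClient against a given host (it is assumed to throw a RuntimeException when the state cannot be located, as in the trace above).

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: not part of Flink, just an illustration of the fallback idea.
class ProxyFallback {

    // Tries each host in order and returns the first successful result.
    // `query` is a stand-in for the real state lookup against one task manager;
    // it is assumed to throw a RuntimeException when that host cannot serve the state.
    static <T> Optional<T> queryFirstAvailable(List<String> hosts, Function<String, T> query) {
        for (String host : hosts) {
            try {
                T result = query.apply(host);
                return Optional.ofNullable(result);
            } catch (RuntimeException e) {
                // This host could not locate the state; fall through to the next one.
            }
        }
        // No host could answer.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated cluster: only tm1 holds the job, so tm3 and tm2 fail.
        List<String> hosts = List.of("tm3", "tm2", "tm1");
        Optional<String> state = queryFirstAvailable(hosts, host -> {
            if (!host.equals("tm1")) {
                throw new RuntimeException("Failed request");
            }
            return "state-from-" + host;
        });
        System.out.println(state.orElse("state inaccessible"));
    }
}
```

The obvious downside is extra latency on every miss, and it hides the real issue rather than fixing it, so I would only use it as a temporary measure.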