Hi Peter,

Can you provide the relevant JobManager logs? Could you also describe the steps you took before the failure happened? For example, did this failure occur during the Flink upgrade, after it, or at some other point?
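In the meantime, one quick way to check whether the duplicate entry is still there is to group the /jobs/overview response by "jid" and flag any ID that appears more than once. Below is a minimal Python sketch. It assumes the JobManager REST API is reachable at a hypothetical http://localhost:8081 (8081 is Flink's default REST port; adjust it for your deployment). The helper names are illustrative only and are not part of Flink.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_overview(base_url: str) -> dict:
    """Fetch the JSON served by the JobManager's /jobs/overview REST endpoint.

    base_url is an assumption, e.g. "http://localhost:8081".
    """
    with urllib.request.urlopen(f"{base_url}/jobs/overview") as resp:
        return json.load(resp)


def find_duplicate_jobs(overview: dict) -> dict:
    """Map each job ID listed more than once to the states of its entries."""
    states_by_jid = defaultdict(list)
    for job in overview.get("jobs", []):
        states_by_jid[job["jid"]].append(job["state"])
    # Keep only IDs that show up in more than one entry.
    return {jid: states for jid, states in states_by_jid.items() if len(states) > 1}
```

For the output you posted, `find_duplicate_jobs(fetch_overview("http://localhost:8081"))` should report the job 2db4ee6397151a1109d1ca05188a4cbb with the states RUNNING and SUSPENDED.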
Best,
Piotrek

On Wed, Sep 8, 2021 at 16:11, Peter Westermann <no.westerm...@genesys.com> wrote:

> We recently upgraded from Flink 1.12.4 to 1.12.5 and are seeing some weird
> behavior after a change in JobManager leadership: we're seeing two copies
> of the same job, one of which is in SUSPENDED state and has a start time of
> zero. Here's the output from the /jobs/overview endpoint:
>
> {
>   "jobs": [{
>     "jid": "2db4ee6397151a1109d1ca05188a4cbb",
>     "name": "analytics-flink-v1",
>     "state": "RUNNING",
>     "start-time": 1631106146284,
>     "end-time": -1,
>     "duration": 2954642,
>     "last-modification": 1631106152322,
>     "tasks": {
>       "total": 112,
>       "created": 0,
>       "scheduled": 0,
>       "deploying": 0,
>       "running": 112,
>       "finished": 0,
>       "canceling": 0,
>       "canceled": 0,
>       "failed": 0,
>       "reconciling": 0
>     }
>   }, {
>     "jid": "2db4ee6397151a1109d1ca05188a4cbb",
>     "name": "analytics-flink-v1",
>     "state": "SUSPENDED",
>     "start-time": 0,
>     "end-time": -1,
>     "duration": 1631105900760,
>     "last-modification": 0,
>     "tasks": {
>       "total": 0,
>       "created": 0,
>       "scheduled": 0,
>       "deploying": 0,
>       "running": 0,
>       "finished": 0,
>       "canceling": 0,
>       "canceled": 0,
>       "failed": 0,
>       "reconciling": 0
>     }
>   }]
> }
>
> Has anyone seen this behavior before?
>
> Thanks,
> Peter