[ 
https://issues.apache.org/jira/browse/FLINK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683432#comment-17683432
 ] 

Matthias Pohl edited comment on FLINK-30883 at 10/4/23 10:01 AM:
-----------------------------------------------------------------

I extracted the available logs into dedicated files. It appears that the 
JobManager restarted at least once. Strangely, even {{jobmanager.0.log}}, the 
earliest log we have, states that the job was recovered rather than submitted.

Source: jobmanager.0.log
{code:java}
Feb 01 15:03:03 2023-02-01 14:55:28,715 INFO  org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 0000000068e961ca0000000000000000 was recovered successfully.
{code}
It looks like the JobManager restarted more than once, and the expected log 
line "Job \{jobId} is submitted" is probably located in the logs of the 
missing JobManager run. However, I cannot find evidence for this theory 
because no additional logs were provided.
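
To make the check above repeatable, here is a minimal sketch that scans JobManager log lines for the two messages quoted in this comment and reports job IDs that were recovered but never logged as submitted. The helper names ({{job_events}}, {{recovered_without_submission}}) are hypothetical, and the regular expression assumes the message wording and the 32-hex-digit job ID format shown in the excerpt above; it is not taken from the test script itself.

```python
import re
from collections import defaultdict

# Assumption: both messages follow the wording quoted in this comment and
# the job ID is 32 lowercase hex digits, as in the jobmanager.0.log excerpt.
JOB_EVENT = re.compile(
    r"Job (?P<job_id>[0-9a-f]{32}) "
    r"(?P<event>is submitted|was recovered successfully)"
)


def job_events(lines):
    """Return {job_id: [event, ...]} in order of appearance."""
    events = defaultdict(list)
    # Join the lines with spaces so that a message wrapped across
    # several lines (as in the mail excerpt) still matches.
    text = " ".join(line.strip() for line in lines)
    for match in JOB_EVENT.finditer(text):
        events[match.group("job_id")].append(match.group("event"))
    return dict(events)


def recovered_without_submission(events):
    """Job IDs that have no 'is submitted' message in the scanned logs."""
    return sorted(
        job_id for job_id, seen in events.items() if "is submitted" not in seen
    )


# The excerpt from jobmanager.0.log, wrapped as it appears in this mail.
sample = [
    "Feb 01 15:03:03 2023-02-01 14:55:28,715 INFO  ",
    "org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - ",
    "Job 0000000068e961ca0000000000000000 was recovered successfully.",
]
events = job_events(sample)
print(recovered_without_submission(events))
# prints ['0000000068e961ca0000000000000000']
```

Feeding it the concatenated lines of {{jobmanager.0.log}} and {{jobmanager.1.log}} would show whether any available run contains the submission message; an ID that still shows up in the result is consistent with the theory that the submission happened in a run whose logs are missing.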



> Missing JobID caused the k8s e2e test to fail
> ---------------------------------------------
>
>                 Key: FLINK-30883
>                 URL: https://issues.apache.org/jira/browse/FLINK-30883
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes, Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: auto-deprioritized-critical, test-stability
>         Attachments: e2e_test_failure.log, 
> flink-vsts-client-fv-az378-840.log, jobmanager.0.log, jobmanager.1.log, 
> taskmanager.log
>
>
> We've experienced a test failure in {{Run kubernetes application HA test}} 
> due to a {{CliArgsException}}:
> {code}
> Feb 01 15:03:15 org.apache.flink.client.cli.CliArgsException: Missing JobID. Specify a JobID to cancel a job.
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:689) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1107) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189) ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15       at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45569&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=9866



