It looks like after the script *flink-daemon.sh* completes, it returns exit code 0, and Kubernetes regards the container as done. Is that expected?
Thanks,
Qihua

On Thu, Sep 30, 2021 at 11:11 AM Qihua Yang <yang...@gmail.com> wrote:

> Thank you for your reply.
> From the log, the exit code is 0 and the reason is Completed.
> It looks like the cluster is fine. But why does Kubernetes restart the pod? As you
> said, from the perspective of Kubernetes everything is done. Then how do I
> prevent the restart?
> It didn't even give me a chance to upload and run a jar....
>
>     Ports:          8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
>     Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
>     Command:
>       /opt/flink/bin/entrypoint.sh
>     Args:
>       /opt/flink/bin/run-job-manager.sh
>     State:          Waiting
>       Reason:       CrashLoopBackOff
>     Last State:     Terminated
>       Reason:       Completed
>       Exit Code:    0
>       Started:      Wed, 29 Sep 2021 20:12:30 -0700
>       Finished:     Wed, 29 Sep 2021 20:12:45 -0700
>     Ready:          False
>     Restart Count:  131
>
> Thanks,
> Qihua
>
> On Thu, Sep 30, 2021 at 1:00 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Is the run-job-manager.sh script actually blocking?
>> Since you (apparently) use that as an entrypoint, if that script exits
>> after starting the JM then from the perspective of Kubernetes everything is
>> done.
>>
>> On 30/09/2021 08:59, Matthias Pohl wrote:
>>
>> Hi Qihua,
>> I guess, looking into kubectl describe and the JobManager logs would help
>> in understanding what's going on.
>>
>> Best,
>> Matthias
>>
>> On Wed, Sep 29, 2021 at 8:37 PM Qihua Yang <yang...@gmail.com> wrote:
>>
>>> Hi,
>>> I deployed Flink in session mode. I didn't run any jobs. I saw the logs
>>> below. That is normal, same as the Flink manual shows.
>>>
>>> + /opt/flink/bin/run-job-manager.sh
>>> Starting HA cluster with 1 masters.
>>> Starting standalonesession daemon on host job-manager-776dcf6dd-xzs8g.
>>> Starting taskexecutor daemon on host job-manager-776dcf6dd-xzs8g.
>>>
>>> But when I check kubectl, it shows the status is Completed. After a while,
>>> the status changed to CrashLoopBackOff, and the pod restarted.
>>>
>>> NAME                          READY   STATUS      RESTARTS   AGE
>>> job-manager-776dcf6dd-xzs8g   0/1     Completed   5          5m27s
>>>
>>> NAME                          READY   STATUS             RESTARTS   AGE
>>> job-manager-776dcf6dd-xzs8g   0/1     CrashLoopBackOff   5          7m35s
>>>
>>> Can anyone help me understand why?
>>> Why does Kubernetes regard this pod as completed and restart it? Should I
>>> configure something, either on the Flink side or the Kubernetes side? From
>>> the Flink manual, after the cluster is started, I can upload a jar to run
>>> the application.
>>>
>>> Thanks,
>>> Qihua
>>>
>>
>>
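
The exit behaviour discussed in this thread can be reproduced without Flink. The following is a minimal sketch, not the contents of run-job-manager.sh (which the thread does not show): `fake_server` and `launch_daemon` are hypothetical names, and `sleep 2` stands in for a long-running JobManager. It shows a daemon-style launcher returning status 0 while the process it started is still running, and `wait` being used to keep the calling script blocked.

```shell
#!/bin/sh
# Stand-in for a long-running server process. This is a hypothetical
# placeholder, not one of Flink's scripts.
fake_server() { sleep 2; }

# Daemon-style launcher: forks the server into the background and
# returns immediately with status 0.
launch_daemon() {
  fake_server &
  server_pid=$!
  return 0
}

launch_daemon
launcher_status=$?

# The launcher has already returned, yet the server is still alive.
if kill -0 "$server_pid" 2>/dev/null; then
  server_alive_after_launch=yes
else
  server_alive_after_launch=no
fi
echo "launcher exit status: $launcher_status, server still running: $server_alive_after_launch"

# If the script ended here, a container using it as its entrypoint would
# exit with status 0. Blocking on the server keeps the script alive.
wait "$server_pid"
server_status=$?
echo "server exited with status $server_status"
```

If run-job-manager.sh ends right after the daemon is launched, the container's main process exits with code 0, which would match the "Completed" state shown by kubectl describe. The pod name suggests it is managed by a Deployment, whose pods use restartPolicy: Always, so even a clean exit would lead to a restart and eventually CrashLoopBackOff. Flink's own bin/jobmanager.sh and bin/taskmanager.sh accept a start-foreground argument that keeps the process attached instead of daemonizing it; whether the custom run-job-manager.sh here can be switched to that mode is an assumption to verify against its contents.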