rmetzger commented on pull request #14239:
URL: https://github.com/apache/flink/pull/14239#issuecomment-734461109
Thanks a lot for addressing the issues I've reported. While testing this PR,
I noticed that the job got stuck during submission:
```
2020-11-26 20:46:43,453 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received
JobGraph submission 362e7f6a2a9901e5d6d2ea69d40a69d4 (State machine job).
2020-11-26 20:46:43,453 INFO
org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting
job 362e7f6a2a9901e5d6d2ea69d40a69d4 (State machine job).
2020-11-26 20:46:43,455 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC
endpoint for org.apache.flink.runtime.jobmaster.JobMaster at
akka://flink/user/rpc/jobmanager_4 .
2020-11-26 20:46:43,455 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Initializing job State machine job
(362e7f6a2a9901e5d6d2ea69d40a69d4).
2020-11-26 20:46:43,456 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Using restart back off time strategy
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647,
backoffTimeMS=1000) for State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
2020-11-26 20:46:43,457 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Running initialization on master for job State machine job
(362e7f6a2a9901e5d6d2ea69d40a69d4).
2020-11-26 20:46:43,457 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Successfully ran initialization on master in 0 ms.
2020-11-26 20:46:43,492 INFO
org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built
1 pipelined regions in 0 ms
2020-11-26 20:46:43,492 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - No state backend has been configured, using default (Memory
/ JobManager) MemoryStateBackend (data in heap memory / checkpoints to
JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE,
maxStateSize: 5242880)
2020-11-26 20:46:43,492 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint
found during restore.
2020-11-26 20:46:43,493 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Using failover strategy
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@530fdf01
for State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
2020-11-26 20:46:43,493 INFO
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager
runner for job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4) was granted
leadership with session id 00000000-0000-0000-0000-000000000000 at
akka.tcp://flink@localhost:6123/user/rpc/jobmanager_4.
2020-11-26 20:46:43,493 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Starting execution of job State machine job
(362e7f6a2a9901e5d6d2ea69d40a69d4) under job master id
00000000000000000000000000000000.
2020-11-26 20:46:43,493 INFO
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting
split enumerator for source Source: Kafka Source.
2020-11-26 20:46:43,501 INFO
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing
SourceCoordinator for source Source: Kafka Source.
2020-11-26 20:46:43,502 INFO
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source
coordinator for source Source: Kafka Source closed.
2020-11-26 20:46:43,502 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Connecting to ResourceManager
akka.tcp://flink@localhost:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2020-11-26 20:46:43,503 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - Resolved ResourceManager address, beginning registration
2020-11-26 20:46:43,503 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
Registering job manager
[email protected]://flink@localhost:6123/user/rpc/jobmanager_4
for job 362e7f6a2a9901e5d6d2ea69d40a69d4.
2020-11-26 20:46:43,504 INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
Registered job manager
[email protected]://flink@localhost:6123/user/rpc/jobmanager_4
for job 362e7f6a2a9901e5d6d2ea69d40a69d4.
2020-11-26 20:46:43,506 INFO org.apache.flink.runtime.jobmaster.JobMaster
[] - JobManager successfully registered at ResourceManager,
leader id: 00000000000000000000000000000000.
---- manual cancellation of the job -----
2020-11-26 20:59:12,933 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job State
machine job (362e7f6a2a9901e5d6d2ea69d40a69d4) switched from state CREATED to
CANCELLING.
2020-11-26 20:59:12,933 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Kafka
Source (1/4) (f08af83bdc18ed82549cafcc97b747a4) switched from CREATED to
CANCELING.
```
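I'm not sure whether this problem is related to the scheduler or to your
changes, but it looks weird that the source coordinator got closed again right
away (about 8 ms after it started the split enumerator). The job then stayed
in CREATED until I cancelled it manually.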
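
For anyone who wants to check the timing in their own logs, here is a minimal sketch (not part of the PR) that measures how long the source coordinator stayed open. It assumes the log is hard-wrapped as in the excerpt above and that the messages contain the phrases "Starting split enumerator" and "Closing SourceCoordinator". The function names are hypothetical helpers, not Flink APIs.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix of a log entry, e.g. "2020-11-26 20:46:43,493".
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def entries(log_text):
    """Rejoin hard-wrapped lines into [timestamp, message] entries.

    A line that does not start with a timestamp is treated as a
    continuation of the previous entry.
    """
    out = []
    for line in log_text.splitlines():
        m = TS.match(line)
        if m:
            ts = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S,%f")
            out.append([ts, line[m.end():].strip()])
        elif out:
            out[-1][1] += " " + line.strip()
    return out


def first_time(log_entries, needle):
    """Return the timestamp of the first entry whose message contains needle."""
    for ts, msg in log_entries:
        if needle in msg:
            return ts
    return None


def coordinator_lifetime_ms(log_text):
    """Milliseconds between enumerator start and coordinator close, or None."""
    es = entries(log_text)
    started = first_time(es, "Starting split enumerator")
    closing = first_time(es, "Closing SourceCoordinator")
    if started is None or closing is None:
        return None
    return (closing - started) // timedelta(milliseconds=1)


# Two entries copied from the log excerpt above.
SAMPLE = """\
2020-11-26 20:46:43,493 INFO
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting
split enumerator for source Source: Kafka Source.
2020-11-26 20:46:43,501 INFO
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing
SourceCoordinator for source Source: Kafka Source.
"""

print(coordinator_lifetime_ms(SAMPLE))  # prints 8
```

A result of only a few milliseconds, as here, suggests the coordinator was closed almost immediately after starting rather than after doing any real work.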