rmetzger edited a comment on pull request #14239:
URL: https://github.com/apache/flink/pull/14239#issuecomment-734461109


   Thanks a lot for addressing the issues I reported. While testing this PR, 
I noticed that the job got stuck during submission:
   
   ```
   2020-11-26 20:46:43,453 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received JobGraph submission 362e7f6a2a9901e5d6d2ea69d40a69d4 (State machine job).
   2020-11-26 20:46:43,453 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting job 362e7f6a2a9901e5d6d2ea69d40a69d4 (State machine job).
   2020-11-26 20:46:43,455 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_4 .
   2020-11-26 20:46:43,455 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Initializing job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
   2020-11-26 20:46:43,456 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
   2020-11-26 20:46:43,457 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running initialization on master for job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
   2020-11-26 20:46:43,457 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Successfully ran initialization on master in 0 ms.
   2020-11-26 20:46:43,492 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 0 ms
   2020-11-26 20:46:43,492 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
   2020-11-26 20:46:43,492 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No checkpoint found during restore.
   2020-11-26 20:46:43,493 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@530fdf01 for State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4).
   2020-11-26 20:46:43,493 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - JobManager runner for job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@localhost:6123/user/rpc/jobmanager_4.
   2020-11-26 20:46:43,493 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting execution of job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4) under job master id 00000000000000000000000000000000.
   2020-11-26 20:46:43,493 INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting split enumerator for source Source: Kafka Source.
   2020-11-26 20:46:43,501 INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing SourceCoordinator for source Source: Kafka Source.
   2020-11-26 20:46:43,502 INFO  org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source coordinator for source Source: Kafka Source closed.
   2020-11-26 20:46:43,502 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Connecting to ResourceManager akka.tcp://flink@localhost:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
   2020-11-26 20:46:43,503 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Resolved ResourceManager address, beginning registration
   2020-11-26 20:46:43,503 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000...@akka.tcp://flink@localhost:6123/user/rpc/jobmanager_4 for job 362e7f6a2a9901e5d6d2ea69d40a69d4.
   2020-11-26 20:46:43,504 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@localhost:6123/user/rpc/jobmanager_4 for job 362e7f6a2a9901e5d6d2ea69d40a69d4.
   2020-11-26 20:46:43,506 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
   ---- manual cancellation of the job -----
   2020-11-26 20:59:12,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job State machine job (362e7f6a2a9901e5d6d2ea69d40a69d4) switched from state CREATED to CANCELLING.
   2020-11-26 20:59:12,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Kafka Source (1/4) (f08af83bdc18ed82549cafcc97b747a4) switched from CREATED to CANCELING.
   ```
   
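   I'm not sure whether this problem is related to the scheduler or to your 
changes, but it looks suspicious that the source coordinator is closed again 
about 8 ms after the split enumerator starts. The job then stays in CREATED 
until I cancel it manually.
   The problem is reproducible. 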
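
   For reference, here is a rough sketch of how the gaps in the log above can be measured. It is not part of the PR; the variable names are my own, and the four timestamps are copied by hand from the log excerpt.

```python
from datetime import datetime, timedelta

# Flink's default log timestamp layout: date, time, comma, milliseconds.
FMT = "%Y-%m-%d %H:%M:%S,%f"
ONE_MS = timedelta(milliseconds=1)


def parse(ts: str) -> datetime:
    """Parse a timestamp as it appears at the start of each log line."""
    return datetime.strptime(ts, FMT)


# Timestamps copied by hand from the log excerpt (hypothetical variable names).
enumerator_started = parse("2020-11-26 20:46:43,493")   # "Starting split enumerator"
coordinator_closing = parse("2020-11-26 20:46:43,501")  # "Closing SourceCoordinator"
registered_at_rm = parse("2020-11-26 20:46:43,506")     # last line before the gap
manual_cancel = parse("2020-11-26 20:59:12,933")        # CREATED -> CANCELLING

# Whole milliseconds between two events.
coordinator_lifetime_ms = (coordinator_closing - enumerator_started) // ONE_MS
idle_before_cancel_ms = (manual_cancel - registered_at_rm) // ONE_MS

print(f"coordinator closed after {coordinator_lifetime_ms} ms")
print(f"no further log output for {idle_before_cancel_ms / 1000:.1f} s before the manual cancel")
```

   With these values, the coordinator lives for 8 ms, and nothing is logged for roughly 12.5 minutes until the manual cancellation.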


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

