george-zubrienko commented on issue #19038:
URL: https://github.com/apache/airflow/issues/19038#issuecomment-987970167


   Seeing this issue a lot with 3 schedulers on Airflow 2.2.2, 
KubernetesExecutor on k8s 1.21. The code is a PythonOperator calling a web 
service. We do get some logs on startup though; this is right before the job 
code starts:
   
   ``` 
   [2021-12-07, 13:32:00 UTC] {taskinstance.py:1262} INFO - Executing <Task(PythonOperator): ...> on 2021-12-07 06:00:00+00:00
   [2021-12-07, 13:32:00 UTC] {standard_task_runner.py:52} INFO - Started process 13 to run task
   [2021-12-07, 13:32:00 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', ....]
   [2021-12-07, 13:32:00 UTC] {standard_task_runner.py:77} INFO - Job 129472: Subtask ...
   [2021-12-07, 13:32:05 UTC] {local_task_job.py:211} WARNING - State of this instance has been externally set to queued. Terminating instance.
   [2021-12-07, 13:32:05 UTC] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 13
   [2021-12-07, 13:32:07 UTC] {process_utils.py:66} INFO - Process psutil.Process(pid=13, status='terminated', exitcode=1, started='13:32:00') (13) terminated with exit code 1
   ```
   The next try for this task started running, but then:
   
   ```
   [2021-12-07, 13:32:04 UTC] {chained.py:84} INFO - DefaultAzureCredential acquired a token from EnvironmentCredential
   [2021-12-07, 13:32:06 UTC] {taskinstance.py:1411} ERROR - Received SIGTERM. Terminating subprocesses.
   [2021-12-07, 13:32:06 UTC] {taskinstance.py:1703} ERROR - Task failed with exception
   ```
   
   Retry 3 then finally goes through. We only see this issue on pipelines 
with a high number of parallel tasks; in our case, 3 task pools with 
48 + 48 + 90 total capacity. Also, with more than one scheduler, the scheduler 
pods sometimes print this to the logs:
   ```
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1368 waits for ShareLock on transaction 30815670; blocked by process 1160.
   Process 1160 waits for ShareLock on transaction 30815664; blocked by process 1368.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (13513,3) in relation "task_instance"
   ```
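
For anyone triaging similar scheduler logs: the DETAIL lines above describe two processes each waiting on a transaction held by the other. A minimal sketch for pulling those wait-for pairs out of such a message, assuming the message text has the same shape as the one above. The names `WAIT_RE`, `wait_edges` and `is_mutual_wait` are made up for illustration and are not part of Airflow, SQLAlchemy or psycopg2.

```python
import re

# Matches PostgreSQL deadlock DETAIL entries of the form seen above:
# "Process <pid> waits for <lock> on transaction <xid>; blocked by process <pid>."
# This pattern is an assumption based on that one message, not a full grammar.
WAIT_RE = re.compile(
    r"Process (\d+) waits for \w+ on transaction \d+; blocked by process (\d+)"
)


def wait_edges(message):
    """Return (waiting_pid, blocking_pid) pairs found in a deadlock message."""
    # Collapse all whitespace first so that entries wrapped across lines
    # (by a terminal or a mail client) still match the pattern.
    flat = " ".join(message.split())
    return [(int(m.group(1)), int(m.group(2))) for m in WAIT_RE.finditer(flat)]


def is_mutual_wait(edges):
    """True when the waiting processes and the blocking processes are the same set."""
    waiters = {waiter for waiter, _ in edges}
    blockers = {blocker for _, blocker in edges}
    return bool(edges) and waiters == blockers


message = """
DETAIL:  Process 1368 waits for ShareLock on transaction 30815670; blocked
by process 1160.
Process 1160 waits for ShareLock on transaction 30815664; blocked by process
1368.
"""

edges = wait_edges(message)
print(edges)
print(is_mutual_wait(edges))
```

The pids are PostgreSQL backend process ids, so matching them to a particular scheduler pod still needs the server log mentioned in the HINT line.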
   



