cansjt opened a new issue #21087:
URL: https://github.com/apache/airflow/issues/21087
### Apache Airflow version
2.2.3 (latest released)
### What happened
After upgrading Airflow to 2.2.3 (from 2.2.2) and the cncf.kubernetes provider
to 3.0.1 (from 2.0.3), we started to see these errors in the logs:
```
{"asctime": "2022-01-25 08:19:39", "levelname": "ERROR", "process": 565811, "name": "airflow.executors.kubernetes_executor.KubernetesJobWatcher", "funcName": "run", "lineno": 111, "message": "Unknown error in KubernetesJobWatcher. Failing", "exc_info": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py\", line 102, in run\n self.resource_version = self._run(\n File \"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py\", line 145, in _run\n for event in list_worker_pods():\n File \"/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py\", line 182, in stream\n raise client.rest.ApiException(\nkubernetes.client.exceptions.ApiException: (410)\nReason: Expired: too old resource version: 655595751 (655818065)\n"}
Process KubernetesJobWatcher-6571:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
    self.resource_version = self._run(
  File "/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
    for event in list_worker_pods():
  File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 182, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 655595751 (655818065)
```
Pods are created and run to completion, but the KubernetesJobWatcher seems
unable to see that they completed. From there, Airflow comes to a complete
halt.
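
To make the failure mode concrete, here is a minimal sketch of the kind of recovery a watch loop could apply when the API server answers 410 ("too old resource version"): drop the stale resource version and start the watch again from `"0"`. This is not Airflow's actual code. `ApiError`, `watch_with_resync`, `stream`, `handle`, and `max_restarts` are hypothetical names, `ApiError` is only a stand-in for the client's `ApiException` (which exposes the HTTP code as `status`), and resetting to `"0"` is an assumption about what an acceptable resync looks like.

```python
from typing import Callable, Iterable


class ApiError(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"({status}) {reason}")
        self.status = status


def watch_with_resync(
    stream: Callable[[str], Iterable[dict]],
    handle: Callable[[dict], None],
    resource_version: str = "0",
    max_restarts: int = 3,
) -> str:
    """Consume watch events, restarting from "0" on a 410 response.

    `stream` is a hypothetical callable that opens a watch starting at the
    given resource version and yields events as dicts that carry a
    "resource_version" key.
    """
    restarts = 0
    while True:
        try:
            for event in stream(resource_version):
                handle(event)
                # Remember the last version seen so a later watch can resume.
                resource_version = event["resource_version"]
            return resource_version
        except ApiError as err:
            # Anything other than 410, or too many retries, is re-raised.
            if err.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1
            # The stored version has expired on the server: start over.
            resource_version = "0"
```

With a loop of this shape, an expired resource version costs one restart of the watch instead of killing the watcher process.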
### What you expected to happen
No errors in the logs, and the job watcher does its job of collecting
completed jobs.
### How to reproduce
I wish I knew. I am trying to downgrade the cncf.kubernetes provider to
previous versions to see whether that helps.
### Operating System
k8s (Airflow images are Debian based)
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon 2.6.0
apache-airflow-providers-cncf-kubernetes 3.0.1
apache-airflow-providers-ftp 2.0.1
apache-airflow-providers-http 2.0.2
apache-airflow-providers-imap 2.1.0
apache-airflow-providers-postgres 2.4.0
apache-airflow-providers-sqlite 2.0.1
### Deployment
Other
### Deployment details
The deployment is on k8s v1.19.16, made with helm3.
### Anything else
In its symptoms, this looks a lot like #17629, but it happens in a different
place.
Redeploying as suggested in that issue seemed to help, but most jobs that
were supposed to run last night got stuck again. All jobs use the same pod
template, without any customization.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)