> Regarding worker concurrency, according to the data scientist, the task
that is kicked off is very intensive and spawns heavy duty C++ processes
limiting our worker setting.

This sounds like a good use case for the KubernetesPodOperator, the KubernetesExecutor, or the ECS operator if you are on AWS (there are probably analogous options for Azure and GCP).

There are also solutions like AWS Batch.

Note, I think it's possible to use the KubernetesPodOperator even when you are not running *Airflow* itself on k8s, and if I have that right, it's something you could test out without a ton of risk (assuming you're not already running Airflow on k8s).

I personally like doing real work on Airflow celery workers (some prefer to use them only for lightweight processing / scheduling), but once you're down to one task per worker, it really starts to make sense to offload the processing elsewhere.

The KubernetesPodOperator will let you run arbitrary docker containers on a k8s cluster, and the ECS operator does the same on ECS. The KubernetesExecutor runs each Airflow task in its own pod, so you get task isolation and independent resources without refactoring your job into a distinct docker container.
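To make that concrete, here's a minimal sketch of a KubernetesPodOperator task. The image name, command, kubeconfig path, and resource numbers are all placeholders you'd swap for your own; the exact import path can vary with your cncf.kubernetes provider version, so check the provider docs for yours.

```python
import pendulum
from kubernetes.client import models as k8s

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="heavy_cpp_job",  # hypothetical DAG name
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
) as dag:
    crunch = KubernetesPodOperator(
        task_id="crunch_numbers",
        name="crunch-numbers",
        image="my-registry/heavy-cpp:latest",  # your container image (placeholder)
        cmds=["/app/run_job"],                 # the heavy C++ entrypoint (placeholder)
        # If Airflow itself is NOT on k8s, point at an external cluster:
        in_cluster=False,
        config_file="/path/to/kubeconfig",     # placeholder path
        # Ask k8s for dedicated resources instead of sharing a celery worker:
        container_resources=k8s.V1ResourceRequirements(
            requests={"cpu": "4", "memory": "8Gi"},
        ),
        get_logs=True,
    )
```

Because the pod spec carries its own resource requests, the celery worker that launches this task stays lightweight; it just watches the pod and streams logs.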

And to round it out, there is the CeleryKubernetesExecutor, with which you can run a "mostly celery" cluster but have selected tasks run with the k8s executor by sending them to a special kubernetes queue.
