Hi Michael,

Like Daniel already pointed out, processing 1 task per worker seems very low. It depends on what resources you have allocated to each Celery worker, but it should still be able to process a lot more tasks in parallel.
Increase AIRFLOW__CELERY__WORKER_CONCURRENCY to 16. Also monitor where the tasks are being throttled and check whether all your Celery workers are busy, which may well be the case since you mentioned 90 tasks at a time: depending on how long the currently running 60 tasks take, the other 30 will stay in the queue.

Since your tasks are stuck in the Queued state (not the None state) and your parallelism is already set to 1000, you are most likely hitting worker concurrency limits: a task is set to Queued and sent to the executor, which checks the parallelism configs and then hands it to a Celery worker (via Redis / RabbitMQ / etc.).

Regards,
Kaxil

On Wed, Apr 28, 2021 at 5:25 AM Daniel Standish <[email protected]> wrote:

> Loading the DagBag takes around 40 seconds because of the number of tasks
>
> This is suspicious. It's not a given that a DAG will take 40 seconds to
> parse due to 1000 or 2000 tasks.
>
> Do you perhaps have network calls in your DAG that are slowing things
> down? I would try to identify exactly what in the DAG parse is slow and
> see if you can remove the slowness.
>
> 60 workers set to process 1 task at a time
>
> Can you increase worker concurrency, i.e. the number of tasks per worker?
> The default is 16, I think, and 1 is of course the lowest number
> possible. I presume you have heavy tasks that need the full resources of
> the worker?
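To act on both suggestions, here is a minimal sketch, assuming Airflow 2.x with the Celery executor (the exact restart step depends on how you deploy your workers, and `airflow dags report` is only available in recent 2.x releases):

```shell
# Raise the number of tasks each Celery worker runs concurrently
# (the Airflow default is 16; your setup reportedly has it at 1).
export AIRFLOW__CELERY__WORKER_CONCURRENCY=16

# Confirm the values Airflow actually picks up from config/env.
airflow config get-value celery worker_concurrency
airflow config get-value core parallelism

# Restart the workers so the new concurrency takes effect
# (this assumes you start them directly; adapt for systemd/K8s/etc.).
airflow celery worker

# Per Daniel's suggestion, check how long each DAG file takes to parse;
# a slow file often points to network calls or heavy work at module level.
airflow dags report
```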
