Thanks for the ping on https://github.com/dask/dask/issues/5803

I'm curious whether dask's async features might be low-hanging fruit for
scaling Airflow:
- https://distributed.dask.org/en/latest/asynchronous.html
- https://github.com/apache/airflow/pull/6984

Our company runs scientific workflows on dask, usually on large EC2
instances or batch jobs.  I've been getting familiar with dask from a user
perspective; I don't yet know the internals from a developer perspective.
I mostly use dask.delayed to scale threads/processes on a local host, with
a simple concurrent.futures-style API.  Dask.distributed can also run a
cluster with client connections (I previously worked with Spark a bit, and
dask has some good documentation comparing Spark and dask).  There are
also some options for auto-scaling a dask cluster using k8s -
https://docs.dask.org/en/latest/setup/adaptive.html - so you get an
auto-scaling cluster with a lot of features for scientific computing with
the scipy-compatible stack.
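
To make the "simple concurrent.futures API" pattern above concrete, here is
a minimal sketch that uses only the Python standard library, so it runs
without dask or a cluster.  The names simulate_task and run_all are
hypothetical stand-ins, not anything from Airflow or dask.  As far as I
know, dask.distributed's Client offers comparable submit/map calls that
return futures, so the same shape should carry over, but treat that as an
assumption rather than a tested claim.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate_task(n: int) -> int:
    """Hypothetical stand-in for one unit of scientific work."""
    return sum(i * i for i in range(n))


def run_all(sizes, max_workers=4):
    """Submit one task per input and collect results keyed by input."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(simulate_task, n): n for n in sizes}
        # Gather results in completion order, not submission order.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    print(run_all([10, 100, 1000]))
```

The submit-then-gather shape is the part that matters here; the pool type
(threads, processes, or a remote cluster) is what would change.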

I can't promise to complete anything in a timely manner, regardless of any
proposals to remove dask executors entirely.  I may be in and out of these
discussions from time to time, possibly silent for several weeks at a time
while I'm heads down on my full-time position.  So if Airflow 2.0 removes
them for whatever reason, I would hope it could be possible to add them
back in Airflow 2.1 if the work can be done to get it working and the
design patterns make sense and/or there is a larger user community than
anyone is yet aware of.  At present, I don't hear a clear specification for
how it should work or an argument that it doesn't work at all, but I hear
and see that unit tests are disabled.  It might be possible to identify in
dask itself how to set up the test environment.  It might also help to
better understand the niche that dask serves well.

The online forums and GitHub may suffice, but if it would be possible to
find funding to sponsor a joint hackathon at PyCon or something, that
would be great.  As a new contributor to Airflow, I'm still learning the
ropes and it would be good to attend an Airflow contributor workshop (maybe
someone could spin one up in the Bay Area?).

Best,
Darren




On Sun, Jan 19, 2020 at 9:28 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Seems like there is interest: https://github.com/dask/dask/issues/5803 :)
> Let's see where it gets us.
>
> J.
>
> On Sat, Jan 18, 2020 at 9:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > Following the discussion on Dask's gitter, I created an issue in Dask's
> > GitHub: https://github.com/dask/dask/issues/5803
> >
> > Let's see if we can get someone from Dask community interested.
> >
> > On Fri, Jan 17, 2020 at 10:00 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> >> Good idea :) doing that,
> >>
> >> On Fri, Jan 17, 2020 at 9:58 PM Daniel Imberman <
> >> daniel.imber...@gmail.com> wrote:
> >>
> >>> Maybe we can reach out to a company that does Dask as a service?
> >>>
> >>> On Fri, Jan 17, 2020 at 9:31 AM, Jarek Potiuk <jarek.pot...@polidea.com>
> >>> wrote:
> >>> Yeah. I think if we do not find anyone willing to champion it (no matter
> >>> committer or contributor), I would be for dropping it.
> >>>
> >>> J.
> >>>
> >>> On Fri, Jan 17, 2020 at 6:07 PM Daniel Imberman <
> >>> daniel.imber...@gmail.com>
> >>> wrote:
> >>>
> >>> > I think we need to ask “who is going to champion this executor?” I see
> >>> > that it is being used (a bit), but I am concerned if no one with
> >>> > knowledge of this executor is willing to maintain it.
> >>> >
> >>> > I’ve personally never used Dask, and the DaskExecutor isn’t super high on
> >>> > my priority list compared to things like autoscaling, DAG serialization,
> >>> > etc.
> >>> >
> >>> > On Fri, Jan 17, 2020 at 6:07 AM, Jarek Potiuk <jarek.pot...@polidea.com>
> >>> > wrote:
> >>> > Do we have anyone here who uses the Dask Executor and would like to test
> >>> > it and fix the tests? They are now marked as xfail (expected to fail),
> >>> > and it would be great to fix them.
> >>> >
> >>> > J.
> >>> >
> >>> >
> >>> > On Tue, Jan 14, 2020 at 12:18 AM Darren Weber
> >>> > <dweber.consult...@gmail.com> wrote:
> >>> >
> >>> > > +1 for keeping it and fixing tests
> >>> > >
> >>> > > PS, I also noticed the skipped tests while looking at an option to use
> >>> > > the async client feature; if/when I get time to get back on that and
> >>> > > figure out how the test setup needs to work, I might also discover how
> >>> > > to enable tests for the non-async executor. No promises, just noting
> >>> > > that I'm aware of it too.
> >>> > >
> >>> > > On Mon, Jan 13, 2020 at 8:06 AM Jarek Potiuk
> >>> > > <jarek.pot...@polidea.com> wrote:
> >>> > >
> >>> > > > For now I marked the skipped tests we had (including Dask) as
> >>> > > > pytest.mark.xfail (meaning expected to fail). They will be executed
> >>> > > > and summarized as XFail tests, and we will have to deal with them at
> >>> > > > some point.
> >>> > > >
> >>> > > > I think we will have to decide whether we want to keep it, and either
> >>> > > > remove both the tests and the executor or fix the tests.
> >>> > > >
> >>> > > > J.
> >>> > > >
> >>> > > > On Mon, Jan 13, 2020 at 4:40 PM Shaw, Damian P. <
> >>> > > > damian.sha...@credit-suisse.com> wrote:
> >>> > > >
> >>> > > > > FYI I used Dask instead of the Local Executor when first starting
> >>> > > > > with Airflow; it was a great way to make sure the Executor and
> >>> > > > > Scheduler weren’t tied to each other, with no difficulty in set-up.
> >>> > > > > But once I actually started deploying to multiple boxes, I needed
> >>> > > > > queue names pretty quickly. So I'm not going to say it's needed, but
> >>> > > > > for me it was a helpful stepping stone.
> >>> > > > >
> >>> > > > >
> >>> > > > > -----Original Message-----
> >>> > > > > From: Ash Berlin-Taylor <a...@apache.org>
> >>> > > > > Sent: Sunday, January 12, 2020 17:38
> >>> > > > > To: dev@airflow.apache.org
> >>> > > > > Cc: dev@airflow.apache.org
> >>> > > > > Subject: Re: Remove Dask Executor in Airflow 2.0 ?
> >>> > > > >
> >>> > > > > It hasn't been discussed before, but unlike the Mesos one this one
> >>> > > > > has seen a (tiny) bit of activity in 1.10, so at least one person is
> >>> > > > > using it: https://github.com/apache/airflow/pull/5273
> >>> > > > >
> >>> > > > > On Jan 12 2020, at 9:05 pm, Jarek Potiuk
> >>> > > > > <jarek.pot...@polidea.com> wrote:
> >>> > > > > > I am finishing the PR on separating integrations and improving our
> >>> > > > > > CI footprint (https://github.com/apache/airflow/pull/7091), but
> >>> > > > > > during this change I found that we have an apparently
> >>> > > > > > dysfunctional DaskExecutor in Airflow 2.0.
> >>> > > > > >
> >>> > > > > > There is a "test_dask_executor.py" for which all tests are skipped,
> >>> > > > > > and they fail when I try to run them. I tried to look for any
> >>> > > > > > reference in the devlist archives but I couldn't find anything
> >>> > > > > > about it.
> >>> > > > > >
> >>> > > > > > Can someone shed some light on this? Should we remove the Dask
> >>> > > > > > executor completely from Airflow 2.0? Or should we fix the
> >>> > > > > > tests/executor? Has it been discussed?
> >>> > > > > >
> >>> > > > > > J.
> >>> > > > > >
> >>> > > > > > --
> >>> > > > > > Jarek Potiuk
> >>> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>> > > > > >
> >>> > > > > > M: +48 660 796 129 <+48660796129>
> >>> > > > > > [image: Polidea] <https://www.polidea.com/>
> >>> > > > > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Darren L. Weber, Ph.D.
> >>> > > http://psdlw.users.sourceforge.net/
> >>> > > http://psdlw.users.sourceforge.net/wordpress/
> >>> > >
> >>> >
> >>> >
>


-- 
Darren L. Weber, Ph.D.
http://psdlw.users.sourceforge.net/
http://psdlw.users.sourceforge.net/wordpress/
