Thanks for bringing this up, Bolke.

I generally like the idea of having AS and I like where the discussions
here are going.

Just one question about where this fits into the wider ecosystem:
should we integrate this into core rather than a provider?
To me it makes more sense to have this in core,
given that this is a pretty common problem across stakeholders.

I also agree with Hussein's concern above. Maybe that can be tackled by
pinning a stable version? (Since we don't really NEED new features from
the fsspec package but rather need a stable one)

Thanks & Regards,
Amogh Desai

On Sun, Oct 22, 2023 at 5:10 PM Hussein Awala <huss...@awala.fr> wrote:

> This AIP will have a great positive impact on the project:
> - Airflow will be increasingly used as a scheduler for ML projects.
> - Simplifying the file transfer operators by replacing them with a single
> one for all the file/object storage services.
> - Implementing and managing the CI/CD pipelines for Airflow DAGs will be
> much easier once we use the new feature in the DAGs processor (out of the
> scope of this AIP but will be possible).
> - Opportunity to support a generic XCom backend based on AFS, helpful in
> sharing big files between the tasks and creating dynamic tasks from files
> (out of the scope of this AIP but will be possible).
>
> +1 (binding)
>
> My only concern is the stability of fsspec packages; I had a bad experience
> with s3fs and gcsfs in the past due to patch/minor releases with breaking
> changes or conflicts with botocore for s3fs. I hope release management has
> improved since.
>
> On Fri, Oct 20, 2023 at 4:33 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> > I have added an example for the use of the FileTransferOperator in the PR.
> > This is a 'port' of the local_to_s3 dag that is used elsewhere in the
> > examples. I kept the structure as per that original, but it could be
> > reduced to a two-liner (in dag-speak).
> >
> > I agree with Jens that the PR needs to settle a bit; more on the
> > implementation rather than the API imho - the API mostly comes from
> > pathlib.Path + fsspec extensions (but please shoot at it!). I hope we can
> > consider it 'settled enough' by the time this vote ends.
> >
> > Cheers
> > Bolke
> >
> > On Fri, 20 Oct 2023 at 09:43, Scheffler Jens (XC-DX/ETV5)
> > <jens.scheff...@de.bosch.com.invalid> wrote:
> >
> > > > +1 (non binding) for making this AIP in general.
> > > >
> > > > I had a couple of comments and the rework and comments are very active. I
> > > > assume the PR needs to settle for a moment and there are still a lot of
> > > > different opinions - which is fair given the complexity. The value is
> > > > very high but I fear a bit that we nail down the API a bit too fast. But it
> > > > is a feature that will need to stay and we need to make it "right". So I
> > > > propose either the PR stays for a moment to mature or we mark the
> > > > feature as "experimental" for at least one version --> to have the
> > > > ability to adjust the API if we learn in real life - not being "locked" into
> > > > API v1 for years.
> > > >
> > > > I also would like to see examples, but maybe I need to catch up with all
> > > > the ongoing changes as well.
> > >
> > > THANKS for the efforts and the concepts Bolke!
> > >
> > > > Best regards
> > >
> > > Jens Scheffler
> > >
> > > -----Original Message-----
> > > From: Kaxil Naik <kaxiln...@gmail.com>
> > > > Sent: Friday, 20 October 2023 01:00
> > > To: dev@airflow.apache.org
> > > Subject: Re: [VOTE] AIP-58 Airflow ObjectStore
> > >
> > > I like where this is heading, so I vote *+1*.
> > >
> > > > Although, I would like to see some examples of usage in DAGs (before/after
> > > > would be great) that will help support the following points that you have
> > > > mentioned in the AIP
> > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whyisitneeded?>:
> > >
> > >    1. Simplify DAG CI/CD
> > >    2. Streamlining pre-DAG to DAG (e.g. notebooks to DAG)
> > >    3. To allow DAG processing to be using arbitrary locations (object
> > >    storage)
> > >    4. To have a unified interface to file operations in TaskFlow and
> > >    traditional Operators
> > >
> > > and some comments:
> > >
> > > >    1. You do have *lineage* listed in the image
> > > >    <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565#AIP58AirflowObjectStore(AS)-Whatchangedoyouproposetomake?>,
> > > >    but is it follow-up work that you were thinking of, or is it part of
> > > >    AIP completion?
> > > >    2. We would contribute the File abstraction as a follow-up to this AIP
> > > >    too, which will help with the Dataset story too
> > >
> > >
> > > Regards,
> > > Kaxil
> > >
> > > > On Thu, 19 Oct 2023 at 20:21, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >
> > > > > I don't mind waiting for that given a reasonable timeframe. Martin
> > > > > mentioned he wanted to do something at the end of the week. The vote
> > > > > on this AIP runs until next Thursday anyway :-).
> > > >
> > > > And thank you :-).
> > > >
> > > > B.
> > > >
> > > > On Thu, 19 Oct 2023 at 21:11, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > > release:
> > > > > > > https://github.com/aio-libs/aiobotocore/pull/1037
> > > > >
> > > > > > Oh yes absolutely. Great timing. And our constraints ***JUST***
> > > > > > caught up automatically with aiobotocore 2.7.0 - released just 2 days
> > > > > > ago.
> > > > > >
> > > > > > We've been waiting for it for a long time and I believe the MWAA
> > > > > > team had some impact there (we've been discussing it a lot).
> > > > > >
> > > > > > And yes, that will hopefully change my +1 on AIP-58 to +1! But only
> > > > > > when s3fs relaxes THEIR requirement of aiobotocore ~2.5.4 that they
> > > > > > currently have.
> > > > > > Currently just using s3fs will bring our botocore and aiobotocore in
> > > > > > constraints 2.5 months back.
> > > > >
> > > > > < boto3==1.28.64
> > > > > < botocore==1.31.64 -> released 16 Oct 2023
> > > > > ---
> > > > > > boto3==1.28.17
> > > > > > botocore==1.31.17 -> released 1 Aug 2023
> > > > >
> > > > > > And it seems like everyone was waiting for it:
> > > > > > https://github.com/fsspec/s3fs/pull/809 - the s3fs change for it was
> > > > > > merged yesterday.
> > > > > >
> > > > > > So yes +1! I hope the s3fs release will happen before we merge AIP-58.
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, Oct 19, 2023 at 8:44 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > > >
> > > > > > > Thanks for the thorough consideration, Jarek. I follow your concerns.
> > > > > > > The idea behind this AIP was to reduce the cognitive load on users by
> > > > > > > staying as pythonic as we can and to be gentle with the Airflow-isms.
> > > > > > > So I hope to limit that "yet another abstraction". I do agree that
> > > > > > > having great examples and documentation is going to be important. As a
> > > > > > > random idea, this
> > > > > > > https://medium.com/@fninsiima/de-mini-series-part-two-57770ff7cdf9
> > > > > > > can now be significantly simplified.
> > > > > >
> > > > > > > One less worry I hope is that aiobotocore is actually starting to
> > > > > > > relax its botocore requirements bringing it much closer to latest
> > > > > > > release:
> > > > > > > https://github.com/aio-libs/aiobotocore/pull/1037
> > > > > >
> > > > > > > On the requirements side there are actually not that many
> > > > > > > additional dependencies being brought in.
> > > > > > > Core fsspec does not bring any requirements. s3fs brings in three
> > > > > > > which are all covered by current ones.
> > > > > > > adlfs brings in five, all already part of our current set. Of
> > > > > > > course it does bring some complexity, but I do hope you see that
> > > > > > > it is fairly limited and if it does bring in anything it is well
> > > > > > > supported.
> > > > > > >
> > > > > > > The reason for creating common.io as a provider was that it was
> > > > > > > suggested that we might want to move a bit faster than core on the
> > > > > > > very simple (yet powerful ;-) ) FileTransferOperator.
> > > > > > >
> > > > > > > Considering this I hope you would like to make your measly +1 into
> > > > > > > a strong +1 :-).
> > > > > >
> > > > > > Cheers
> > > > > > Bolke
> > > > > >
> > > > > >
> > > > > > > On Thu, 19 Oct 2023 at 19:48, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > >
> > > > > > > > Finally caught up with this one, looked through code and
> > > > > > > > discussions. I am a little torn on that one but I did some more
> > > > > > > > research and I think it's a useful abstraction.
> > > > > > >
> > > > > > > +1(binding)
> > > > > > >
> > > > > > > > The big + of using fsspec is that it is already supported by the
> > > > > > > > most important "consumers" that are likely to be used in
> > > > > > > > Airflow. Pandas, Pyarrow, Iceberg. The fact that you will be
> > > > > > > > able to take an S3/GCS ObjectStoragePath as an input directly
> > > > > > > > and it will transparently use the connection of Airflow is a big
> > > > > > > > plus.
> > > > > > >
> > > > > > > > I would just add that we should get real-life DAG examples showing
> > > > > > > > how this might simplify DAG code; that would be cool. I think the
> > > > > > > > quality and clarity of the documentation that will come with it
> > > > > > > > - clearly explaining some cases and examples of how DAG authors can
> > > > > > > > make use of it to make their DAG authoring "better" - is key to the
> > > > > > > > success of this one. If we fail to explain it, it might become yet
> > > > > > > > another rarely used feature of Airflow.
> > > > > > >
> > > > > > > > There is one worry I have - it adds "yet another abstraction" to
> > > > > > > > learn and "yet another set of dependencies" to Airflow. We have a
> > > > > > > > new "common.io" provider, we have many new dependencies, we have
> > > > > > > > aiobotocore as a requirement for AWS integration for example. I
> > > > > > > > already looked at the PR and attempted to help with some of the
> > > > > > > > dependency questions and problems, but we will have a few more of
> > > > > > > > those to solve and some decisions to make: should
> > > > > > > > apache-airflow-providers-common-io be default? Should it be
> > > > > > > > included in the reference image? etc. etc. This will make Airflow
> > > > > > > > and its dependencies more complex rather than simpler. That's why
> > > > > > > > I am not strong +1! just measly +1 - because I see how it can make
> > > > > > > > airflow even "heavier" than it is now.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, Oct 19, 2023 at 4:34 PM Igor Kholopov
> > > > > > > > <ikholo...@google.com.invalid> wrote:
> > > > > > >
> > > > > > > > Thanks for incorporating the feedback!
> > > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > > On Thu, Oct 19, 2023 at 1:55 PM Dennis Akpenyi
> > > > > > > > > <dennisakpe...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > > On Thu, Oct 19, 2023 at 12:24 PM Bolke de Bruin
> > > > > > > > > > <bdbr...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Dear Community,
> > > > > > > > > >
> > > > > > > > > > > I would like to start a vote for "AIP-58 Add Airflow
> > > > > > > > > > > ObjectStore".
> > > > > > > > > >
> > > > > > > > > > > You can find the AIP here:
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263430565
> > > > > > > > > >
> > > > > > > > > > > Implementing PR (most of the discussion happened here):
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/airflow/pull/34729
> > > > > > > > > >
> > > > > > > > > > > Discussion Thread (not much has happened here :-) ):
> > > > > > > > > > > Note: the title has changed from its original.
> > > > > > > > > > >
> > > > > > > > > > > https://lists.apache.org/thread/l3fkr0h6j2g4tlmsov14fywmj58t3mtp
> > > > > > > > > >
> > > > > > > > > > > This is my binding +1. The vote will last until 12:00 UTC
> > > > > > > > > > > on 26th October, and until at least 3 binding votes have
> > > > > > > > > > > been cast.
> > > > > > > > > >
> > > > > > > > > > Please vote accordingly:
> > > > > > > > > >
> > > > > > > > > > [ ] + 1 approve
> > > > > > > > > > [ ] + 0 no opinion
> > > > > > > > > > [ ] - 1 disapprove with the reason
> > > > > > > > > >
> > > > > > > > > > > Only votes from PMC members and committers are binding,
> > > > > > > > > > > but other members of the community are encouraged to check
> > > > > > > > > > > the AIP and vote with "(non-binding)".
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > > Bolke
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Bolke de Bruin
> > > > > > > > > > bdbr...@gmail.com
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > --
> > > > > > Bolke de Bruin
> > > > > > bdbr...@gmail.com
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > --
> > > > Bolke de Bruin
> > > > bdbr...@gmail.com
> > > >
> > >
> >
> >
> > --
> >
> > --
> > Bolke de Bruin
> > bdbr...@gmail.com
> >
>
