Thanks Jarek for your thoughts. Yes, opendalfs is one of the available
packages, but it is not well maintained and it integrates with fewer
OpenDAL backends. Right now it supports only the S3, fs, and in-memory
file systems.
I agree that it may be more suitable for a TaskFlow-based approach.

Yes, common.io is also a good fit for the OpenDAL provider, since it is all
file IO.

If users prefer the TaskFlow pattern with OpenDAL, there is already an
option available. They can use the OpenDAL hook provided as part of the
current provider implementation [1], which gives access to the OpenDAL
operator. This lets them call all the file IO methods on the operator and
perform whatever operations they need.
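
As a rough sketch of that pattern (not the provider's actual API): the
function below only assumes an object with read(path) and write(path, bytes)
methods, which is how I understand the OpenDAL Python operator to behave.
FileOperator, InMemoryOperator, and copy_file are hypothetical names used
for illustration so the example runs without any backend.

```python
from typing import Protocol


class FileOperator(Protocol):
    """Subset of an operator-like interface assumed here: read/write by path."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, bs: bytes) -> None: ...


class InMemoryOperator:
    """Hypothetical stand-in so the sketch runs without a real backend."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._data[path]

    def write(self, path: str, bs: bytes) -> None:
        self._data[path] = bytes(bs)


def copy_file(src: FileOperator, dst: FileOperator, path: str) -> int:
    """Read `path` from one operator, write it to another, return byte count."""
    data = src.read(path)
    dst.write(path, data)
    return len(data)


# Inside a TaskFlow task, `source` and `target` would come from the hook.
source, target = InMemoryOperator(), InMemoryOperator()
source.write("documents/file.txt", b"hello")
copied = copy_file(source, target, "documents/file.txt")
```

In a real task the two stand-ins would be replaced by operators obtained
from the hook, each configured for its own backend.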

One more thing I would like to mention: OpenDAL has a layers mechanism. A
layer can be RetryLayer, LoggingLayer, etc. [2]. With RetryLayer, for
example, a file IO operation can be retried if it fails. At present,
though, I can see only the Retry and Concurrent layers in the Python
binding [3]; others may come sooner or later.
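
To illustrate the idea behind a retry layer, here is a conceptual sketch in
plain Python. It is not OpenDAL's implementation: with_retry, its
parameters, and flaky_read are hypothetical names, and the real layer is
attached to the operator rather than wrapped around each call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    op: Callable[[], T],
    max_times: int = 3,
    min_delay: float = 0.01,
    factor: float = 2.0,
) -> T:
    """Run `op`, retrying up to `max_times` on OSError with exponential backoff."""
    delay = min_delay
    for attempt in range(max_times + 1):
        try:
            return op()
        except OSError:
            if attempt == max_times:
                raise  # retries exhausted, surface the last error
            time.sleep(delay)
            delay *= factor
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_read() -> bytes:
    """Simulated file read that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary failure")
    return b"payload"


result = with_retry(flaky_read)
```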

For the first iteration I have added only the plain operator
implementations; layers can be added later.


[1]
https://github.com/apache/airflow/pull/50728/files#diff-723d44bed4f9dc9c5a06560ce41471caa3b3870abc3d7b4d0e7a4437b04ab4eaR116
[2] https://github.com/apache/opendal/tree/main/core/src/layers
[3]
https://github.com/apache/opendal/blob/main/bindings/python/python/opendal/layers.pyi#L24

Pavan.

On Tue, May 20, 2025 at 3:34 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Agree it's a mature project (and being in ASF we can count on community and
> active maintenance).
>
> While there is overlap, there is indeed a much broader scope of OpenDAL
> than fsspec. Interestingly enough I've found
> https://github.com/fsspec/opendalfs - an implementation of the fsspec on
> OpenDal - it does not seem too active/maintained, but It actually **might**
> be possible to implement OpenDAL provider that would also have `common.io`
> binding (which would be very similar to what we have for s3, google etc. -
> each of those providers has its own APIs to access storage, and we have
> bindings to provide the "fsspec" one).
>
> That could remove (possibly) some confusion and questions about OpenDAL
> being somewhat "similar" to fsspec - but somewhat different. Confusingly -
> their basic entity is called ... Operator ;) ... but it's a bit different
> philosophy  - and with the proposed API it serves a bit different purpose.
> While FSSPec is good to be used as part of task flow code, OpenDal
> interface proposed by Pavan is more of the"classic "Operator" use.
>
> I'd say there are a bit different users of both.
>
> J.
>
>
>
> On Mon, May 19, 2025 at 6:26 PM Pavankumar Gopidesu <
> gopidesupa...@gmail.com>
> wrote:
>
> > Yes kaxil, agree on some overlap and we should provide the right
> > message to users to help them choose what to use when.
> >
> > OpenDAL comes with predefined backends(ex: s3, gcs, fs, ghac, memcached
> > etc;), all these backends work on the interface file like operations, so
> > from users POV they have to configure which backend to use and connection
> > string and what action they want to perform read/write etc;, then rest
> > works out of the box.
> >
> > When using databases as key-value storage, OpenDAL supports reading and
> > writing data seamlessly. However, to align with OpenDAL’s design, the key
> > column must follow a file path-like structure, essentially a string that
> > looks like a file path (e.g., documents/file.txt). The value column
> > should store the corresponding binary data, typically using a type like
> > BYTEA in PostgreSQL. This setup allows OpenDAL to perform file-style
> > operations (read, write, list, delete) on database storage just as it
> > would with traditional file systems or object stores.
> >
> > Some more context about OpenDAL db support[1]
> >
> > @shahar Epstein,  Yes i could see with OpenDAL interface the reading and
> > writing part is simpler it can be a read from http and write to s3 etc;
> >
> > [1] https://xuanwo.io/2023/03-opendal-database-support/
> >
> > Pavan
> >
> > On Mon, May 19, 2025 at 4:59 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > > Good idea. However, there is some overlap with ObjectStorage too.
> > > > OpenDAL looks to be a superset of ObjectStorage for sure, but we will
> > > > need to figure out the messaging to users from POV of what they
> > > > should be using.
> > >
> > > On Mon, 19 May 2025 at 13:31, Pavankumar Gopidesu <
> > gopidesupa...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Yes Vikram we can discuss.
> > > >
> > > > Pavan,
> > > >
> > > > On Mon, May 19, 2025 at 12:55 AM Vikram Koka
> > > <vik...@astronomer.io.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Pavan,
> > > > >
> > > > > > From a concept perspective and strategic direction, I am in 100%
> > > > > > agreement. I have also been thinking about this and submitted a
> > > > > > talk for the Airflow Summit on this topic.
> > > > > >
> > > > > > I am unsure of this particular package, but will look into it to
> > > > > > understand more.
> > > > >
> > > > > > The place where I do have concerns with the approach is on the
> > > > > > interface and its integration with other providers. We have tried
> > > > > > a very similar approach in the past with the Universal Transfer
> > > > > > Operator and it had limited resonance / adoption. I have also seen
> > > > > > other similar approaches proposed since and with limited uptick.
> > > > >
> > > > > > This is definitely a key topic I would like to collaborate on, so
> > > > > > let’s sync offline and get back to the dev list.
> > > > >
> > > > > Best regards,
> > > > > Vikram
> > > > >
> > > > > On Sun, May 18, 2025 at 11:35 AM Shahar Epstein <sha...@apache.org
> >
> > > > wrote:
> > > > >
> > > > > > > +1 from me - the project seems well-maintained, and we should
> > > > > > > definitely collaborate and open the "providers" door to Apache
> > > > > > > projects that could fit well in Apache Airflow like this one.
> > > > > > > If it proves itself well in the future, maybe we could think of
> > > > > > > deprecating the "transform" operators in favor of this one, or
> > > > > > > at least make their execution method to utilize it.
> > > > > >
> > > > >
> > > >
> > >
> >
>
