Yes, that makes sense.

On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.breg...@polidea.com>
wrote:

> In the case of Hadoop, it is published by Apache, so it can be in the
> apache directory.  This will mimic the grouping presented in the
> documentation.
> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
>
> On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > I think we should keep the vote open at least until the middle of next
> > week to gather more thoughts and input on this one.
> >
> > In general, I am happy with the approach, but operators/hooks and sensors
> > shouldn't be providers. "hadoop" can be its own provider and hdfs can be
> > a part of it.
> >
> > providers/
> >     google
> >          cloud
> >              operators
> >              hooks
> >              sensors
> >          gsuite
> >              operators
> >              ...
> >     amazon
> >          aws
> >              operators
> >              ...
> >     microsoft
> >          azure
> >              operators
> >              ...
> >     hadoop
> >         hdfs
> >              operators
> >              ...
> >
> > We can also define what a "provider" is, so we know what to add to it in
> > the future. SSH/FTP/SFTP belong to the same family. Do we want to have
> > a separate provider for each one of them?
> >
> > Regards,
> > Kaxil
> >
> > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> > > I really like making everything a provider. That's a great idea! This
> > > way everything "backportable" will have to be in the "providers"
> > > package. A really nice and clean separation (and less mess in
> > > "airflow"). And we will not have to have any artificial grouping (we
> > > can still group them at the documentation level).
> > >
> > > We do not need "backport" in the name. I think that is more of a
> > > technical detail of naming the package, which we can work out while
> > > reviewing PRs, and we can agree on the final naming of the released
> > > packages at the PMC level (PMC members will have to vote on releasing
> > > those).
> > >
> > > The thinking was that the intention is really only to backport them to
> > > 1.10 - we are not going (yet) to use the packages in Airflow 2.*, so I
> > > thought that by naming them "backport" we could express that intent
> > > more clearly.
> > >
> > > So let me clarify the structure of folders we are going to have if we
> > > follow it (I just added some examples), including the already agreed
> > > changes from AIP-21:
> > >
> > > providers/
> > >     google
> > >          cloud
> > >              operators
> > >              hooks
> > >              sensors
> > >          gsuite
> > >              operators
> > >              ...
> > >     amazon
> > >          aws
> > >              operators
> > >              ...
> > >     microsoft
> > >          azure
> > >              operators
> > >              ...
> > >     operators
> > >          sqlite.py
> > >          oracle.py
> > >          docker.py
> > >     hooks
> > >          hdfs.py
> > >          sqlite.py
> > >     sensors
> > >          http.py
> > >          sql.py
> > >
> > >
> > > J.
> > >
> > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <a...@apache.org>
> > > wrote:
> > >
> > > > Do we need to include `-backport`? What was the thinking behind that?
> > > >
> > > > I think software and protocol should be merged. I would also say
> > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator for
> > > > instance is what I would prefer
> > > >
> > > > -a
> > > >
> > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk
> > > > <jarek.pot...@polidea.com> wrote:
> > > > >One more day to go. I would love to see some opinions on this AIP-21
> > > > >update :).
> > > > >
> > > > >Executive summary:
> > > > >
> > > > >* we will be moving a number of integrations to sub-packages of
> > > > >airflow.
> > > > >* they will be backportable to 1.10.*. There will be an
> > > > >'apache-airflow-[package]-backport' PyPI package, installable with
> > > > >Python 3, that will make Airflow 2.0 operators/hooks etc. available
> > > > >with 1.10.* operators.
> > > > >* the current proposal for sub-packages is
> > > > >"protocols/software/providers/" (but if you think merging protocols
> > > > >and software makes sense - please express your opinion).
> > > > >* we are not moving "fundamental" operators/hooks etc.
> > > > >* Airflow 2.0 is still going to be installed as a single package with
> > > > >all operators (so we are not yet implementing AIP-8).
> > > > >
> > > > >J.
> > > > >
> > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk
> > > > ><jarek.pot...@polidea.com> wrote:
> > > > >
> > > > >> I think all these cases are valid, but maybe I was not super-clear.
> > > > >> It's only the transfer operators that we need to decide where to
> > > > >> put - not hooks. Usually the complexity of communication with
> > > > >> particular storages is (or at least should be) in the hooks rather
> > > > >> than the operators.
> > > > >>
> > > > >> Operators should be just thin wrappers over the logic in the hooks.
> > > > >> Hooks are going to stay where they belong - S3 hooks in amazon, GCS
> > > > >> hooks in google.cloud, GoogleSheet hooks in google.gsuite.
> > > > >>
> > > > >> Since we actually have a mono-repo, it will be no problem (and no
> > > > >> cross-dependency problem) to have the S3 -> GCS operator in google
> > > > >> and use hooks from both google and amazon.
> > > > >>
> > > > >> I hope this alleviates your concern, Daniel?
> > > > >>
> > > > >> J.
> > > > >>
> > > > >>
> > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would
> > > > >>> put in the target, i.e. the storage? But GoogleSheetsToSftp would
> > > > >>> be in the google sheets operators file? The complexity, and the
> > > > >>> shared code, are in the gsheet component -- not in the storage
> > > > >>> destination.
> > > > >>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk
> > > > >>> <jarek.pot...@polidea.com> wrote:
> > > > >>>
> > > > >>> > Hello Airflow Community,
> > > > >>> >
> > > > >>> > This email calls for a vote to update AIP-21 Changes in import
> > > > >>> > paths
> > > > >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > > > >>> > with the changes described below. The vote will last till
> > > > >>> > Saturday 8th 2am CEST (72 hours). Committers have a binding vote
> > > > >>> > but everyone from the community is encouraged to cast an advisory
> > > > >>> > vote.
> > > > >>> >
> > > > >>> > *Summary*:
> > > > >>> >
> > > > >>> > The proposal is to update AIP-21 to move all non-core
> > > > >>> > operators/hooks/sensors (and related files) to sub-packages
> > > > >>> > within airflow: (protocols/software/providers) or
> > > > >>> > (software/providers).
> > > > >>> > I am also happy to merge protocols+software, so if you have a
> > > > >>> > strong opinion on it - please state it with your vote and we can
> > > > >>> > decide based on the majority.
> > > > >>> >
> > > > >>> > Those packages will be released separately (schedule/process
> > > > >>> > TBD) and will be backportable to the 1.10.* Airflow series, so
> > > > >>> > that users can install them and start using new Airflow 2.0
> > > > >>> > operators in their Python 3 Airflow 1.10 environments (only
> > > > >>> > Python 3.5+ is supported).
> > > > >>> >
> > > > >>> > We will proceed with migrating the providers package to the
> > > > >>> > already agreed paths without waiting for the final vote
> > > > >>> > (following the current version of AIP-21). Since we have a
> > > > >>> > working POC, we know the agreed paths will work for us.
> > > > >>> >
> > > > >>> > *Previous discussions:*
> > > > >>> >
> > > > >>> >    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> > > > >>> >    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> > > > >>> >
> > > > >>> > *More Details*:
> > > > >>> >
> > > > >>> > 1) We are going in the direction of AIP-8 but not yet reaching
> > > > >>> > it - focusing on separating out backportable packages
> > > > >>> > installable in Airflow releases 1.10.*. Airflow 2.0 will still be
> > > > >>> > installed as a whole and all the source will be kept in one repo,
> > > > >>> > but we now have a way to build backportable packages for groups
> > > > >>> > of operators. POC available here:
> > > > >>> > https://github.com/apache/airflow/pull/6507 (based on Ash's
> > > > >>> > https://github.com/ashb/airflow-submodule-test)
> > > > >>> >
> > > > >>> > 2) We move all integrations to new packages (keeping deprecated
> > > > >>> > import aliases in the old places). The split is as follows
> > > > >>> > (according to "stewardship" over the integrations):
> > > > >>> >
> > > > >>> >    - *fundamentals* - core of Airflow - they are really part of
> > > > >>> >    Apache Airflow. Stewards - core Airflow team. Not
> > > > >>> >    backportable/separated out.
> > > > >>> >    - *protocols* - are not owned by anyone, they are public and
> > > > >>> >    the implementation is fully "open". There are no particular
> > > > >>> >    stewards (no need). Users of particular protocols should
> > > > >>> >    mainly maintain those and add support for different versions
> > > > >>> >    of the protocols.
> > > > >>> >    - *software* - both API and software are controlled by someone
> > > > >>> >    outside of Airflow (commercial or open-source project), but
> > > > >>> >    the deployment of that software is "owned" by the user
> > > > >>> >    installing Airflow. The "stewards" might also be the users,
> > > > >>> >    but the controlling party (Oracle for example) might be
> > > > >>> >    interested in maintaining those operators as well.
> > > > >>> >    - *providers* - API/software/deployments are fully controlled
> > > > >>> >    by a 3rd party. Here the "provider" will most likely be
> > > > >>> >    interested in maintaining the operators (and, for example,
> > > > >>> >    like Google, provide integration guidelines
> > > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> > > > >>> >    for their hooks/operators/sensors).
> > > > >>> >
> > > > >>> >
> > > > >>> > 3) Between-providers transfer operators should be kept at the
> > > > >>> > "target" rather than the "source". For example, S3 -> GCS should
> > > > >>> > be in the "google" provider, but GCS -> S3 should be in "amazon".
> > > > >>> >
> > > > >>> > 4) One-side provider transfer operators should be kept at the
> > > > >>> > "provider" regardless of whether it is the target or the source.
> > > > >>> > For example, GCS -> SFTP or SFTP -> GCS should be in the "google"
> > > > >>> > provider.
> > > > >>> >
> > > > >>> > 5) If in doubt, we will discuss individual cases separately.
> > > > >>> >
> > > > >>> > J.
> > > > >>> >
> > > > >>> > --
> > > > >>> >
> > > > >>> > Jarek Potiuk
> > > > >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>> >
> > > > >>> > M: +48 660 796 129 <+48660796129>
> > > > >>> > [image: Polidea] <https://www.polidea.com/>
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > >
>
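
The transfer-operator placement rules in points 3) to 5) of the quoted
proposal can be sketched in Python. This is only an illustration, not code
from Airflow: the function name `transfer_operator_home` and the
`PROVIDER_OF` mapping are hypothetical, and the service-to-provider
assignments are assumptions based on the examples in the thread (S3 under
amazon, GCS and Google Sheets under google, SFTP/FTP treated as protocols
with no provider).

```python
from typing import Optional

# Hypothetical mapping from a service to the provider that owns it.
# Services missing from this mapping (e.g. "sftp", "ftp") are treated as
# protocols with no provider, as in the original AIP-21 proposal.
PROVIDER_OF = {
    "s3": "amazon",
    "gcs": "google",
    "gsheets": "google",
}


def transfer_operator_home(source: str, target: str) -> Optional[str]:
    """Return the provider package that would hold a source -> target operator."""
    source_provider = PROVIDER_OF.get(source.lower())
    target_provider = PROVIDER_OF.get(target.lower())

    # Rule 3: between two providers, the operator lives with the target.
    # Rule 4: with a provider only on the target side, it lives there too.
    if target_provider is not None:
        return target_provider
    # Rule 4: with a provider only on the source side, it lives with the source.
    if source_provider is not None:
        return source_provider
    # Rule 5: no provider on either side, so the case is discussed separately.
    return None


if __name__ == "__main__":
    for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("gcs", "sftp"), ("sftp", "gcs")]:
        print(f"{src} -> {dst}: {transfer_operator_home(src, dst)}")
```

Under these assumptions a GoogleSheetsToSftp operator would also land in
google, since only one side of the transfer belongs to a provider.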
