I'll have to read your proposal in detail (sorry, no time right now!), but I'm 
broadly in favour of this approach, and I think keeping them _in_ the same repo 
is the best plan -- that makes writing and testing cross-cutting changes
easier.

-a

> On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
> 
> I think using namespaces should avoid many of the problems raised by
> separate repos (who will manage them? how do we release? where should the
> repo live?).
> 
> Bests,
> Tomek
> 
> On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> 
>> Thanks Bas for comments! Let me share my thoughts below.
>> 
>> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
>> basharens...@godatadriven.com>
>> wrote:
>> 
>>> Hi Jarek, I definitely see a future in creating separate installable
>>> packages for various operators/hooks/etc (as in AIP-8). This would IMO
>>> strip the “core” Airflow to only what’s needed and result in a small
>>> package without a ton of dependencies (and make it more maintainable, with
>>> shorter tests, etc etc etc). I’m not exactly sure, though, what you’re
>>> proposing in your e-mail: is it a new AIP for an intermediate step towards
>>> AIP-8?
>>> 
>> 
>> It's a new AIP I am proposing. For now it's only for backporting the new
>> 2.0 import paths to the 1.10.* series.
>> 
>> It's more "an incremental step in the direction of AIP-8, learning about
>> the difficulties involved" than a full implementation of AIP-8. We are
>> taking advantage of the changes in import paths from AIP-21, which make it
>> possible to have both old and new (optional) operators available in the
>> 1.10.* series of Airflow. I think there is a lot more to do for a full
>> implementation of AIP-8: deciding how to maintain and install those
>> operator groups separately, a stewardship model/organisation for the
>> separate groups, how to manage cross-dependencies, procedures for releasing
>> the packages, etc.
>> 
>> I also think of this new AIP as a learning effort: we would learn more
>> about how separate packaging works, and then we can follow up with a full
>> AIP-8 implementation for "modular" Airflow. AIP-8 could then be implemented
>> in Airflow 2.1, for example (or 3.0 if we start following semantic
>> versioning), based on those learnings. It's a good example of having our
>> cake and eating it too: we can try out modularity in 1.10.* while cutting
>> the scope of 2.0 and not yet implementing the full management/release
>> procedure for AIP-8.
>> 
>> 
>>> Thinking about this, I think there are still a few grey areas (which would
>>> be good to discuss in a new AIP, or continue on AIP-8):
>>> 
>>>  *   In your email you only speak about the 3 big cloud providers
>>> (btw I made a PR for migrating all AWS components ->
>>> https://github.com/apache/airflow/pull/6439). Is there a plan for
>>> splitting components other than Google/AWS/Azure?
>>> 
>> 
>> We could indeed add more groups as part of this new AIP (as an extension
>> to AIP-21 and a prerequisite to AIP-8). We already see how
>> moving/deprecation works for the providers package: it works rather nicely
>> for GCP/Google. But there is nothing to prevent us from extending it to
>> cover other groups of operators/hooks. If you look at the current structure
>> of the documentation done by Kamil, we can follow the structure there and
>> move the operators/hooks accordingly
>> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
>> 
>>      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
>> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
>> integrations, Software integrations, Protocol integrations.
>> 
>> I am happy to include that in the AIP if others agree it's a good idea.
>> Out of those groups, I think only Fundamentals should not be backported.
>> The others should be rather easy to port (if we decide to). We already have
>> quite a lot of those in the new GCP operators for 2.0, so starting with the
>> GCP/Google group is a good idea. Continuing with the other cloud providers
>> first is also a good thing. For example, we now have support from the
>> Google Composer team to do this separation for GCP (and learn from it), and
>> then our team can take on the stewardship of releasing the Python 3 /
>> Airflow 1.10-compatible "airflow-google" packages. Other cloud
>> providers/teams might follow (if they see value in it), and there could be
>> different stewards for those. Then we can do other groups if we decide to.
>> I think this way we can learn whether AIP-8 is manageable and what real
>> problems we are going to face.
>> 
>>>  *   Each “plugin”, e.g. GCP, would be a separate repo; should we create
>>> some sort of blueprint for such packages?
>>> 
>> 
>> I think we do not need separate repos at all, but in this new AIP we can
>> test that before we decide to go for AIP-8. IMHO a monorepo approach will
>> work rather nicely here. We could use Python 3 native namespace packages
>> <https://packaging.python.org/guides/packaging-namespace-packages/> for the
>> sub-packages when we go full AIP-8. For now we could simply package the new
>> operators as a separate pip package for Python 3 in the 1.10.* series only.
>> We only need to test whether it works well when another package provides
>> 'airflow.providers.*' after apache-airflow (which provides the 'airflow'
>> package) is installed. But I think we can make it work. I don't think we
>> really need to split the repos: namespaces will work just fine and make
>> dependencies between the packages easier to manage (but we may learn
>> otherwise). We certainly will not need separate repos for the newly
>> proposed AIP of backporting groups to 1.10, and we can defer that decision
>> to AIP-8 implementation time.
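A minimal sketch of the mechanism behind Python 3 native (PEP 420) namespace packages: two separately installed directories each contribute a sub-package under one shared top-level name. The names `demo_providers`, `google` and `amazon` are hypothetical, and the temporary directories only stand in for two installed distributions. Note that in Airflow 1.10 `airflow` itself is a regular package with an `__init__.py`, so this illustrates the general technique, not apache-airflow's actual layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, namespace: str, subpackage: str, value: str) -> None:
    """Create <root>/<namespace>/<subpackage>/__init__.py.

    <namespace> deliberately gets NO __init__.py: that is what makes it a
    PEP 420 implicit ("native") namespace package.
    """
    pkg_dir = root / namespace / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"PROVIDER = {value!r}\n")


def demo() -> list:
    # Two separate directories standing in for two separately installed
    # distributions (hypothetical "google" and "amazon" provider packages).
    # The temporary directories are not cleaned up in this sketch.
    dir_a = Path(tempfile.mkdtemp())
    dir_b = Path(tempfile.mkdtemp())
    make_portion(dir_a, "demo_providers", "google", "google")
    make_portion(dir_b, "demo_providers", "amazon", "amazon")

    # Make both directories importable, as site-packages entries would be.
    sys.path[:0] = [str(dir_a), str(dir_b)]
    importlib.invalidate_caches()

    google = importlib.import_module("demo_providers.google")
    amazon = importlib.import_module("demo_providers.amazon")
    namespace = importlib.import_module("demo_providers")
    # The namespace package's __path__ spans both directories.
    return [google.PROVIDER, amazon.PROVIDER, len(list(namespace.__path__))]
```

Calling `demo()` imports `demo_providers.google` and `demo_providers.amazon` from different directories under one top-level name. A monorepo that publishes one distribution per provider would rely on this property.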
>> 
>> 
>>>  *   In which Airflow version do we start raising deprecation warnings
>>> and in which version would we remove the original?
>>> 
>> 
>> I think we should do what we already did in the GCP case. Those old
>> "imports" for operators can be marked as deprecated in Airflow 2.0 (and
>> removed in 2.1, or 3.0 if we start following semantic versioning). We can,
>> however, start earlier, in 1.10.7 or 1.10.8 if we release those (without
>> removing the old operators yet: just raising deprecation warnings and
>> informing users that on Python 3 the new "airflow-google", "airflow-aws"
>> etc. packages can be installed and they can switch to them).
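The "deprecate first, remove later" step for a renamed operator class could look like the following minimal sketch. The old name becomes a thin subclass of the new one that emits a `DeprecationWarning` when instantiated but otherwise behaves identically. Both class names are hypothetical stand-ins. A real operator would derive from Airflow's `BaseOperator`, which is omitted here to keep the example self-contained.

```python
import warnings


class NewStyleOperator:
    """Hypothetical stand-in for an operator under airflow.providers.<provider>."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


class OldStyleOperator(NewStyleOperator):
    """Hypothetical stand-in for the same operator under its old 1.10 name.

    It keeps working, but tells users to switch to the new class.
    """

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "OldStyleOperator is deprecated; use NewStyleOperator "
            "from the providers package instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's DAG code
        )
        super().__init__(*args, **kwargs)
```

Subclassing keeps `isinstance` checks against the new class working for DAGs that still use the old name. Removing the alias in a later release is then a one-line change.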
>> 
>> J.
>> 
>> 
>>> 
>>> Cheers,
>>> Bas
>>> 
>>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>> 
>>> Hello, any comments on that? I am happy to make it into an AIP :)
>>> 
>>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>> wrote:
>>> 
>>> *Motivation*
>>> 
>>> I think we really should start thinking about making it easier for our
>>> users to migrate to 2.0. After implementing some recent changes related to
>>> AIP-21 - Changes in import paths
>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>,
>>> I think I have an idea that might help with it.
>>> 
>>> *Proposal*
>>> 
>>> We could package some of the new and improved 2.0 operators (moved to the
>>> "providers" package) and let them be used in Python 3 environments of
>>> Airflow 1.10.x.
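Because the backported operators would target only Python 3 on Airflow 1.10.x, the separate package needs some way to reject other environments. A minimal sketch of such a check, assuming a hypothetical `backport_supported` helper that the package could call at import time:

```python
import re
import sys
from typing import Optional, Sequence, Tuple


def major_minor(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from strings such as "1.10.6" or "1.10.7rc1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def backport_supported(
    airflow_version: str,
    python_version: Sequence[int] = sys.version_info,
) -> bool:
    """Hypothetical compatibility check: Python 3 plus Airflow 1.10.x only."""
    return python_version[0] == 3 and major_minor(airflow_version) == (1, 10)
```

In the package metadata the same constraint would more naturally be declared up front, e.g. with setuptools' `python_requires` and a requirement on `apache-airflow` restricted to the 1.10 series, so that pip refuses to install the package in an unsupported environment.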
>>> 
>>> This can be done case-by-case per "cloud provider". It should not be
>>> obligatory and should be largely driven by each provider. It's not yet the
>>> full AIP-8 - Split Hooks/Operators into separate packages
>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
>>> It's merely backporting some operators/hooks to get them working in 1.10.
>>> But by doing it we might try out the concept of splitting, learn about
>>> maintenance problems, and maybe implement the full *AIP-8* approach in 2.1
>>> consistently across the board.
>>> 
>>> *Context*
>>> 
>>> Part of AIP-21 was to move the import paths for cloud providers to a
>>> separate providers/<PROVIDER> package. An example of that (the first
>>> provider, which we have almost finished migrating) was the providers/google
>>> package (further divided into gcp/gsuite etc).
>>> 
>>> We've done a massive migration of all the Google-related operators,
>>> created a few missing ones, and retrofitted some old operators to follow
>>> GCP best practices, fixing a number of problems and also implementing
>>> Python 3 and Pylint compatibility. Some of these operators/hooks are not
>>> backwards compatible. Those that are compatible are still available via the
>>> old imports with a deprecation warning.
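Keeping a moved class "available via the old imports with a deprecation warning" can be sketched as a module shim: the old module warns at import time and re-exports the class from its new home. The module names `new_location` and `old_location` and the class `ExampleOperator` are hypothetical. The sketch writes both modules into a temporary directory so that it runs without Airflow installed; in the real tree the old module would simply stay at its 1.10 path.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical module bodies: "new_location" stands in for a module under
# airflow.providers.<provider>, "old_location" for the 1.10-era import path.
NEW_MODULE = '''
class ExampleOperator:
    """Hypothetical operator that now lives at the new import path."""
'''

OLD_MODULE = '''
import warnings

warnings.warn(
    "This module is deprecated. Please use `new_location`.",
    DeprecationWarning,
    stacklevel=2,
)

from new_location import ExampleOperator  # noqa: F401  (re-export for old callers)
'''


def install_demo_modules() -> None:
    """Write both modules to a temporary directory and make it importable."""
    root = Path(tempfile.mkdtemp())
    (root / "new_location.py").write_text(textwrap.dedent(NEW_MODULE))
    (root / "old_location.py").write_text(textwrap.dedent(OLD_MODULE))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()


def import_old_path():
    """Import via the old path; return (re-exported class, captured warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old = importlib.import_module("old_location")
    return old.ExampleOperator, caught
```

Because the old module re-exports the very same class object, code importing from either path sees identical behaviour. Only the old path emits the warning.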
>>> 
>>> We've added missing tests (including system tests) and missing features,
>>> improving some of the Google operators, giving users more capabilities
>>> and fixing some issues. Those operators should pretty much "just work" in
>>> Airflow 1.10.x (any recent version) for Python 3. We should be able to
>>> release a separate pip-installable package for those operators that users
>>> can install in Airflow 1.10.x.
>>> 
>>> Any user will be able to install this separate package in their Airflow
>>> 1.10.x installation and start using those new "provider" operators in
>>> parallel with the old 1.10.x operators. Other providers ("microsoft",
>>> "amazon") might follow the same approach if they want. We could even at
>>> some point decide to move some of the core operators in a similar fashion
>>> (for example following the structure proposed in the latest documentation:
>>> fundamentals / software / etc.
>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html).
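During the period when old and new operators coexist, DAG code might prefer a new provider operator when the backport package is installed and fall back to the old 1.10 operator otherwise. A minimal sketch using a hypothetical `import_first` helper; the `"module:attribute"` spec format is an assumption of this sketch, and the example usage relies only on stdlib modules rather than real Airflow paths.

```python
import importlib
from typing import Iterable


def import_first(candidates: Iterable[str]):
    """Return the first attribute importable from "module:attribute" specs.

    Candidates are tried in order, so list the new provider path first and
    the old 1.10 path second.
    """
    errors = []
    for spec in candidates:
        module_name, _, attr = spec.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{spec}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))
```

For example, `import_first(["no_such_module_xyz:Thing", "json:JSONDecoder"])` skips the missing module and returns `json.JSONDecoder`. In a DAG, the two specs would instead name the new providers path and the old 1.10 path of the same operator.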
>>> 
>>> *Pros and cons*
>>> 
>>> There are a number of pros:
>>> 
>>>  - Users will have an easier migration path if they are heavily invested
>>>  in the 1.10.* version
>>>  - It's possible to migrate in stages for people who are also invested in
>>>  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
>>>  2.0*
>>>  - In the py3 + new operators stage, moving to the new operators can be
>>>  done gradually. Old operators will continue to work while the new ones
>>>  are used more and more
>>>  - People will be incentivised to migrate to Python 3 before 2.0 is
>>>  out (by using the new operators)
>>>  - Each provider "package" can have an independent release schedule and
>>>  add functionality to already released Airflow versions.
>>>  - We do not take any functionality away from users; we just add
>>>  more options
>>>  - The releases can, similarly to the main Airflow releases, be voted on
>>>  separately by the PMC after the "stewards" of the package (per provider)
>>>  perform a round of testing on 1.10.* versions.
>>>  - Users will start migrating to the new operators earlier and have a
>>>  smoother switch to 2.0 later
>>>  - The latest improved operators will start
>>> 
>>> There are three cons I could think of:
>>> 
>>>  - There will be quite a lot of duplication between old and new
>>>  operators (they will co-exist in 1.10). That might confuse users and
>>>  cause problems with cooperation between different operators/hooks
>>>  - Having new operators in 1.10 on Python 3 might keep people from
>>>  migrating to 2.0
>>>  - It will require some maintenance and separate release overhead.
>>> 
>>> I have already spoken to the Composer team @Google and they are very
>>> positive about this. I also spoke to Ash, and it seems it might also be OK
>>> for the Astronomer team. We have Google's backing and support, and we can
>>> provide maintenance and support for those packages, setting an example for
>>> other providers of how they can do it.
>>> 
>>> Let me know what you think, and whether I should maybe make it into an
>>> official AIP.
>>> 
>>> J.
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Jarek Potiuk
>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> 
>>> M: +48 660 796 129 <+48660796129>
>>> [image: Polidea] <https://www.polidea.com/>
>>> 
>>> 
>>> 
>> 
> 
> 
> -- 
> 
> Tomasz Urbaszek
> Polidea <https://www.polidea.com/> | Junior Software Engineer
> 
> M: +48 505 628 493 <+48505628493>
> E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> 
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
