And just to clarify - there is a little too much redundancy in your example:
`airflow/providers/airbyte/hooks/airbyte.py` becomes `airflow
/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
The leading "airflow" is not there in the target path:
`airflow/providers/airbyte/hooks/airbyte.py` becomes
`providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
And the redundancy comes from the fact that we have to somehow name the
directory where the entire provider is moved. Let me rewrite it differently:
providers/airbyte <- This is where the airbyte provider is moved
|
src/airflow/providers/airbyte/hooks/airbyte.py <- this is
actual path "inside" the provider
So there is far less redundancy than you think.
On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk <[email protected]> wrote:
>
>> I have two immediate thoughts on it. First, can this be used as an
>> opportunity to reorganize the system tests tree into the provider's test
>> tree? Instead of having `tests/system/providers/{foo}`, maybe we can move
>> those system tests alongside the other provider tests like this:
>>
>
> Somehow what you wanted to add here was lost so I am not sure what the
> proposal is :).
>
>
>> My other thought is that I think I am missing something here as far as
>> the new organization goes. I haven't spent the time that you have spent
>> looking at how to make this work, and I also realize this is early PoC
>> discussion so maybe this isn't the end target, but the new paths are very
>> long and redundant. Is that for automation reasons? For example in the
>> image you shared:
>>
>
>
>> `airflow/providers/airbyte/hooks/airbyte.py` becomes
>> `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
>> `tests/providers/airbyte/hooks/test_airbyte.py` becomes
>> `airflow/providers/airbyte/
>> tests/airflow/providers/airbyte/hooks/test_airbyte.py.
>>
>
> For sources, I am quite sure we cannot do it - as this would end up with a
> non-standard package and we would have to make some non-standard
> manipulations with import paths and it would confuse multiple IDEs like
> crazy.
> Our providers are in "airflow.providers.<provider>" package. The "src" is
> the root of sources and we need to have "airflow"/"providers" as the
> package in.
>
> Also this choice was largely based on the official Python packaging
> tutorial:
> https://packaging.python.org/en/latest/tutorials/packaging-projects/
> (this is where I took the "src" from - it's not been very common in Python
> Projects - including Airflow - but it is now recommended in multiple places
> including official Pypa tutorial:
>
> packaging_tutorial/
> └── src/
> └── example_package_YOUR_USERNAME_HERE/
> ├── __init__.py
> └── example.py
>
> Regarding tests - yes, that could be removed. I even had it like that
> initially. And I am on the fence on this one.
>
> Putting tests at the top level has some drawbacks. One drawback is that if
> we put tests at the top, we miss "namespace". Having a "namespace" makes
> tests unique even if they are defined in different providers and are both
> put on PYTHONPATH
>
> Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b
> defining "tests/utils/test_utils.py". When you import
> "tests.utils.test_utils" - which one do you import if they are all in the
> PYTHONPATH? This will become a problem when we will make all of them added
> to PYTHONPATH in your IDE - I think. I believe many IDEs (including PyCharm
> and VSCode) will not work with multiple, separate python modules and they
> will get confused when seeing multiple repeated packages with the same
> modules on PYTHONPATH.
> <package_name> gives us "namespace" and guarantees that each module will
> be different.
>
> Also keeping them in packages is pretty consistent with some guidelines I
> saw that "tests/<package>" is recommended. I would love however to hear
> more from others experiences (especially TP) as maybe there are other ways
> I have not thought about. I would also prefer "tests/test_mmm.py" rather
> than "tests/airflow/providers/providers/test_mmm.py" - but I am afraid it's
> going to be difficult due to the "repeated modules" problem.
>
>
>
>>