Some context: I’ve been looking into putting the Connection URI to more use, 
most importantly to consolidate it with OpenLineage’s URI format.

Skipping the details, one of the conclusions I had is get_uri on Connection is 
not actually that useful, due to how different libraries (SQLAlchemy, Airflow 
Object Store, OpenLineage, etc.) have drastically different URI formats, and it 
is not practical to implement get_uri in Connection to be compatible with all 
of them. The correct layer to generate a connection URI is in hooks instead, 
and the format does not need to be visible to the user, but internal between 
the hook and whatever library it talks to, since the URI semantics are too 
specific to a library to be generally useful as a serde format.

Therefore I think deprecating the URI Connection representation would not be 
harmful, except maybe the URI can be more useful in simplistic cases.


> On 27 Nov 2023, at 12:21, Amogh Desai <amoghdesai....@gmail.com> wrote:
> 
> A little late to the party here but convinced by what Daniel had to say.
> 
> I am ok with the idea of adding this into the deprecation bucket in the
> coming time.
> 
> Implementing the get_uri in the hook seems quite reasonable to me
> on the grounds that the Connection class' get_uri is meant to be general,
> supporting the use case of serializing the Airflow connection on basis of
> host, port, schema, password, type etc.
> 
> Thanks & Regards,
> Amogh Desai
> 
> On Mon, Nov 20, 2023 at 9:56 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
> 
>>> 
>>> I still think different people have different expectations of the Airflow
>>> URI format. My expectation is that this is only a SerDe mechanism around
>>> Airflow Connection. If we have a look at the codebase of Airflow we found
>>> that this is not always true and we try to use it as SA URI.
>> 
>> 
>> Yes, this is in my view exactly the problem.  I believe we run into trouble
>> only when we falsely think that we should be able to define an Airflow
>> connection using, say, a jdbc connection string. It's not true in general.
>> 
>> Observe that DbApiHook has a get_uri
>> <
>> https://github.com/apache/airflow/blob/064fc2b7751a44e37ccce97609cff7c496098e56/airflow/providers/common/sql/hooks/sql.py#L188-L196
>>> 
>> method.
>> 
>> This method is used to produce a connection string valid for the service
>> you're connecting to, and importantly it may be different from the
>> Connection.get_uri.
>> 
>> The primary responsibility of Airflow connection URI is to serialize the
>> airflow connection object which has type, login, pass, host, schema port,
>> extra and that's it.  For simple cases it can serve both purposes but it's
>> not true in general and folks need to understand that.
>> 
>> I think part of the problem may simply be that `uri` is an argument of the
>> Connection object. It means not the connection string but the airflow conn
>> URI serialization.  Would probably be a good candidate for deprecation
>> on the basis that it is confusing.
>> 
>> Fundamentally, as a hook designer, if you need a conn string that is a URI,
>> you must implement get_uri on the hook in the right way to be able to
>> use the attributes available to you in Connection.  Connection.get_uri will
>> handle creating the Airflow Conn URI serialization and that's a separate
>> thing.
>> 
>> The connection object has no clue how to generate a valid connection string
>> for your service, but the hook can have this understanding.
>> 
>> There are many examples in the codebase where users implement get_uri on
>> the hook and it works just fine.  A hook author could write it so you can
>> optionally dump the full connection string in the extra object: {"extra":
>> {"uri": "blah"}}.
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to