Historically, before `DbApiHook` was introduced, some "DB-API 2" hooks (I
put it in quotes because not all of them properly implement this API)
implemented this method separately in each hook, and some did not
implement it at all.

Everything changed when DbApiHook was introduced; 23 community hooks are
now based on it. DbApiHook implements the method in an interesting way:

    def get_uri(self) -> str:
        """
        Extract the URI from the connection.

        :return: the extracted uri.
        """
        conn = self.get_connection(getattr(self, self.conn_name_attr))
        conn.schema = self.__schema or conn.schema
        return conn.get_uri()


And it is used by default in:

    def get_sqlalchemy_engine(self, engine_kwargs=None):
        """
        Get an sqlalchemy_engine object.

        :param engine_kwargs: Kwargs used in
:func:`~sqlalchemy.create_engine`.
        :return: the created engine.
        """
        if engine_kwargs is None:
            engine_kwargs = {}
        return create_engine(self.get_uri(), **engine_kwargs)

IMO there are two bad assumptions here:
1. "You can transform the Airflow Connection URI into a SQLAlchemy URI."
Many providers also have additional extra parameters that are not
supported by SQLAlchemy.
In addition, every change in the Airflow Connection URI format might
break the result even if it worked before:
https://github.com/apache/airflow/issues/34456

2. "Every DB implements a SQLAlchemy dialect
(https://docs.sqlalchemy.org/en/20/dialects/) and we have packages for it."
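
To illustrate the first assumption, here is a hedged sketch (the URI and
the `role` parameter are hypothetical, not taken from any real provider).
An Airflow Connection URI carries extras as query parameters, so naively
reusing it as a SQLAlchemy URL forwards every extra to the DB driver,
which may reject keys it does not understand; the scheme can also differ
(e.g. Airflow's `postgres` connection type vs SQLAlchemy's `postgresql`
dialect name):

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical Airflow Connection URI; "role" stands in for a provider
# extra that the underlying driver does not understand.
airflow_uri = "postgres://user:secret@host:5432/db?sslmode=require&role=reader"

parts = urlsplit(airflow_uri)
extras = parse_qs(parts.query)

# Reusing this URI verbatim with sqlalchemy.create_engine would:
#   1. fail on the scheme: modern SQLAlchemy expects "postgresql",
#      not "postgres";
#   2. pass every query parameter through to the driver, including
#      keys it rejects.
print(parts.scheme)    # postgres
print(sorted(extras))  # ['role', 'sslmode']
```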

I've split the hooks into different groups.

Group 1 (7 hooks): At first glance these work fine, but it would be
better if someone could also have a look:

- SnowflakeHook from `snowflake` provider
- BigQueryHook from `google` provider
- SpannerHook from `google` provider
- RedshiftSQLHook from `amazon` provider
- OdbcHook from `odbc` provider
- DrillHook from `apache.drill` provider
- DruidDbApiHook from `apache.druid` provider


Group 2 (10 hooks): Do not override the DbApiHook.get_uri implementation:

- MySqlHook from `mysql` provider. In addition, this hook supports two
different drivers:
  + mysqlclient: https://pypi.org/project/mysqlclient/
  + mysql-connector-python: https://pypi.org/project/mysql-connector-python/
- OracleHook from `oracle` provider
- JdbcHook from `jdbc` provider
- TrinoHook from `trino` provider
- HiveServer2Hook from `apache.hive` provider
- ImpalaHook from `apache.impala` provider
- DatabricksSqlHook from `databricks` provider
- ExasolHook from `exasol` provider
- PrestoHook from `presto` provider
- VerticaHook from `vertica` provider


Group 3 (3 hooks): Use Connection.get_uri to construct the SQLAlchemy URI:

- PostgresHook from `postgres` provider
- MsSqlHook from `microsoft.mssql` provider
- SqliteHook from `sqlite` provider


Group 4 (2 hooks): Implementations that would not work with the
`get_sqlalchemy_engine` method:

- PinotDbApiHook from `apache.pinot` provider; it seems to generate a
URI for the broker
- ElasticsearchSQLHook from `elasticsearch` provider; `elasticsearch`
(https://github.com/elastic/elasticsearch-py) does not provide a
DB-API 2 / SQLAlchemy interface, and `elasticsearch-dbapi`
(https://github.com/preset-io/elasticsearch-dbapi/) was removed in
apache-airflow-providers-elasticsearch>=5.0


Group 5 (1 hook): Hooks from suspended providers:

- QuboleCheckHook

I'm not sure what we can do in the current situation.
My proposal:
1. Add a UserWarning in DbApiHook.get_uri, and maybe in
DbApiHook.get_sqlalchemy_engine, informing users that these methods
might not work correctly without re-implementation
2. Implement `get_uri` without Connection.get_uri for Group 3
3. Raise NotImplementedError in `get_sqlalchemy_engine` for Group 4
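
Proposals 1 and 3 could look roughly like the sketch below. This is a
hypothetical simplification, not actual Airflow code: the class names,
message text, and the placeholder return value are all illustrative.

```python
import warnings


class DbApiHook:
    """Simplified stand-in for airflow's DbApiHook (hypothetical sketch)."""

    def get_uri(self) -> str:
        # Proposal 1: warn that the Connection-derived URI may not be a
        # valid SQLAlchemy URI for every provider.
        warnings.warn(
            "DbApiHook.get_uri() is derived from the Airflow Connection and "
            "might not work correctly without re-implementation in the hook.",
            UserWarning,
            stacklevel=2,
        )
        return "sqlite://"  # placeholder for the Connection-derived URI


class BrokerOnlyHook(DbApiHook):
    """Stand-in for a Group 4 hook whose backend has no SQLAlchemy dialect."""

    def get_sqlalchemy_engine(self, engine_kwargs=None):
        # Proposal 3: fail loudly instead of returning a broken engine.
        raise NotImplementedError(
            f"{type(self).__name__} does not support SQLAlchemy engines"
        )
```

Proposal 2 would then mean that Group 3 hooks build the SQLAlchemy URL
from the individual Connection fields (host, port, login, ...) instead of
round-tripping through Connection.get_uri.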


I hope our collective wisdom can help solve this puzzle; other ideas are
very welcome.

----
Best Wishes
*Andrey Anshin*
