withnale commented on issue #32789: URL: https://github.com/apache/superset/issues/32789#issuecomment-2742999053
The native code in Superset makes no provision for separating `project_id` usage and mostly relies on the standard SQLAlchemy `create_engine` calls. However, the `DB_CONNECTION_MUTATOR` hook is available and seems to run whenever a DB connection is created. Rather than just passing in parameters, it seemed possible to use the [supplying-your-own-bigquery-client](https://github.com/googleapis/python-bigquery-sqlalchemy?tab=readme-ov-file#supplying-your-own-bigquery-client) logic present in python-bigquery-sqlalchemy and make a default project decision based on context...

```python
# shortened
from google.cloud import bigquery
from google.oauth2 import service_account

def DB_CONNECTION_MUTATOR(uri, params, _username, _security_manager, source):
    credentials_info = params.get('credentials_info', None)
    credentials = service_account.Credentials.from_service_account_info(credentials_info)
    project = 'some magic occurs here'
    # Hand a pre-built client to the dialect instead of letting it create one
    client = bigquery.Client(credentials=credentials, project=project)
    params['connect_args'] = {'client': client}
    return uri.update_query_dict({"user_supplied_client": "True"}), params
```

Obviously, the key part here is the 'magic', since you need to be able to decide on the correct project from whatever context the `DB_CONNECTION_MUTATOR` has available to it. The only real context is the `source` field, and I've done some experimenting with setting the correct project based on that...

```python
if source is None:
    project = DATASET_PROJECT
elif source.name in ['CHART', 'SQL_LAB']:
    project = JOB_PROJECT
else:
    logger.error(f"DB_CONNECTION_MUTATOR: Unknown source: {source}")
```

This seems too brittle, and in many use cases `source` is `None` at the point where a decision needs to be made. Also, fundamentally, I don't think this approach is robust with regard to connection reuse and pooling.

I think it's probably better to try to fix the "destination decision making" part of this issue upstream in the python-bigquery-sqlalchemy repository, since at present the logic there doesn't specify a project explicitly in its BigQuery client calls (such as the `list_datasets` call in the diff at the end of this comment).
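As a minimal sketch of the `source`-based selection discussed above, the branch can be pulled into a standalone function with an explicit fallback, so that `project` is never left unset for an unknown source. The `QuerySource` enum here is only a local stand-in that mirrors the member names used above, and `choose_project`, the two project IDs, and the choice to fall back to `DATASET_PROJECT` are all assumptions for illustration, not part of Superset's API.

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical project IDs; replace with real values.
DATASET_PROJECT = "my-dataset-project"
JOB_PROJECT = "my-job-project"


class QuerySource(Enum):
    """Local stand-in for the enum passed as `source` (assumption)."""
    CHART = 0
    DASHBOARD = 1
    SQL_LAB = 2


def choose_project(source: Optional[Enum]) -> str:
    """Pick a default BigQuery project from the mutator's `source` argument."""
    if source is None:
        return DATASET_PROJECT
    if source.name in ("CHART", "SQL_LAB"):
        return JOB_PROJECT
    # Unknown source: log it and fall back (assumed default) instead of
    # leaving the project undefined.
    logger.error("DB_CONNECTION_MUTATOR: unknown source: %s", source)
    return DATASET_PROJECT
```

Inside the mutator this would replace the placeholder with `project = choose_project(source)`. It does nothing about the pooling concern: a client bound to one project could still be reused for a connection that needed the other.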
Modifying the various BigQuery client calls to be explicit seems by far the cleanest solution, since it makes the logic available to any product built on SQLAlchemy, not just Superset.

```diff
# python-sqlalchemy-bigquery/sqlalchemy_bigquery/core.py:1335
- datasets = connection.connection._client.list_datasets()
+ datasets = connection.connection._client.list_datasets(self.project_id)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
