GitHub user alonahmias edited a discussion: Add X-Trino-Client-Info Header to
Trino Requests with Dashboard Metadata
## Feature Request: Add `X-Trino-Client-Info` Header to Trino Requests with
Dashboard Metadata (Configurable)
**Title:** Enhance Trino Integration: Configurably Include Dashboard Metadata
in `X-Trino-Client-Info` Header for Auditing and Evaluation
**Problem Statement:**
Currently, when Superset queries Trino, there is no standardized, easily
consumable mechanism for passing Superset-specific context, such as the
dashboard ID or name, to Trino. This lack of context limits Trino
administrators and data governance tools in three areas:
* **Comprehensive Auditing:** Accurately track which Superset dashboards are
generating specific queries, making it difficult to understand the **full
context and origin of data access and transformation events**. This hinders
compliance, security monitoring, and post-incident analysis.
* **Performance and Usage Evaluation:** Without knowing which dashboard
initiated a query, it's challenging to **evaluate the performance impact or
resource consumption of specific dashboards**. This makes it difficult to
identify underperforming dashboards, optimize resource allocation, or
understand overall dashboard usage patterns over time.
* **Troubleshooting and Impact Analysis:** When issues arise in Trino (e.g.,
slow queries, errors), it's cumbersome to **trace them back to their
originating Superset dashboard**. This prolongs troubleshooting and makes it
harder to assess the impact of changes or issues on specific analytical views.
**Proposed Solution:**
We propose adding a feature to Superset's Trino connector that automatically
includes a custom HTTP header, `X-Trino-Client-Info`, in all Trino requests.
This header would contain JSON-formatted metadata about the Superset dashboard
and, optionally, the user from which the query originates. **Crucially, this
feature should be configurable, allowing administrators to enable/disable it
and select which metadata fields are included.** This will provide the
necessary context for robust auditing and detailed evaluations while offering
flexibility to different environments.
**Detailed Proposal:**
1. **Header Name:** The header should be `X-Trino-Client-Info`. This header is
recognized by Trino as a standard way to pass client-side information.
2. **Header Content:** The value of the `X-Trino-Client-Info` header should be
a JSON string containing selected dashboard and user metadata.
3. **Configurability (Key Aspect):**
* **Enable/Disable Toggle:** A clear option in the Trino database
connection settings (e.g., within the "Advanced" tab or a dedicated "Trino
Client Info" section) to enable or disable the inclusion of this header.
* **Selectable Fields:** A set of checkboxes or a multi-select dropdown
allowing administrators to choose which specific metadata fields to include in
the JSON payload. Potential fields include:
* `dashboard_id`: The unique identifier of the Superset dashboard.
* `dashboard_name`: The human-readable name of the Superset dashboard.
* `slice_id`: The unique identifier of the chart/slice within a
dashboard (if applicable).
* `slice_name`: The human-readable name of the chart/slice (if
applicable).
* `user_id`: The ID of the Superset user who initiated the query.
* `username`: The username of the Superset user who initiated the
query.
* **Default Behavior:** The feature could be disabled by default, or
enabled with a basic set of fields (e.g., `dashboard_id`, `dashboard_name`),
allowing administrators to opt in or customize further.
4. **Implementation Considerations:**
* The metadata should be dynamically retrieved at the time of query
execution.
* Error handling should be robust; if selected metadata cannot be
retrieved, the corresponding field should either be omitted from the JSON or
contain a clear indication of the missing information (e.g., `null` values),
depending on the chosen design.
* Consider the impact on performance, though the added overhead of a
small, configurable JSON string is expected to be negligible.
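The points above can be sketched roughly as follows. This is only an illustration of the proposal, not existing Superset code: the function `build_client_info`, its parameters, and the `context` mapping are hypothetical names, and the sketch picks the "omit missing fields" option rather than emitting `null` values.

```python
import json
from typing import Any, Dict, Mapping, Sequence

# Field names come from the proposal; all other names here are hypothetical.
ALLOWED_FIELDS = (
    "dashboard_id",
    "dashboard_name",
    "slice_id",
    "slice_name",
    "user_id",
    "username",
)
HEADER_NAME = "X-Trino-Client-Info"


def build_client_info(
    context: Mapping[str, Any],
    enabled: bool,
    selected_fields: Sequence[str],
) -> Dict[str, str]:
    """Return the extra header for a Trino request, or {} if none applies."""
    if not enabled:
        return {}
    unknown = set(selected_fields) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unsupported client-info fields: {sorted(unknown)}")
    # Keep a stable field order and omit fields whose value is unavailable.
    payload = {
        name: context[name]
        for name in ALLOWED_FIELDS
        if name in selected_fields and context.get(name) is not None
    }
    if not payload:
        return {}
    # json.dumps escapes non-ASCII characters by default, so the value stays
    # a single ASCII line, which is what an HTTP header value needs.
    return {HEADER_NAME: json.dumps(payload)}
```

Calling it with `enabled=False`, or with a context where none of the selected fields is available, returns an empty dict, so the request would be sent unchanged.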
**Benefits:**
* **Richer Audit Trails:** Provides comprehensive, **configurable** context
in Trino logs, allowing for detailed auditing of data access originating from
specific Superset dashboards and users. This is crucial for compliance,
security, and accountability.
* **In-depth Usage Analytics:** Enables the evaluation of dashboard
popularity, query patterns, and resource consumption, leading to better
insights into how dashboards are being used and how to optimize them.
* **Streamlined Performance Evaluation:** Directly link Trino query
performance metrics back to specific Superset dashboards, simplifying the
process of identifying and resolving performance bottlenecks.
* **Improved Troubleshooting:** Accelerates the diagnosis of issues by
providing immediate visibility into the Superset dashboard responsible for a
given Trino query.
* **Enhanced Data Governance:** Supports more robust data governance
initiatives by providing a clear lineage of data access requests through
Superset.
* **Flexibility for Administrators:** Allows administrators to tailor the
amount of information sent to Trino based on their specific auditing, logging,
and security requirements, without forcing unnecessary data transmission.
**Example `X-Trino-Client-Info` Header Value (based on configuration):**
If only `dashboard_id` and `dashboard_name` are selected:
```
X-Trino-Client-Info: {"dashboard_id": 123, "dashboard_name": "Sales Overview"}
```
If all fields are selected:
```
X-Trino-Client-Info: {"dashboard_id": 123, "dashboard_name": "Sales Overview", "slice_id": 456, "slice_name": "Revenue by Region Chart", "user_id": 789, "username": "analyst_user"}
```
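On the consuming side, values like the ones above could be decoded and grouped per dashboard. The sketch below assumes the raw client-info strings have already been collected from wherever the Trino deployment records them; how they are collected is outside this sketch, and `parse_client_info` and `queries_per_dashboard` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def parse_client_info(value: Optional[str]) -> dict:
    """Decode a client-info string; return {} for empty or non-JSON values."""
    if not value:
        return {}
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        # Other clients may send free-form text in this header.
        return {}
    return parsed if isinstance(parsed, dict) else {}


def queries_per_dashboard(client_infos: Iterable[Optional[str]]) -> Counter:
    """Count queries per dashboard_id; unattributed queries go to 'unknown'."""
    counts: Counter = Counter()
    for raw in client_infos:
        info = parse_client_info(raw)
        counts[info.get("dashboard_id", "unknown")] += 1
    return counts
```

Treating unparsable values as "unknown" rather than raising keeps the aggregation usable when queries from other clients, or from Superset connections with the feature disabled, appear in the same log.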
**Affected Components:**
* Superset Trino DB EngineSpec
* Superset Database Connection UI/Configuration (specifically for Trino
connections)
* Superset Query Generation Logic
This feature would significantly enhance the integration between Superset and
Trino, providing critical and **configurable** visibility for auditing,
performance evaluation, and overall data platform management.
I'm willing to contribute this feature if the maintainers are open to it, but
I would need some help with it :smile:
GitHub link: https://github.com/apache/superset/discussions/34236