GitHub user alonahmias edited a discussion: Add X-Trino-Client-Info Header to 
Trino Requests with Dashboard Metadata

## Feature Request: Add `X-Trino-Client-Info` Header to Trino Requests with 
Dashboard Metadata (Configurable)

**Title:** Enhance Trino Integration: Configurably Include Dashboard Metadata 
in `X-Trino-Client-Info` Header for Auditing and Evaluation

**Problem Statement:**

Currently, when Superset queries Trino, there is no direct mechanism to pass 
relevant Superset-specific context, such as the dashboard ID or name, directly 
to Trino in a standardized and easily consumable way. This lack of context 
severely limits the ability of Trino administrators and data governance tools 
to:

  * **Comprehensive Auditing:** Accurately track which Superset dashboards are 
generating specific queries, making it difficult to understand the **full 
context and origin of data access and transformation events**. This hinders 
compliance, security monitoring, and post-incident analysis.
  * **Performance and Usage Evaluation:** Without knowing which dashboard 
initiated a query, it's challenging to **evaluate the performance impact or 
resource consumption of specific dashboards**. This makes it difficult to 
identify underperforming dashboards, optimize resource allocation, or 
understand overall dashboard usage patterns over time.
  * **Troubleshooting and Impact Analysis:** When issues arise in Trino (e.g., 
slow queries, errors), it's cumbersome to **trace them back to their 
originating Superset dashboard**. This prolongs troubleshooting and makes it 
harder to assess the impact of changes or issues on specific analytical views.

**Proposed Solution:**

We propose adding a feature to Superset's Trino connector that automatically 
includes a custom HTTP header, `X-Trino-Client-Info`, in all Trino requests. 
This header would contain JSON-formatted metadata about the Superset dashboard 
from which the query originates and, optionally, the user who ran it. **Crucially, this 
feature should be configurable, allowing administrators to enable/disable it 
and select which metadata fields are included.** This will provide the 
necessary context for robust auditing and detailed evaluations while offering 
flexibility to different environments.

**Detailed Proposal:**

1.  **Header Name:** The header should be `X-Trino-Client-Info`. This header is 
recognized by Trino as a standard way to pass client-side information.
2.  **Header Content:** The value of the `X-Trino-Client-Info` header should be 
a JSON string containing selected dashboard and user metadata.
3.  **Configurability (Key Aspect):**
      * **Enable/Disable Toggle:** A clear option in the Trino database 
connection settings (e.g., within the "Advanced" tab or a dedicated "Trino 
Client Info" section) to enable or disable the inclusion of this header.
      * **Selectable Fields:** A set of checkboxes or a multi-select dropdown 
allowing administrators to choose which specific metadata fields to include in 
the JSON payload. Potential fields include:
          * `dashboard_id`: The unique identifier of the Superset dashboard.
          * `dashboard_name`: The human-readable name of the Superset dashboard.
          * `slice_id`: The unique identifier of the chart/slice within a 
dashboard (if applicable).
          * `slice_name`: The human-readable name of the chart/slice (if 
applicable).
          * `user_id`: The ID of the Superset user who initiated the query.
          * `username`: The username of the Superset user who initiated the 
query.
      * **Default Behavior:** The feature could either be disabled by default or 
enabled with a basic set of fields (e.g., `dashboard_id`, `dashboard_name`), 
leaving administrators free to opt in or customize further.
4.  **Implementation Considerations:**
      * The metadata should be dynamically retrieved at the time of query 
execution.
      * Error handling should be robust; if selected metadata cannot be 
retrieved, the corresponding field should either be omitted from the JSON or 
contain a clear indication of the missing information (e.g., `null` values), 
depending on the chosen design.
      * Consider the impact on performance, though the added overhead of a 
small, configurable JSON string is expected to be negligible.
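
A minimal sketch of how points 2 to 4 could fit together, in plain Python. The function name `build_client_info`, its parameters, and the `context` mapping are hypothetical and not part of Superset today; only the field names come from the list above. The `include_nulls` flag illustrates the two error-handling options (omit a missing field, or send it as `null`).

```python
import json
from typing import Any, Mapping, Optional, Sequence

# Field names from the proposal; everything else in this sketch is hypothetical.
ALLOWED_FIELDS = (
    "dashboard_id",
    "dashboard_name",
    "slice_id",
    "slice_name",
    "user_id",
    "username",
)


def build_client_info(
    context: Mapping[str, Any],
    enabled: bool,
    selected_fields: Sequence[str],
    include_nulls: bool = False,
) -> Optional[str]:
    """Return the JSON header value, or None when no header should be sent."""
    if not enabled:
        return None

    payload = {}
    for field in ALLOWED_FIELDS:
        if field not in selected_fields:
            continue  # the administrator did not select this field
        value = context.get(field)
        if value is None and not include_nulls:
            continue  # metadata unavailable: omit the field
        payload[field] = value  # may be None, serialized as JSON null

    if not payload:
        return None
    # ensure_ascii=True escapes non-ASCII dashboard names so the value
    # stays safe to place in an HTTP header.
    return json.dumps(payload, ensure_ascii=True)
```

Iterating over `ALLOWED_FIELDS` rather than over the administrator's selection keeps the key order stable and silently ignores unknown field names.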

**Benefits:**

  * **Richer Audit Trails:** Provides comprehensive, **configurable** context 
in Trino logs, allowing for detailed auditing of data access originating from 
specific Superset dashboards and users. This is crucial for compliance, 
security, and accountability.
  * **In-depth Usage Analytics:** Enables the evaluation of dashboard 
popularity, query patterns, and resource consumption, leading to better 
insights into how dashboards are being used and how to optimize them.
  * **Streamlined Performance Evaluation:** Directly link Trino query 
performance metrics back to specific Superset dashboards, simplifying the 
process of identifying and resolving performance bottlenecks.
  * **Improved Troubleshooting:** Accelerates the diagnosis of issues by 
providing immediate visibility into the Superset dashboard responsible for a 
given Trino query.
  * **Enhanced Data Governance:** Supports more robust data governance 
initiatives by providing a clear lineage of data access requests through 
Superset.
  * **Flexibility for Administrators:** Allows administrators to tailor the 
amount of information sent to Trino based on their specific auditing, logging, 
and security requirements, without forcing unnecessary data transmission.
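
To illustrate the usage-analytics and performance benefits, here is a rough sketch of what a consumer of the header could do once the client info reaches Trino's query logs or an event listener. The record shape (a client-info string paired with an elapsed time in seconds) and the function name are assumptions for illustration; how the value is actually exposed depends on the Trino deployment.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Tuple


def summarize_by_dashboard(
    records: Iterable[Tuple[Optional[str], float]],
) -> Dict[Any, Dict[str, float]]:
    """Aggregate query count and elapsed time per dashboard_id.

    Each record is a hypothetical (client_info, elapsed_seconds) pair.
    Queries without usable client info are grouped under "unknown".
    """
    totals: Dict[Any, Dict[str, float]] = defaultdict(
        lambda: {"queries": 0, "elapsed_seconds": 0.0}
    )
    for client_info, elapsed in records:
        try:
            meta = json.loads(client_info) if client_info else {}
        except json.JSONDecodeError:
            meta = {}  # header present but not valid JSON
        if not isinstance(meta, dict):
            meta = {}  # valid JSON, but not an object
        key = meta.get("dashboard_id", "unknown")
        totals[key]["queries"] += 1
        totals[key]["elapsed_seconds"] += elapsed
    return dict(totals)
```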

**Example `X-Trino-Client-Info` Header Value (based on configuration):**

If only `dashboard_id` and `dashboard_name` are selected:

```
X-Trino-Client-Info: {"dashboard_id": 123, "dashboard_name": "Sales Overview"}
```

If all fields are selected:

```
X-Trino-Client-Info: {"dashboard_id": 123, "dashboard_name": "Sales Overview", "slice_id": 456, "slice_name": "Revenue by Region Chart", "user_id": 789, "username": "analyst_user"}
```
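
For a sense of how such a value might reach the driver, the sketch below merges it into a connection's "extra" JSON. It assumes that `engine_params.connect_args` in that JSON is forwarded to the Trino Python client and that the client accepts an `http_headers` argument; both should be verified against the versions in use. The helper name `with_client_info` is hypothetical. Note that a connection-level header is static, while this proposal needs per-query metadata, so this only illustrates the wire format and not the final hook.

```python
import json
from typing import Optional

HEADER_NAME = "X-Trino-Client-Info"


def with_client_info(extra_json: str, client_info: Optional[str]) -> str:
    """Return the "extra" JSON with the client-info header merged in.

    Existing connect_args and headers are preserved; when client_info is
    None (feature disabled), the settings are returned without the header.
    """
    extra = json.loads(extra_json) if extra_json else {}
    if client_info is None:
        return json.dumps(extra)

    engine_params = extra.setdefault("engine_params", {})
    connect_args = engine_params.setdefault("connect_args", {})
    # Assumption: the driver accepts extra HTTP headers under this key.
    headers = connect_args.setdefault("http_headers", {})
    headers[HEADER_NAME] = client_info
    return json.dumps(extra)
```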

**Affected Components:**

  * Superset Trino DB EngineSpec
  * Superset Database Connection UI/Configuration (specifically for Trino 
connections)
  * Superset Query Generation Logic

This feature would significantly enhance the integration between Superset and 
Trino, providing critical and **configurable** visibility for auditing, 
performance evaluation, and overall data platform management.

I'm willing to contribute this feature if it sounds good to you, but I 
would need some help with it :smile: 


GitHub link: https://github.com/apache/superset/discussions/34236


