codeant-ai-for-open-source[bot] commented on code in PR #37411:
URL: https://github.com/apache/superset/pull/37411#discussion_r2770719821
##########
superset/sql/execution/executor.py:
##########
@@ -684,6 +696,34 @@ def _check_disallowed_functions(self, script: SQLScript)
-> set[str] | None:
return found if found else None
+ def _check_disallowed_tables(self, script: SQLScript) -> set[str] | None:
+ """
+ Check for disallowed SQL tables/views.
+
+ :param script: Parsed SQL script
+ :returns: Set of disallowed tables found, or None if none found
+ """
+ disallowed_config = app.config.get("DISALLOWED_SQL_TABLES", {})
+ engine_name = self.database.db_engine_spec.engine
+
+ # Get disallowed tables for this engine
+ engine_disallowed = disallowed_config.get(engine_name, set())
+ if not engine_disallowed:
+ return None
+
+ # Use AST-based table detection
+ if script.check_tables_present(engine_disallowed):
+ # Return the specific tables that were found
+ found = set()
+ for statement in script.statements:
+ present = {table.table.lower() for table in statement.tables}
+ for table in engine_disallowed:
+ if table.lower() in present:
+ found.add(table)
+ return found or set(engine_disallowed)
+
+ return None
Review Comment:
**Suggestion:** The `_check_disallowed_tables` helper calls
`script.check_tables_present(engine_disallowed)` and then iterates over all
statements and their tables again to compute the `found` set, causing two full
AST traversals for every query; this is an unnecessary performance hit on a hot
path and can be avoided by computing the matched disallowed tables in a single
pass over `script.statements`. [performance]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Increased latency for SQL query execution (SQLExecutor.execute).
- ⚠️ Slower SQL Lab queries and API query endpoints.
```
</details>
```suggestion
# Use AST-based table detection in a single pass
found: set[str] = set()
for statement in script.statements:
present = {table.table.lower() for table in statement.tables}
for table in engine_disallowed:
if table.lower() in present:
found.add(table)
return found or None
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Ensure a DISALLOWED_SQL_TABLES blocklist is configured (added in
superset/config.py).
This enables the new table-checking code path. (Config location:
superset/config.py; PR
described adding this config.)
2. Submit a SQL query via SQL Lab or any code path that calls
SQLExecutor.execute.
Execution path: superset/sql/execution/executor.py -> execute() calls
self._prepare_sql(...), then calls self._check_security(transformed_script)
(see
_prepare_sql return around lines 431-444 and _check_security invocation
added in the PR at
superset/sql/execution/executor.py: line 466).
3. Inside _check_security (superset/sql/execution/executor.py), the code
calls
_check_disallowed_tables(script) at line 466 which enters the function
defined at
superset/sql/execution/executor.py: lines 699-725. The function first calls
script.check_tables_present(engine_disallowed) (located in this function at
line ~715),
which traverses the SQL AST to detect tables, and then, on success,
immediately loops
again over statement.tables to build the found set (the loop beginning at
line ~718). This
is the second traversal of statement.tables for the same statements.
4. Observe the performance overhead by profiling or logging around these
lines: add
temporary timing/log statements before/after the call at line ~715 and
before/after the
explicit loop at line ~718 in superset/sql/execution/executor.py to see
duplicate work on
complex multi-statement queries. The duplicate traversal will be visible as
two separate
AST/table-inspection costs for the same query.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/execution/executor.py
**Line:** 714:725
**Comment:**
*Performance: The `_check_disallowed_tables` helper calls
`script.check_tables_present(engine_disallowed)` and then iterates over all
statements and their tables again to compute the `found` set, causing two full
AST traversals for every query; this is an unnecessary performance hit on a hot
path and can be avoided by computing the matched disallowed tables in a single
pass over `script.statements`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]