codeant-ai-for-open-source[bot] commented on code in PR #37411:
URL: https://github.com/apache/superset/pull/37411#discussion_r2770719821


##########
superset/sql/execution/executor.py:
##########
@@ -684,6 +696,34 @@ def _check_disallowed_functions(self, script: SQLScript) 
-> set[str] | None:
 
         return found if found else None
 
+    def _check_disallowed_tables(self, script: SQLScript) -> set[str] | None:
+        """
+        Check for disallowed SQL tables/views.
+
+        :param script: Parsed SQL script
+        :returns: Set of disallowed tables found, or None if none found
+        """
+        disallowed_config = app.config.get("DISALLOWED_SQL_TABLES", {})
+        engine_name = self.database.db_engine_spec.engine
+
+        # Get disallowed tables for this engine
+        engine_disallowed = disallowed_config.get(engine_name, set())
+        if not engine_disallowed:
+            return None
+
+        # Use AST-based table detection
+        if script.check_tables_present(engine_disallowed):
+            # Return the specific tables that were found
+            found = set()
+            for statement in script.statements:
+                present = {table.table.lower() for table in statement.tables}
+                for table in engine_disallowed:
+                    if table.lower() in present:
+                        found.add(table)
+            return found or set(engine_disallowed)
+
+        return None

Review Comment:
   **Suggestion:** The `_check_disallowed_tables` helper calls 
`script.check_tables_present(engine_disallowed)` and then iterates over all 
statements and their tables again to compute the `found` set, causing two full 
AST traversals for every query; this is an unnecessary performance hit on a hot 
path and can be avoided by computing the matched disallowed tables in a single 
pass over `script.statements`. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Increased latency for SQL query execution (SQLExecutor.execute).
   - ⚠️ Slower SQL Lab queries and API query endpoints.
   ```
   </details>
   
   ```suggestion
           # Use AST-based table detection in a single pass
           found: set[str] = set()
           for statement in script.statements:
               present = {table.table.lower() for table in statement.tables}
               for table in engine_disallowed:
                   if table.lower() in present:
                       found.add(table)
   
           return found or None
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Ensure a DISALLOWED_SQL_TABLES blocklist is configured (added in 
superset/config.py).
   This enables the new table-checking code path. (Config location: 
superset/config.py; PR
   described adding this config.)
   
   2. Submit a SQL query via SQL Lab or any code path that calls 
SQLExecutor.execute.
   Execution path: superset/sql/execution/executor.py -> execute() calls
   self._prepare_sql(...), then calls self._check_security(transformed_script) 
(see
   _prepare_sql return around lines 431-444 and _check_security invocation 
added in the PR at
   superset/sql/execution/executor.py: line 466).
   
   3. Inside _check_security (superset/sql/execution/executor.py), the code 
calls
   _check_disallowed_tables(script) at line 466 which enters the function 
defined at
   superset/sql/execution/executor.py: lines 699-725. The function first calls
   script.check_tables_present(engine_disallowed) (located in this function at 
line ~715),
   which traverses the SQL AST to detect tables, and then, on success, 
immediately loops
   again over statement.tables to build the found set (the loop beginning at 
line ~718). This
   is the second traversal of statement.tables for the same statements.
   
   4. Observe the performance overhead by profiling or logging around these 
lines: add
   temporary timing/log statements before/after the call at line ~715 and 
before/after the
   explicit loop at line ~718 in superset/sql/execution/executor.py to see 
duplicate work on
   complex multi-statement queries. The duplicate traversal will be visible as 
two separate
   AST/table-inspection costs for the same query.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/execution/executor.py
   **Line:** 714:725
   **Comment:**
        *Performance: The `_check_disallowed_tables` helper calls 
`script.check_tables_present(engine_disallowed)` and then iterates over all 
statements and their tables again to compute the `found` set, causing two full 
AST traversals for every query; this is an unnecessary performance hit on a hot 
path and can be avoided by computing the matched disallowed tables in a single 
pass over `script.statements`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to