codeant-ai-for-open-source[bot] commented on code in PR #37183:
URL: https://github.com/apache/superset/pull/37183#discussion_r2770535514
##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -42,6 +45,21 @@
logger = logging.getLogger(__name__)
+def _get_cached_form_data(form_data_key: str) -> str | None:
+    """Retrieve form_data from cache using form_data_key.
+
+    Returns the JSON string of form_data if found, None otherwise.
+    """
+    from superset.commands.explore.form_data.get import GetFormDataCommand
+
+    try:
+        cmd_params = CommandParameters(key=form_data_key)
+        return GetFormDataCommand(cmd_params).run()
+    except (KeyError, ValueError, CommandException) as e:
+        logger.warning("Failed to retrieve form_data from cache: %s", e)
Review Comment:
**Suggestion:** `_get_cached_form_data` claims to return a string or None
but doesn't catch `TemporaryCacheGetFailedError` raised by
`GetFormDataCommand.run`, so a transient cache/DB error will bubble up and fail
the entire chart data request instead of gracefully falling back to the saved
configuration. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ get_chart_data requests fail on transient cache DB errors.
- ⚠️ Unsaved-state retrieval won't gracefully fallback.
- ⚠️ Degrades MCP reliability under cache failures.
```
</details>
```suggestion
    from superset.commands.explore.form_data.exceptions import (
        TemporaryCacheGetFailedError,
    )
    try:
        cmd_params = CommandParameters(key=form_data_key)
        return GetFormDataCommand(cmd_params).run()
    except (KeyError, ValueError, CommandException,
            TemporaryCacheGetFailedError) as e:
```
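For reference, a minimal standalone sketch of how the complete helper could look with the broader exception handling applied. The exception's import path is copied from the suggestion above and the `CommandParameters` path from the import fix discussed further down; both are assumptions and should be verified against the actual module layout.
```python
from __future__ import annotations

import logging

from superset.commands.exceptions import CommandException
from superset.commands.explore.form_data.exceptions import (
    TemporaryCacheGetFailedError,
)
from superset.commands.explore.parameters import CommandParameters

logger = logging.getLogger(__name__)


def _get_cached_form_data(form_data_key: str) -> str | None:
    """Retrieve form_data from cache, treating any expected failure as a miss."""
    from superset.commands.explore.form_data.get import GetFormDataCommand

    try:
        cmd_params = CommandParameters(key=form_data_key)
        return GetFormDataCommand(cmd_params).run()
    except (
        KeyError,
        ValueError,
        CommandException,
        TemporaryCacheGetFailedError,
    ) as e:
        # Transient cache/DB failures are logged and reported as a cache miss
        # so the caller can fall back to the saved chart configuration.
        logger.warning("Failed to retrieve form_data from cache: %s", e)
        return None
```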
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Store a form_data key in explore_form_data_cache (normal usage) or simulate a cache failure. The command used is GetFormDataCommand, defined in superset/commands/explore/form_data/get.py (see file lines 35-63 in Additional Downstream Code Context).
2. Trigger the MCP get_chart_data tool with a request that includes form_data_key (get_chart_data entry at superset/mcp_service/chart/tool/get_chart_data.py; the function begins at line ~65 in the PR diff).
3. get_chart_data calls _get_cached_form_data(form_data_key) (helper defined at lines ~48-61). GetFormDataCommand.run() may raise TemporaryCacheGetFailedError when the cache/DB has a transient error (see get.py: run() raising TemporaryCacheGetFailedError on SQLAlchemyError at lines 35-63).
4. Because _get_cached_form_data currently does not catch TemporaryCacheGetFailedError, that exception propagates, causing the chart data request to fail instead of logging a warning and falling back to the saved chart configuration.
```
</details>
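A hedged regression-test sketch for the fallback behaviour described above; it monkeypatches `GetFormDataCommand.run` to raise the transient-cache error. The exception's import path and the assumption that the tool module imports cleanly (see the import fix below) are unverified.
```python
from superset.commands.explore.form_data import get as form_data_get
from superset.commands.explore.form_data.exceptions import (
    TemporaryCacheGetFailedError,
)
from superset.mcp_service.chart.tool import get_chart_data as tool_module


def test_get_cached_form_data_swallows_transient_cache_error(monkeypatch):
    def boom(self):
        # Simulate the SQLAlchemyError path that GetFormDataCommand.run wraps.
        raise TemporaryCacheGetFailedError()

    monkeypatch.setattr(form_data_get.GetFormDataCommand, "run", boom)

    # With the broader except clause in place, the helper should log a warning
    # and return None rather than letting the exception fail the whole request.
    assert tool_module._get_cached_form_data("some-key") is None
```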
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_data.py
**Line:** 54:59
**Comment:**
*Logic Error: `_get_cached_form_data` claims to return a string or None
but doesn't catch `TemporaryCacheGetFailedError` raised by
`GetFormDataCommand.run`, so a transient cache/DB error will bubble up and fail
the entire chart data request instead of gracefully falling back to the saved
configuration.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/dashboard/tool/get_dashboard_info.py:
##########
@@ -71,14 +110,45 @@ async def get_dashboard_info(
         result = tool.run_tool(request.identifier)
         if isinstance(result, DashboardInfo):
+            # If permalink_key is provided, retrieve filter state
+            if request.permalink_key:
+                await ctx.info(
+                    "Retrieving filter state from permalink: permalink_key=%s"
+                    % (request.permalink_key,)
+                )
+                permalink_value = _get_permalink_state(request.permalink_key)
Review Comment:
**Suggestion:** The synchronous permalink lookup `_get_permalink_state` is
called directly inside an async tool, which can block the event loop during
database and key-value store access; wrapping it in `asyncio.to_thread` avoids
blocking other async tasks. [possible bug]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP tool latency increases under concurrent requests.
- ⚠️ Event loop stalls during DB/key-value access.
- ⚠️ Affects all callers of get_dashboard_info with permalink_key.
```
</details>
```suggestion
                import asyncio
                permalink_value = await asyncio.to_thread(
                    _get_permalink_state, request.permalink_key
                )
```
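For illustration, a self-contained sketch of the offloading pattern with a stub standing in for the real helper (the actual helper wraps `GetDashboardPermalinkCommand`; the stub here is an assumption):
```python
from __future__ import annotations

import asyncio
import time
from typing import Any


def _get_permalink_state(permalink_key: str) -> dict[str, Any] | None:
    # Stub for illustration only; the real helper performs synchronous DB and
    # key-value lookups via GetDashboardPermalinkCommand.
    time.sleep(0.1)
    return {"dashboardId": 1, "state": {"filters": []}}


async def fetch_permalink_state(permalink_key: str) -> dict[str, Any] | None:
    # asyncio.to_thread runs the blocking helper in the default thread pool,
    # so other handlers sharing the event loop are not stalled meanwhile.
    return await asyncio.to_thread(_get_permalink_state, permalink_key)


print(asyncio.run(fetch_permalink_state("abc123")))
```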
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP service (superset/cli/mcp.py:32-44) so FastMCP async handlers run.
2. Send concurrent FastMCP requests to get_dashboard_info (superset/mcp_service/dashboard/tool/get_dashboard_info.py:62), each with a permalink_key so the code path at lines 114-119 is executed.
3. Execution hits the synchronous helper _get_permalink_state defined at superset/mcp_service/dashboard/tool/get_dashboard_info.py:45-57. That function calls GetDashboardPermalinkCommand.run (superset/commands/dashboard/permalink/get.py:38-63), which performs synchronous DB and key-value DAO calls (KeyValueDAO.get_value and DashboardDAO.get_by_id_or_slug).
4. Because get_dashboard_info is async but calls the blocking _get_permalink_state directly at line 119, the event loop running FastMCP's async handlers can be blocked for the duration of the DB / key-value I/O. Under concurrent requests, this causes increased latency and stalls other async tasks.
5. Reproduce deterministically by issuing multiple concurrent get_dashboard_info requests (each with permalink_key) and observing increased response latency or serialized handling compared to wrapping _get_permalink_state in asyncio.to_thread.
```
</details>
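A purely illustrative, Superset-free demo of the effect described in these steps: a synchronous sleep stands in for the DB/key-value I/O, and a concurrent heartbeat task shows how `asyncio.to_thread` keeps the loop responsive while a direct call stalls it.
```python
import asyncio
import time


def fake_permalink_lookup(key: str) -> dict:
    time.sleep(1.0)  # simulated blocking DB / key-value access
    return {"dashboardId": 1, "state": {"filters": []}}


async def heartbeat() -> None:
    for _ in range(10):
        await asyncio.sleep(0.1)


async def main() -> None:
    # Direct call: the heartbeat cannot tick while the lookup holds the loop.
    t0 = time.monotonic()
    hb = asyncio.create_task(heartbeat())
    fake_permalink_lookup("k")  # blocks the event loop for ~1s
    await hb
    print("direct call total:", round(time.monotonic() - t0, 2), "s")  # ~2.0

    # Offloaded call: the lookup runs in a worker thread, heartbeat proceeds.
    t0 = time.monotonic()
    hb = asyncio.create_task(heartbeat())
    await asyncio.to_thread(fake_permalink_lookup, "k")
    await hb
    print("to_thread total:", round(time.monotonic() - t0, 2), "s")  # ~1.0


asyncio.run(main())
```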
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/tool/get_dashboard_info.py
**Line:** 119:119
**Comment:**
*Possible Bug: The synchronous permalink lookup `_get_permalink_state`
is called directly inside an async tool, which can block the event loop during
database and key-value store access; wrapping it in `asyncio.to_thread` avoids
blocking other async tasks.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -29,6 +30,8 @@
 if TYPE_CHECKING:
     from superset.models.slice import Slice
+from superset.commands.exceptions import CommandException
+from superset.commands.explore.form_data.parameters import CommandParameters
Review Comment:
**Suggestion:** The import for `CommandParameters` points to
`superset.commands.explore.form_data.parameters`, but the class is defined in
`superset.commands.explore.parameters`. Because the import is at module level, it
raises an ImportError as soon as the tool module is imported (e.g., at MCP service
startup), leaving `_get_cached_form_data` and the `get_chart_data` tool
unusable. [possible bug]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ MCP service may fail to start due to ImportError.
- ❌ get_chart_data tool unavailable to callers.
- ⚠️ Any dashboards relying on MCP tools affected.
```
</details>
```suggestion
from superset.commands.explore.parameters import CommandParameters
```
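A quick, hedged way to confirm the module paths inside an environment with Superset installed; the paths mirror the claim above and the corrected import in the suggestion.
```python
import importlib

# Mirrors the corrected path from the suggestion above.
mod = importlib.import_module("superset.commands.explore.parameters")
print(hasattr(mod, "CommandParameters"))  # expected: True

# Demonstrates that the path used in the diff fails at import time.
try:
    importlib.import_module("superset.commands.explore.form_data.parameters")
except ModuleNotFoundError as err:
    print("wrong path fails as expected:", err)
```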
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP service, which imports the tool modules. See superset/mcp_service/chart/tool/get_chart_data.py (module top-level imports around lines 33-38). Python imports the module and executes the top-level imports.
2. During import, Python evaluates "from superset.commands.explore.form_data.parameters import CommandParameters" (line shown above). That module path does not exist.
3. The real class is defined in superset/commands/explore/parameters.py (see Additional Downstream Code Context: file superset/commands/explore/parameters.py lines 22-27, where CommandParameters is declared).
4. Result: ImportError is raised during module import, preventing the MCP tool module from loading and making get_chart_data unusable. This is deterministic and reproducible on service start.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_data.py
**Line:** 34:34
**Comment:**
*Possible Bug: The import for `CommandParameters` points to
`superset.commands.explore.form_data.parameters`, but the class is defined in
`superset.commands.explore.parameters`, so trying to use
`_get_cached_form_data` will raise an ImportError at runtime when that line is
executed.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -119,20 +144,102 @@ async def get_chart_data( # noqa: C901
             )
         logger.info("Getting data for chart %s: %s", chart.id, chart.slice_name)
-        import time
-
         start_time = time.time()
+        # Track whether we're using unsaved state
+        using_unsaved_state = False
+        cached_form_data_dict = None
+
         try:
             await ctx.report_progress(2, 4, "Preparing data query")
             from superset.charts.schemas import ChartDataQueryContextSchema
             from superset.commands.chart.data.get_data_command import ChartDataCommand
+            # Check if form_data_key is provided - use cached form_data instead
+            if request.form_data_key:
+                await ctx.info(
+                    "Retrieving unsaved chart state from cache: form_data_key=%s"
+                    % (request.form_data_key,)
+                )
+                if cached_form_data := _get_cached_form_data(request.form_data_key):
+                    try:
+                        cached_form_data_dict = utils_json.loads(cached_form_data)
+                        using_unsaved_state = True
+                        await ctx.info(
+                            "Using cached form_data from form_data_key for data query"
+                        )
+                    except (TypeError, ValueError) as e:
+                        await ctx.warning(
+                            "Failed to parse cached form_data: %s. "
+                            "Falling back to saved chart configuration." % str(e)
+                        )
+                else:
+                    await ctx.warning(
+                        "form_data_key provided but no cached data found. "
+                        "The cache may have expired. Using saved chart configuration."
+                    )
+
             # Use the chart's saved query_context - this is the key!
             # The query_context contains all the information needed to reproduce
             # the chart's data exactly as shown in the visualization
             query_context_json = None
-            if chart.query_context:
+
+            # If using cached form_data, we need to build query_context from it
+            if using_unsaved_state and cached_form_data_dict:
+                # Build query context from cached form_data (unsaved state)
+                from superset.common.query_context_factory import QueryContextFactory
+
Review Comment:
**Suggestion:** The condition `if using_unsaved_state and
cached_form_data_dict:` relies on the truthiness of the decoded form-data dict,
so an empty but valid JSON object (`{}`) sets `using_unsaved_state` to True but
skips building a query context and also disables the fallback path, leading to
`query_context` never being assigned and causing an UnboundLocalError when
executing the query. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ get_chart_data crashes on empty cached form_data.
- ⚠️ Explaining edited-but-unsaved charts fails.
- ⚠️ Breaks MCP tool reliability for unsaved states.
```
</details>
```suggestion
            if using_unsaved_state and cached_form_data_dict is not None:
```
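To make the failure mode concrete, here is a condensed, hypothetical sketch of the branching around the cached form_data; the function and its return values are invented for illustration, but the conditions mirror the code under review:
```python
from __future__ import annotations

from typing import Any


def choose_query_context_source(
    using_unsaved_state: bool,
    cached_form_data_dict: dict[str, Any] | None,
    saved_query_context_json: str | None,
) -> str:
    if using_unsaved_state and cached_form_data_dict is not None:
        # Even an empty dict means "use the unsaved state": build from cache.
        return "build_from_cached_form_data"
    if saved_query_context_json:
        return "use_saved_query_context"
    if not using_unsaved_state:
        return "build_from_saved_chart_params"
    # With the original truthiness check, {} would fall through to here with
    # query_context never assigned.
    return "explicit_error_or_fallback"


print(choose_query_context_source(True, {}, None))     # build_from_cached_form_data
print(choose_query_context_source(False, None, None))  # build_from_saved_chart_params
```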
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Save an "unsaved" Explore state into the cache where the stored form_data JSON decodes to an empty object: '{}' (this is realistic when users open Explore but don't populate fields). The cache get path is handled by GetFormDataCommand.run (see superset/commands/explore/form_data/get.py lines 35-63).
2. Call the MCP get_chart_data tool with that form_data_key (tool entry at superset/mcp_service/chart/tool/get_chart_data.py, function defined at line ~65).
3. _get_cached_form_data returns the JSON string '{}' and utils_json.loads(cached_form_data) yields {}. The code sets using_unsaved_state = True (when parsing succeeded), but cached_form_data_dict is {} (an empty dict).
4. Later the code checks "if using_unsaved_state and cached_form_data_dict:" (lines ~185-191). Because {} is falsy, the branch that builds query_context is skipped; because using_unsaved_state is True, the fallback that constructs a query_context from saved chart.params (the "if query_context_json is None and not using_unsaved_state" block) is also skipped. As a result, query_context is never assigned, and later usage (ChartDataCommand(query_context) execution) raises an UnboundLocalError / NameError, causing the request to fail.
```
</details>
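A two-line illustration of the falsy-but-valid case from step 3, using the stdlib `json` module as a stand-in for `utils_json`:
```python
import json  # stand-in for utils_json used in the tool module

cached_form_data = "{}"  # valid JSON stored for an unpopulated Explore state
cached_form_data_dict = json.loads(cached_form_data)

print(bool(cached_form_data_dict))        # False -> truthiness check skips the branch
print(cached_form_data_dict is not None)  # True  -> `is not None` still enters it
```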
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_data.py
**Line:** 191:191
**Comment:**
*Logic Error: The condition `if using_unsaved_state and
cached_form_data_dict:` relies on the truthiness of the decoded form-data dict,
so an empty but valid JSON object (`{}`) sets `using_unsaved_state` to True but
skips building a query context and also disables the fallback path, leading to
`query_context` never being assigned and causing an UnboundLocalError when
executing the query.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/dashboard/tool/get_dashboard_info.py:
##########
@@ -71,14 +110,45 @@ async def get_dashboard_info(
         result = tool.run_tool(request.identifier)
         if isinstance(result, DashboardInfo):
+            # If permalink_key is provided, retrieve filter state
+            if request.permalink_key:
+                await ctx.info(
+                    "Retrieving filter state from permalink: permalink_key=%s"
+                    % (request.permalink_key,)
+                )
+                permalink_value = _get_permalink_state(request.permalink_key)
+
+                if permalink_value:
+                    # Extract the state from permalink value
Review Comment:
**Suggestion:** The filter state retrieved from a permalink is applied to
whatever dashboard was requested, without verifying that the permalink actually
belongs to that dashboard, which can result in returning a dashboard's metadata
combined with another dashboard's filter state. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP get_dashboard_info returns mismatched filter_state.
- ⚠️ Chatbot explains wrong dashboard state to users.
- ⚠️ Downstream tools may apply incorrect filters.
```
</details>
```suggestion
                    # Ensure the permalink belongs to the same dashboard before applying state
                    permalink_dashboard_id = permalink_value.get("dashboardId")
                    if isinstance(permalink_dashboard_id, int) and permalink_dashboard_id != result.id:
                        await ctx.warning(
                            "permalink_key dashboardId (%s) does not match requested dashboard id (%s); "
                            "ignoring permalink filter state."
                            % (permalink_dashboard_id, result.id)
                        )
                    else:
```
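If the check grows, it could also be factored into a small predicate. This is a hypothetical helper (name invented) that mirrors the suggested guard, rejecting the state only when an integer `dashboardId` points at a different dashboard:
```python
from __future__ import annotations

from typing import Any


def should_apply_permalink_state(
    permalink_value: dict[str, Any], dashboard_id: int
) -> bool:
    """Mirror the suggested guard: reject the state only when an integer
    dashboardId is present and points at a different dashboard."""
    permalink_dashboard_id = permalink_value.get("dashboardId")
    return not (
        isinstance(permalink_dashboard_id, int)
        and permalink_dashboard_id != dashboard_id
    )


# A permalink created for dashboard 42 applied to a request for dashboard 7:
print(should_apply_permalink_state({"dashboardId": 42, "state": {}}, 7))  # False
print(should_apply_permalink_state({"dashboardId": 7, "state": {}}, 7))   # True
print(should_apply_permalink_state({"state": {}}, 7))                     # True (no id present)
```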
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP service (see superset/cli/mcp.py:32-44) so FastMCP tools are available.
2. Call the FastMCP tool get_dashboard_info at superset/mcp_service/dashboard/tool/get_dashboard_info.py:62 (function get_dashboard_info), passing:
   - identifier = <dashboard A id>
   - permalink_key = <permalink created for dashboard B>
   (FastMCP will route to get_dashboard_info because of the @tool decorator and parse_request.)
3. Inside get_dashboard_info (superset/mcp_service/dashboard/tool/get_dashboard_info.py), ModelGetInfoCore.run_tool is executed to load dashboard A (see the run_tool call at line ~110). The code then reaches the block at lines 114-121, where request.permalink_key is present and _get_permalink_state is called.
4. _get_permalink_state (defined at superset/mcp_service/dashboard/tool/get_dashboard_info.py:45-57) calls GetDashboardPermalinkCommand.run (superset/commands/dashboard/permalink/get.py:38-63), which returns a value containing "dashboardId" and "state" for dashboard B.
5. Because the existing code applies permalink_value.state to the result without checking permalink_value["dashboardId"], the returned DashboardInfo (result) for dashboard A will have filter_state populated with dashboard B's state (see lines 121-127), producing a mismatched dashboard metadata + filter_state combination.
6. Observe: the MCP response contains result.filter_state from the unrelated permalink. This reproduces the problem deterministically when a permalink_key referencing a different dashboard is used. The codebase does not currently validate dashboardId vs result.id, so the mismatch occurs whenever a caller supplies a permalink_key for a different dashboard.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/tool/get_dashboard_info.py
**Line:** 122:122
**Comment:**
*Logic Error: The filter state retrieved from a permalink is applied to
whatever dashboard was requested, without verifying that the permalink actually
belongs to that dashboard, which can result in returning a dashboard's metadata
combined with another dashboard's filter state.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]