mistercrunch opened a new issue, #33870:
URL: https://github.com/apache/superset/issues/33870
## Motivation
MCP is an open JSON-RPC 2.0 spec that gives any LLM a **universal “USB-C
port”** for tools, data, and actions — one schema, no custom glue. Anthropic
open-sourced it in Nov 2024, and the rest of the stack stampeded in: OpenAI
baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into
Gemini; Microsoft highlighted it during Build; dev-tool players like Replit,
Zed, Sourcegraph, and Block already ship MCP servers.
### Why REST doesn’t cut it for agents
- **Auth done wrong** – Users don’t want to, and shouldn’t, share their API keys
with an LLM.
- **Too many calls** – REST is designed for apps, so it’s extremely verbose
and “atomic”.
- **IDs vs. context** – Agents don’t want `owner_id=42`; they want `{id:42,
username:"jdoe", email:"…"}` right away.
### Superset + MCP → headless-BI 2.0
Superset is already headless; MCP lets any agent pick up the steering wheel.
| Agent ask | Superset does |
| --- | --- |
| “**Search** all dashboards on churn.” | `list_dashboards query="churn"` |
| “**Create** a line chart of MRR vs time.” | `generate_chart metric="MRR" dim="date"` |
| “**Open** an Explore on top 50 customers.” | `generate_explore_link dataset="customers" limit=50` |
| “Why did revenue dip? **Drill** into root cause.” | `run_root_cause_analysis dashboard_id=123` |
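
To make the first row of the table concrete, here is a minimal sketch of what such a request might look like on the wire. MCP frames tool invocations as JSON-RPC 2.0 `tools/call` requests. The tool name `list_dashboards` and its `query` argument come from the table; the helper `build_tool_call` and the request id are illustrative assumptions, not part of Superset.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request.

    `build_tool_call` is a hypothetical helper used only for illustration.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP method for invoking a named tool
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)


# "Search all dashboards on churn."
request = build_tool_call(1, "list_dashboards", {"query": "churn"})
print(request)
```

The agent only needs the tool name and a small argument object; how the server maps that onto Superset internals is left open here.
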
End-to-end, the LLM can query data, build viz, and kick the user straight into
SQL Lab or Explore, then hand control back. That’s AI-augmented, headless BI.
### How MCP Relates to the RAG-Focused SIPs
**Different directions, different problems — both useful.**
| Dimension | RAG-centric SIPs | MCP SIP (this doc) |
| --- | --- | --- |
| **Call direction** | **Superset ➜ LLM**<br>Superset queries an external model for extra context or explanations. | **LLM ➜ Superset**<br>External agents call Superset to fetch assets or trigger actions. |
| **Primary benefit** | Enriches the *user’s* experience *inside* Superset (semantic search, chart “explainers,” etc.). | Lets agents *outside* Superset automate everything users can do through the UI. |
| **Auth model** | Superset authenticates **to** the model. | Agent authenticates **to** Superset, fully RBAC-aware. |
| **Granularity** | Model returns unstructured answers. | Superset returns deterministic, typed objects and links. |
| **Dependency** | Needs vector stores / LangChain wrappers inside Superset. | Needs an MCP blueprint exposed by Superset. |
These tracks don’t depend on each other, and neither blocks the other:
- **Ship RAG features** for smarter querying and insight *within* Superset.
- **Ship MCP** so any LLM agent can treat Superset as just another tool in a
multi-app workflow.
Separate SIPs, separate code paths, complementary value. Feel free to pursue
and ship either (or both) in any order.
---
## Proposed Change
| Aspect | Detail |
| --- | --- |
| **Blueprint** | Opt-in Flask blueprint surfaced by helper pkg **`fastmcp`** (MIT, ≤300 LOC). |
| **Toggle** | `ENABLE_MCP_SERVICE = False` (default). |
| **CLI** | `superset mcp run --port 5008`. |
| **Namespace** | `/api/mcp/v1/*` (tag kept but less rigid than REST; see *Versioning*). |
| **Runtime** | WSGI-Flask by default; ASGI wrapping possible via `asgiref.wsgi.WsgiToAsgi`. |
| **Hooks** | `auth_hook`, `impersonate`, `audit_log`, `rate_limit`: no-ops in OSS, pluggable in Preset & enterprise. |
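
As a rough illustration of the **Hooks** row, the sketch below shows one way no-op defaults could be swapped for real implementations. Only the four hook names come from the table; the `McpHooks` container, the `dispatch` function, the hook signatures, and the call order are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def allow_all(request: dict) -> Optional[str]:
    """No-op auth: trust whichever user the request names."""
    return request.get("user")


def same_user(user: str) -> str:
    """No-op impersonation: act as the authenticated user."""
    return user


def skip_audit(user: str, tool: str) -> None:
    """No-op audit log: record nothing."""


def unlimited(user: str) -> bool:
    """No-op rate limit: always allow the call."""
    return True


@dataclass
class McpHooks:
    """Hypothetical hook container; every field defaults to a no-op."""

    auth_hook: Callable[[dict], Optional[str]] = allow_all
    impersonate: Callable[[str], str] = same_user
    audit_log: Callable[[str, str], None] = skip_audit
    rate_limit: Callable[[str], bool] = unlimited


def dispatch(hooks: McpHooks, request: dict, tool: str) -> str:
    """Run the hooks in order and return the user the tool should run as."""
    user = hooks.auth_hook(request)
    if user is None:
        raise PermissionError("unauthenticated request")
    if not hooks.rate_limit(user):
        raise RuntimeError("rate limit exceeded")
    effective_user = hooks.impersonate(user)
    hooks.audit_log(effective_user, tool)
    return effective_user


# A deployment could plug in only the hooks it cares about.
audit_trail = []
hooks = McpHooks(audit_log=lambda user, tool: audit_trail.append((user, tool)))
print(dispatch(hooks, {"user": "jdoe"}, "list_dashboards"))
print(audit_trail)
```

Keeping the defaults as plain functions means the OSS build behaves as if the hooks were absent, while a hosted deployment overrides individual fields.
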
### Code Reuse / DRY Strategy
- **Single source of truth**: Commands + DAOs encapsulate business rules.
- MCP and REST endpoints **compose** the same set of commands + DAOs; no logic
duplication.
- Shared Marshmallow schemas are reused directly or shallow-wrapped to add
denormalised fields.
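
The composition idea in the list above can be sketched as follows. Everything here is hypothetical: `DashboardDAO`, `ListDashboardsCommand`, the two adapter functions, and the response shapes are stand-ins for illustration and do not mirror Superset's actual classes.

```python
class DashboardDAO:
    """Hypothetical in-memory DAO standing in for database access."""

    _rows = [
        {"id": 1, "title": "Churn overview"},
        {"id": 2, "title": "Revenue"},
    ]

    @classmethod
    def search(cls, text: str) -> list:
        needle = text.lower()
        return [row for row in cls._rows if needle in row["title"].lower()]


class ListDashboardsCommand:
    """The business rule lives here once; both transports call run()."""

    def __init__(self, text: str) -> None:
        self._text = text

    def run(self) -> list:
        if not self._text:
            raise ValueError("search text is required")
        return DashboardDAO.search(self._text)


def rest_list_dashboards(query_params: dict) -> dict:
    """REST-style adapter (assumed response shape)."""
    rows = ListDashboardsCommand(query_params.get("q", "")).run()
    return {"count": len(rows), "result": rows}


def mcp_list_dashboards(query: str) -> dict:
    """MCP-style adapter: same command, different envelope."""
    rows = ListDashboardsCommand(query).run()
    return {"dashboards": rows}


print(rest_list_dashboards({"q": "churn"}))
print(mcp_list_dashboards("churn"))
```

Because validation sits in the command, both adapters reject an empty search in the same way without repeating the check.
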
### High-Level vs. Atomic
MCP tools are **chunkier**: one call, one meaningful action, denormalised
payloads (e.g. `owners:[{id, username, email}]` as opposed to
`owner_ids=[1,2,3]`) to spare agents extra look-ups.
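
A minimal sketch of that denormalisation step, assuming a simple id-to-user lookup. The `USERS` table, the sample records, and the `denormalise_owners` helper are invented for illustration; only the `owner_ids` and `owners` field names come from the text above.

```python
# Hypothetical sample data; real values would come from Superset's metadata DB.
USERS = {
    1: {"id": 1, "username": "jdoe", "email": "jdoe@example.com"},
    2: {"id": 2, "username": "asmith", "email": "asmith@example.com"},
}


def denormalise_owners(asset: dict, users: dict) -> dict:
    """Replace `owner_ids` with embedded `owners` objects."""
    enriched = {key: value for key, value in asset.items() if key != "owner_ids"}
    enriched["owners"] = [users[user_id] for user_id in asset["owner_ids"]]
    return enriched


chart = {"id": 7, "name": "MRR over time", "owner_ids": [1, 2]}
print(denormalise_owners(chart, USERS))
```

The agent receives usernames and emails in the same response, so it does not need a follow-up call per owner.
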
As a general rule of thumb, we'll try to design tools around "agent stories",
the agent counterpart of "user stories". The CRUD interface will be simplified
with more intuitive schemas, following some of the principles highlighted in
https://www.jlowin.dev/blog/as-an-agent-the-new-user-story
### Versioning Philosophy
LLMs parse tool schemas *in-session*, so non-destructive tweaks that would
break a hard-coded REST client (e.g. renaming `owner_ids`→`owners`) don’t
require heavy semver ceremony. We bump `/v{n}` only for removals or semantic
flips.
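
One way to picture the rule is a small check over a tool's field names. This is a sketch under stated assumptions: the `needs_version_bump` helper and its explicit rename map are hypothetical, and semantic flips cannot be detected from field names, so they would still need human review.

```python
def needs_version_bump(old_fields: set, new_fields: set, renames: dict) -> bool:
    """Return True when a field was removed outright rather than renamed.

    `renames` maps old field names to their new names.
    """
    renamed_away = {old for old, new in renames.items() if new in new_fields}
    removed = old_fields - new_fields - renamed_away
    return bool(removed)


# Rename only: no bump under the policy above.
print(needs_version_bump({"id", "owner_ids"}, {"id", "owners"}, {"owner_ids": "owners"}))  # False
# Outright removal: bump /v{n}.
print(needs_version_bump({"id", "owner_ids"}, {"id"}, {}))  # True
```
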
### Initial Action Set (Phase 1)
*Discovery* → `list_*` • *Navigation* → `generate_explore_link`,
`open_sql_lab_with_context` • *Mutations* → `generate_chart`,
`generate_dashboard`, `add_chart_to_existing_dashboard`
### Deliverables
- Blueprint + flag + CLI
- 3-5 tools with unit, integration, and perf smoke tests
- Minimal OpenAPI spec + auto-generated TS/Python client
- Error envelope `{ "error": { "code": "...", "message": "..." } }`
- Demo notebook/script
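
The error envelope listed above could be produced by a helper along these lines. The envelope shape is taken from the deliverable; the `error_envelope` function name and the sample code and message strings are assumptions for illustration.

```python
import json


def error_envelope(code: str, message: str) -> dict:
    """Wrap an error in the `{ "error": { "code", "message" } }` shape."""
    return {"error": {"code": code, "message": message}}


# Hypothetical error code and message.
body = json.dumps(error_envelope("DASHBOARD_NOT_FOUND", "Dashboard 123 does not exist"))
print(body)
```

A single, predictable shape lets an agent branch on `error.code` without parsing free-form text.
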
## New / Changed Public Interfaces
| Interface | Addition |
| --- | --- |
| REST | `/api/mcp/v1/*` + `/api/simple/v1/*` |
| Config | `ENABLE_MCP_SERVICE` |
| CLI | `superset mcp run` |
| Python | Optional `import fastmcp` |
## Phasing / Roll-out Plan
| Phase | Goal | Outcome |
| --- | --- | --- |
| **1 – Proof of Concept** | Skeleton + 3-5 tools | Live agent demo: list → chart → SQL Lab |
| **2 – Coverage Expansion** | Broader tool library | > 80 % of daily actions scriptable |
| **3 – Production Hardening** | Extract **`superset-core`**; add robust auth/impersonation/logging | GA under OIDC / Okta / Preset Cloud |
## Longer-Term Package Topology
```mermaid
flowchart LR
core[superset-core]
superset-app --> core
superset-rest --> core
superset-ext --> core
superset-mcp --> core
```
## Industry Context: Auth & Impersonation
Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out.
Phase 1 ships **hooks + tests**; adapters drop when a clear winner emerges.
---
## New Dependencies
- **fastmcp** – internal helper, MIT, no external deps
- **asgiref** – optional (Apache-2) for ASGI wrapping
## Migration Plan & Compatibility
- Disabled by default → zero impact
- No DB migrations
- Future breaking changes gated behind `/v{n}` and announced on `dev@`
---
## Rejected Alternatives
| Alternative | Why Not |
| --- | --- |
| **External REST bridge** (`superset-mcp` PoC) | Extra hop, latency, duplicated RBAC/validation, schema drift |
| **Immediate full `superset-core` extraction** | Multi-month refactor; slows PoC. Scheduled for Phase 3 |
Embedded MCP provides **speed now** and **maintainability later**,
complementing RAG efforts and keeping Superset at the center of AI-driven
analytics.
## Why Model Context Protocol (MCP) and Why Now?
The **Model Context Protocol (MCP)** is an open standard that lets
large-language-model agents call *tools*—high-level, domain-specific
actions—over a simple, schema-declared interface. Think of it as **USB-C for
AI**: one plug that works across copilots (Claude, GitHub Copilot, Cursor,
etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The
spec was open-sourced by Anthropic in late 2024 and has since been adopted or
trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and
others.
### Why REST Isn’t Enough
REST was built for machine-to-machine plumbing, not for autonomous agents:
- **API-keys & secrets** Models shouldn’t see them. MCP sessions carry
*scoped* credentials or use local sockets—no key leakage.
- **Over-atomic verbs** Listing a dashboard, grabbing its metadata, then
building a chart is 3-5 calls in REST but **one tool call** in MCP.
- **Hyper-verbose schemas** REST spreads context across many endpoints; LLMs
lose the thread. MCP bundles denormalised payloads that match the agent’s
mental model.
### Superset + MCP ⇒ Headless, AI-Ready BI
Superset already exposes a rich REST API, but agents must screen-scrape or
choreograph dozens of endpoints. Baking MCP into Superset means any copilot can
treat Superset as a *first-class tool* in AI-driven workflows—perfect fit for
our **headless BI** push.
What an LLM can do once MCP is live:
| Action | Example prompt the agent can satisfy |
| --- | --- |
| **Search assets** | “List dashboards tagged *revenue* created in the last 30 days.” |
| **Spin up viz** | “Create a bar chart showing ARR by region and add it to ‘Q3 Exec Dashboard’.” |
| **Jump to context** | “Open SQL Lab on the `raw_events` dataset with a sample query.” |
| **Chat-in-context** | “Why did MRR drop in EMEA last month? Show a quick breakdown.” |
| **Multi-app chains** | Combine Superset → dbt → MotherDuck in one agent plan for root-cause analysis. |
In short, MCP lets **any LLM do (almost) everything a human can in the UI,
then hand the wheel back**—unlocking AI-augmented analytics inside and around
Superset without brittle glue code.