mistercrunch commented on issue #30330:
URL: https://github.com/apache/superset/issues/30330#issuecomment-3131039849
Claude analysis below. We went deep on this one and learned a lot about the
internals of the Superset chart/query lifecycle :)
---
## Analysis and Path Forward for Server-Side Histogram Implementation
I've been investigating what it would take to improve the histogram chart
to support server-side calculations and metric filters. Here's what I found:
### Current State
The histogram plugin currently uses a hybrid approach:
- Backend: Uses pandas post-processing via `histogramOperator`
- Frontend: ECharts for rendering
- **Limitation**: The query performs no aggregation, so there is no `HAVING` clause support (metric filters cannot be applied)
### Multi-Phase Query Investigation
I explored how other Superset plugins handle multiple queries:
1. **BigNumber with Trendline**
(`/plugins/plugin-chart-big-number-total/src/BigNumberWithTrendline/`)
- Sends two queries in a single request
- Main query for current value + time series query for trend
- Crucially, the queries are independent: neither can reference the
other's results
2. **Mixed Chart** (`/plugins/plugin-chart-mixed/`)
- Similar pattern: multiple independent queries in one request
- Each query has its own configuration
3. **Multi Chart** (`/plugins/legacy-plugin-chart-multiple-line-charts/`)
- Makes its own API calls from the frontend
- Not integrated with standard query flow
### The Challenge
For a proper server-side histogram, we need:
1. First query: `SELECT MIN(metric), MAX(metric) FROM table`
2. Use those results to calculate bin ranges
3. Second query: `SELECT SUM(CASE WHEN metric >= bin1 AND metric < bin2
THEN 1 END) as bin1_count, ...`
Current Superset architecture supports multiple queries but **only "all at
once"** - the result of the first query cannot be used to build the second
query.
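To make steps 2 and 3 concrete, here is a minimal sketch of the bin arithmetic and the per-bin SQL expression, assuming equal-width bins. The names `calculateBins` and `binToSqlExpression` are hypothetical helpers written for illustration, not existing Superset functions, and the `ELSE 0` branch and closed upper edge on the last bin are choices made for this sketch rather than something the current plugin does.

```javascript
// Hypothetical helper: split [min, max] into `binCount` equal-width bins.
function calculateBins(min, max, binCount) {
  if (!Number.isInteger(binCount) || binCount < 1) {
    throw new RangeError('binCount must be a positive integer');
  }
  if (!Number.isFinite(min) || !Number.isFinite(max) || max < min) {
    throw new RangeError('min and max must be finite with min <= max');
  }
  const width = (max - min) / binCount;
  return Array.from({ length: binCount }, (_, i) => {
    const isLast = i === binCount - 1;
    const start = min + i * width;
    // Use max itself for the final edge to avoid floating-point drift.
    const end = isLast ? max : min + (i + 1) * width;
    return { start, end, isLast, label: `${start} - ${end}` };
  });
}

// Hypothetical helper: one conditional-count expression per bin.
// The last bin is closed on the right so rows equal to max are counted.
// `column` is interpolated as-is, so it must be a trusted, already-quoted identifier.
function binToSqlExpression(column, bin) {
  const upper = bin.isLast ? '<=' : '<';
  return `SUM(CASE WHEN ${column} >= ${bin.start} AND ${column} ${upper} ${bin.end} THEN 1 ELSE 0 END)`;
}
```

For example, `calculateBins(0, 100, 4)` yields bins starting at 0, 25, 50 and 75. Note that with a strict `<` on every bin, rows equal to the maximum would fall outside all bins, which is why the sketch closes the last interval.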
### Solution: Async buildQuery Support
I've opened PR #34383 which provides the foundation for this. It enables:
- Async operations during the `buildQuery` phase
- Proper loading states during complex query building
- Ability to make API calls to determine query structure dynamically
With this PR, we could:
```javascript
import { buildQueryContext } from '@superset-ui/core';

// fetchMinMax and calculateBins are hypothetical helpers, not existing Superset APIs.
async function buildQuery(formData) {
  // Assumes the histogram control stores the binned column in formData.column
  const { column } = formData;

  // 1. Fetch min/max values
  const { min, max } = await fetchMinMax(formData);

  // 2. Calculate bin ranges
  const bins = calculateBins(min, max, formData.bins);

  // 3. Build the histogram query with server-side binning
  return buildQueryContext(formData, baseQueryObject => [
    {
      ...baseQueryObject,
      metrics: bins.map(bin => ({
        expressionType: 'SQL',
        sqlExpression: `SUM(CASE WHEN ${column} >= ${bin.start} AND ${column} < ${bin.end} THEN 1 END)`,
        label: bin.label,
      })),
    },
  ]);
}
```
This would enable:
- ✅ Metric filters (HAVING clause support)
- ✅ True server-side binning (better performance for large datasets)
- ✅ Dynamic bin calculation based on actual data range
- ✅ Support for all database engines (no special SQL requirements)
The async buildQuery support is backwards compatible and opens doors for
other complex visualizations that need preliminary data to construct their
queries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]