mistercrunch commented on issue #30330:
URL: https://github.com/apache/superset/issues/30330#issuecomment-3131039849

   Claude analysis below. We went deep on this one and learned a lot about the
internals of the Superset chart/query lifecycle :)
   
   ---
   
     ## Analysis and Path Forward for Server-Side Histogram Implementation
   
     I've been investigating what it would take to improve the histogram chart 
to support server-side calculations and metric filters. Here's what I found:
   
     ### Current State
     The histogram plugin currently uses a hybrid approach:
     - Backend: Uses pandas post-processing via `histogramOperator`
     - Frontend: ECharts for rendering
     - **Limitation**: The query contains no aggregations, so metric filters (`HAVING` clauses) are not supported
   
     ### Multi-Phase Query Investigation
   
     I explored how other Superset plugins handle multiple queries:
   
     1. **BigNumber with Trendline** 
(`/plugins/plugin-chart-big-number-total/src/BigNumberWithTrendline/`)
        - Sends two queries in a single request
        - Main query for current value + time series query for trend
        - But crucially: queries are independent, results can't reference each 
other
   
     2. **Mixed Chart** (`/plugins/plugin-chart-mixed/`)
        - Similar pattern: multiple independent queries in one request
        - Each query has its own configuration
   
     3. **Multi Chart** (`/plugins/legacy-plugin-chart-multiple-line-charts/`)
        - Makes its own API calls from the frontend
        - Not integrated with standard query flow
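   
     To make the "independent queries in one request" pattern concrete, here is a minimal sketch using plain objects. The function name `buildIndependentQueries`, the `order_date` column and the field values are hypothetical and not taken from any plugin. The point is only that both query objects are derived from the same base before anything is sent, so neither can depend on the other's result.
   
     ```javascript
     // Hypothetical illustration: two query objects built up front from one base.
     function buildIndependentQueries(baseQueryObject, timeColumn) {
       // Query 1: a single aggregated value (no grouping columns).
       const totalQuery = { ...baseQueryObject, columns: [] };
       // Query 2: the same metrics grouped by a time column for the trend.
       const trendQuery = { ...baseQueryObject, columns: [timeColumn] };
       // Both objects exist before any query runs, so results cannot be shared.
       return [totalQuery, trendQuery];
     }
   
     const queries = buildIndependentQueries({ metrics: ['count'], columns: [] }, 'order_date');
     console.log(queries.length);
     ```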
   
     ### The Challenge
   
     For a proper server-side histogram, we need:
     1. First query: `SELECT MIN(metric), MAX(metric) FROM table`
     2. Use those results to calculate bin ranges
     3. Second query: `SELECT SUM(CASE WHEN metric >= bin1 AND metric < bin2 
THEN 1 END) as bin1_count, ...`
   
     The current Superset architecture supports multiple queries per request, 
but **only "all at once"**: the result of the first query cannot be used to 
build the second query.
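   
     Steps 2 and 3 above can be sketched as follows. This is an illustration under stated assumptions, not existing Superset code: `calculateBins` and `binCountExpression` are hypothetical helpers, bins are assumed to be equal-width, and `column` is assumed to be a trusted identifier. The sketch closes the last bin with `<=` so that rows equal to the maximum are counted, and uses `ELSE 0` so that empty bins report 0 rather than NULL.
   
     ```javascript
     // Hypothetical helper: split [min, max] into `binCount` equal-width bins.
     function calculateBins(min, max, binCount) {
       const width = (max - min) / binCount;
       return Array.from({ length: binCount }, (_, i) => {
         const start = min + i * width;
         const isLast = i === binCount - 1;
         // Use the exact max for the last bin to avoid floating-point drift.
         const end = isLast ? max : min + (i + 1) * width;
         return { start, end, isLast, label: `${start} - ${end}` };
       });
     }
   
     // Hypothetical helper: SQL expression counting the rows that fall in one bin.
     function binCountExpression(column, bin) {
       // Only the last bin includes its upper bound.
       const upper = bin.isLast ? '<=' : '<';
       return `SUM(CASE WHEN ${column} >= ${bin.start} AND ${column} ${upper} ${bin.end} THEN 1 ELSE 0 END)`;
     }
   
     const bins = calculateBins(0, 100, 4);
     const expressions = bins.map(bin => binCountExpression('price', bin));
     console.log(expressions);
     ```
   
     Note that when `min` equals `max` the bin width is zero, so every row lands in the last bin; a real implementation would likely special-case that.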
   
     ### Solution: Async buildQuery Support
   
     I've opened PR #34383 which provides the foundation for this. It enables:
     - Async operations during the `buildQuery` phase
     - Proper loading states during complex query building
     - Ability to make API calls to determine query structure dynamically
   
     With this PR, we could:
     ```javascript
     import { buildQueryContext } from '@superset-ui/core';
   
     // fetchMinMax and calculateBins are hypothetical helpers, not existing APIs.
     async function buildQuery(formData) {
       // Assumes formData.column names the column being binned.
       const { column } = formData;
   
       // 1. Fetch min/max values
       const { min, max } = await fetchMinMax(formData);
   
       // 2. Calculate bin ranges
       const bins = calculateBins(min, max, formData.bins);
   
       // 3. Build the histogram query with server-side binning
       return buildQueryContext(formData, baseQueryObject => [{
         ...baseQueryObject,
         metrics: bins.map(bin => ({
           expressionType: 'SQL',
           sqlExpression: `SUM(CASE WHEN ${column} >= ${bin.start} AND ${column} < ${bin.end} THEN 1 END)`,
           label: bin.label,
         })),
       }]);
     }
     ```
   
     This would enable:
     - ✅ Metric filters (HAVING clause support)
     - ✅ True server-side binning (better performance for large datasets)
     - ✅ Dynamic bin calculation based on actual data range
     - ✅ Support for all database engines (no special SQL requirements)
   
     The async buildQuery support is backwards compatible and opens doors for 
other complex visualizations that need preliminary data to construct their 
queries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

