weiqingy commented on PR #28218:
URL: https://github.com/apache/flink/pull/28218#issuecomment-4713119264

   The test does pass with and without the fix — but that's because it isn't 
actually reaching the bug, not because the bug is absent. The good news is it's 
real, and there's a clean way to pin it.
   
   The crash needs the sort collation to actually survive to physical planning. 
The JIRA's reproducer does that through the **Table API** — 
`src.order_by('b').select(col('a').max)` — which builds a real `LogicalSort` on 
field `b` underneath the global aggregate. But the test query is **SQL**: 
`SELECT MAX(a) FROM (SELECT a, b FROM MyTable ORDER BY b ASC)`. Calcite drops 
an `ORDER BY` in a derived table during SQL-to-rel conversion (no 
`FETCH`/`LIMIT` to make it meaningful), so the collation is gone before 
`FlinkExpandConversionRule` ever runs. You can see it in the test's own pinned 
AST — there's no `LogicalSort` in it. That's why the plan is identical with and 
without the fix.
   
   I checked this locally against current `master` (no fix):
   
   - Table API `from("MyTable").orderBy($("b").asc()).select($("a").max())` → 
reproduces the exact JIRA crash: `IndexOutOfBoundsException: index (1) must be 
less than size (1)` at `FlinkExpandConversionRule.satisfyTraitsBySelf` (same 
`RelExplainUtil.collationToString` → `BatchPhysicalSort.explainTerms` → 
`VolcanoPlanner.registerImpl` stack).
   - The SQL form → plans fine, no crash, with or without the fix.
   
   With the fix applied, the Table-API query plans cleanly (`HashAggregate ← 
Sort(orderBy=[b ASC])`), so it red-greens.
   
   The `verifyRelPlan` direction still holds — there's a `verifyRelPlan(Table)` 
overload, so you can keep the planner-only harness and just feed it the 
Table-API query instead of the SQL string. Something like this (in case it 
helps):
   
   ```java
   @Test
   void testOrderByWithGlobalAggregate() {
       Table t =
               util.tableEnv()
                       .from("MyTable")
                       .orderBy($("b").asc())
                       .select($("a").max());
       util.verifyRelPlan(t);
   }
   ```
   
   (with `import static org.apache.flink.table.api.Expressions.$;` and the 
existing `MyTable` DDL). Worth confirming red-green on your side too — it 
should fail on `master` without the change and pass with it, which also answers 
the "why do we need this PR" question: the bug is genuine, it just needs the 
Table-API shape to surface rather than the SQL form, where the inner `ORDER BY` 
is optimized away before the rule runs.
   
   WDYT? The rule change itself looks right to me — it's just the test that 
needs the Table-API shape to actually exercise it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to