featzhang opened a new pull request, #27750: URL: https://github.com/apache/flink/pull/27750
Implemented CSE (common sub-expression elimination) optimization for Flink SQL code generation.

Key changes:

**New files:**
- `CseUtils.scala` - Utility to scan RexNode trees and identify duplicate sub-expressions
- `CseITCase.scala` - Integration tests for CSE (correctness, call count, null handling, nested)
- `CseTestFunctions.java` - Test UDFs with AtomicInteger call counters
- `CseUtilsTest.scala` - Unit tests for the CSE analyzer

**Modified files:**
- `ExprCodeGenerator.scala` - Added `cseEnabled`, `cseExprCache`, `cseCandidates` fields and CSE logic in `generateExpression()`. Also changed operand visiting in `visitCall` to go through `generateExpression` (not `accept(this)`) so nested sub-expressions are also cached.
- `CalcCodeGenerator.scala` - Added CSE analysis before projection code generation using `CseUtils.findDuplicateSubExpressions()`

**Approach:**
The RexNode digest string is used as the expression identity key. Only RexCall nodes that appear more than once are cached. The first encounter generates the code and stores the result in a local variable. Subsequent encounters return a NO_CODE reference to the cached variable.

**Test results (all passing):**
- `CseJavaITCase`: 5/5 ✅ (testCseCorrectness, testCseCallCount, testNoCseCandidates, testCseWithNullValues, testCseWithNestedCommonSubExpressions)
- `CseUtilsTest`: 6/6 ✅

> Replaces #27747 (closed due to force-push after author correction)
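
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of digest-keyed caching as described under the approach. It is not the code from this PR: `CseSketch`, `Expr`, `findDuplicates`, `Generator`, and the `cse_N` variable naming are all hypothetical stand-ins, and a plain string digest replaces Calcite's RexNode digest. It only illustrates the two passes: count call digests first, then emit a local variable on the first encounter of a duplicate and reuse that variable afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy illustration of digest-keyed CSE. Hypothetical names, not Flink code. */
final class CseSketch {

    /** Stand-in for a RexNode: a call with operands, or a leaf without any. */
    static final class Expr {
        final String op;
        final List<Expr> operands;

        Expr(String op, Expr... operands) {
            this.op = op;
            this.operands = List.of(operands);
        }

        boolean isCall() {
            return !operands.isEmpty();
        }

        /** Canonical string used as the identity key, e.g. "UPPER($0)". */
        String digest() {
            if (!isCall()) {
                return op;
            }
            List<String> parts = new ArrayList<>();
            for (Expr o : operands) {
                parts.add(o.digest());
            }
            return op + "(" + String.join(", ", parts) + ")";
        }
    }

    /** Analysis pass: digests of call nodes that occur more than once. */
    static Set<String> findDuplicates(List<Expr> projections) {
        Map<String, Integer> counts = new HashMap<>();
        for (Expr e : projections) {
            count(e, counts);
        }
        Set<String> duplicates = new HashSet<>();
        counts.forEach((digest, n) -> {
            if (n > 1) {
                duplicates.add(digest);
            }
        });
        return duplicates;
    }

    private static void count(Expr e, Map<String, Integer> counts) {
        if (!e.isCall()) {
            return; // leaves are never cached
        }
        counts.merge(e.digest(), 1, Integer::sum);
        for (Expr o : e.operands) {
            count(o, counts);
        }
    }

    /** Generation pass: caches candidates in local variables. */
    static final class Generator {
        final Set<String> candidates;
        final Map<String, String> cache = new HashMap<>();  // digest -> variable
        final List<String> statements = new ArrayList<>();  // emitted code

        Generator(Set<String> candidates) {
            this.candidates = candidates;
        }

        /** Returns the term (inline code or variable name) for the expression. */
        String generate(Expr e) {
            if (!e.isCall()) {
                return e.op;
            }
            String digest = e.digest();
            String cached = cache.get(digest);
            if (cached != null) {
                return cached; // later encounter: no new code, reuse the variable
            }
            List<String> args = new ArrayList<>();
            for (Expr o : e.operands) {
                args.add(generate(o)); // recurse so nested duplicates are cached too
            }
            String code = e.op + "(" + String.join(", ", args) + ")";
            if (!candidates.contains(digest)) {
                return code; // not a duplicate: keep it inline
            }
            String variable = "cse_" + cache.size();
            statements.add("Object " + variable + " = " + code + ";");
            cache.put(digest, variable);
            return variable;
        }
    }

    public static void main(String[] args) {
        Expr p1 = new Expr("LEN", new Expr("UPPER", new Expr("$0")));
        Expr p2 = new Expr("CONCAT", new Expr("UPPER", new Expr("$0")), new Expr("$1"));
        Generator gen = new Generator(findDuplicates(List.of(p1, p2)));
        String t1 = gen.generate(p1);
        String t2 = gen.generate(p2);
        gen.statements.forEach(System.out::println);
        System.out.println(t1);
        System.out.println(t2);
    }
}
```

In this sketch the two structurally equal `UPPER($0)` nodes are distinct objects, so identity has to come from the digest string rather than from object equality. Recursing through `generate` for operands mirrors the PR's stated reason for routing operand visits through `generateExpression`.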
