This is an automated email from the ASF dual-hosted git repository.
michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 45051a276 IMPALA-14551: Fix hang on Unicode complex expressions in alias mapping
45051a276 is described below
commit 45051a27672c3a8a5be304ab2c71fc5c02946788
Author: woosuk.ro <[email protected]>
AuthorDate: Tue Dec 30 13:30:25 2025 +0000
IMPALA-14551: Fix hang on Unicode complex expressions in alias mapping
When a complex expression containing Unicode letters is selected without an
alias, SelectListItem.toColumnLabel() falls back to expr.toSql().toLowerCase()
as the column label. HiveLexer then validates this backticked, lowercased
toSql() alias, and on ANTLR 3.3 the lexer can loop indefinitely.
Root cause: ANTLR 3.3's Lexer.nextToken() does not call recover() when it
catches a RecognitionException, so the lexer retries indefinitely at the same
input position.
Fix: Upgrade antlr-runtime from 3.3 to 3.5.3. In ANTLR 3.4+, Lexer.nextToken()
calls recover() on a RecognitionException, and recover() calls input.consume()
to advance past the offending character.
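The retry loop described above can be illustrated with a small self-contained
model. This is a hypothetical sketch, not the ANTLR or HiveLexer API: the class
RetryLoopDemo, its RecognitionError type, and its "grammar" (which simply
rejects any non-ASCII character) are all assumptions made for illustration.
The callRecover flag switches between the 3.3 behavior (catch and retry without
advancing) and the 3.4+ behavior (drop the offending character and continue).
A retry cap is added only so the demo terminates; the real 3.3 lexer has none.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a lexer's nextToken() retry loop. */
public class RetryLoopDemo {
  /** Stand-in for ANTLR's RecognitionException (assumption). */
  static class RecognitionError extends Exception {
    RecognitionError(String msg) { super(msg); }
  }

  private final String input;
  private int pos = 0;               // current input position
  private final boolean callRecover; // true models ANTLR 3.4+, false models 3.3

  RetryLoopDemo(String input, boolean callRecover) {
    this.input = input;
    this.callRecover = callRecover;
  }

  /** Matches one token; throws WITHOUT advancing on an unrecognized char. */
  private String matchToken() throws RecognitionError {
    char c = input.charAt(pos);
    if (c > 0x7F) throw new RecognitionError("no viable alternative at " + pos);
    pos++;
    return String.valueOf(c);
  }

  /** Models recover() -> input.consume(): skip the offending character. */
  private void recover() { pos++; }

  /** Returns the next token, or null at end of input. */
  String nextToken(int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      if (pos >= input.length()) return null;
      try {
        return matchToken();
      } catch (RecognitionError e) {
        if (callRecover) recover(); // 3.4+ model: skip the char, try again
        // 3.3 model: pos is unchanged, so the next attempt fails identically
      }
    }
    throw new IllegalStateException("stuck at position " + pos);
  }

  /** Tokenizes the whole input; throws IllegalStateException if stuck. */
  static List<String> tokenize(String input, boolean callRecover) {
    RetryLoopDemo lexer = new RetryLoopDemo(input, callRecover);
    List<String> tokens = new ArrayList<>();
    for (String t = lexer.nextToken(1000); t != null; t = lexer.nextToken(1000)) {
      tokens.add(t);
    }
    return tokens;
  }

  public static void main(String[] args) {
    String text = "`a\uB9E4b`"; // backticked identifier with a Hangul syllable
    System.out.println(tokenize(text, true)); // [`, a, b, `]
    try {
      tokenize(text, false);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // stuck at position 2
    }
  }
}
```

In the 3.3 model the exception is swallowed but the position never moves, so
every retry sees the same character; consuming one character per failure is
what guarantees progress.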
Note: This issue does NOT occur with CDP Hive (which includes HIVE-19064),
but affects Apache Hive 3.1.3 without the patch. This fix ensures
compatibility with unpatched Hive versions.
Testing:
- Added testHiveNeedsQuotesUnicodeComplexExpression with a 5-second timeout
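A timeout turns a hang into a test failure instead of a stuck build. JUnit 4's
@Test(timeout = ...) works roughly by running the test body on a separate
thread and failing it once the limit passes. The sketch below shows the same
idea with java.util.concurrent; TimeoutGuard and completesWithin are
hypothetical names, and this is not JUnit's actual implementation.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a possibly hanging call with a deadline. */
public class TimeoutGuard {
  /**
   * Runs task on a daemon worker thread. Returns true if it finished within
   * timeoutMillis, false if the deadline passed first.
   */
  static boolean completesWithin(Runnable task, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // a hung worker must not keep the JVM alive
      return t;
    });
    try {
      executor.submit(task).get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // the task is still running: treat it as a hang
    } catch (ExecutionException e) {
      throw new IllegalStateException("task failed", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } finally {
      // Interrupts the worker. A tight CPU loop (such as a lexer stuck in
      // nextToken()) ignores interrupts and keeps spinning until JVM exit.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(completesWithin(() -> { }, 5000)); // true
    System.out.println(completesWithin(() -> {
      try {
        Thread.sleep(10_000); // stand-in for a call that hangs
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, 100)); // false
  }
}
```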
Change-Id: I175de0c3cd958a03e5ca02590a8b84ca6e674f3d
Reviewed-on: http://gerrit.cloudera.org:8080/23812
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
fe/pom.xml | 2 +-
.../java/org/apache/impala/analysis/ToSqlUtilsTest.java | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fe/pom.xml b/fe/pom.xml
index 043c6fa3c..ae02f0437 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -244,7 +244,7 @@ under the License.
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr-runtime</artifactId>
- <version>3.3</version>
+ <version>3.5.3</version>
</dependency>
<dependency>
diff --git a/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java b/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
index ca8ca4f4b..dea11be30 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
@@ -197,6 +197,22 @@ public class ToSqlUtilsTest extends FrontendTestBase {
// version for that.
}
+ // IMPALA-14551: Unicode complex expressions could cause HiveLexer to hang.
+ @Test(timeout = 5000)
+ public void testHiveNeedsQuotesUnicodeComplexExpression() {
+ // Use the lowercased, backticked alias string that toSql() would generate to
+ // make sure HiveLexer doesn't hang.
+ // Korean complex expressions
+ assertTrue(ToSqlUtils.hiveNeedsQuotes(
+ "`매출액` - lag(`매출액`) over (partition by `x` order by `y`)"));
+ // Japanese complex expressions
+ assertTrue(ToSqlUtils.hiveNeedsQuotes(
+ "sum(`売上高`) over (order by `年月日` rows unbounded preceding)"));
+ // Chinese complex expressions
+ assertTrue(ToSqlUtils.hiveNeedsQuotes(
+ "avg(`销售额`) over (order by `日期` rows between 3 preceding and current row)"));
+ }
+
@Test
public void testGetIdentSql() {
// Hive & Impala keyword