This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 45051a276 IMPALA-14551: Fix hang on Unicode complex expressions in alias mapping
45051a276 is described below

commit 45051a27672c3a8a5be304ab2c71fc5c02946788
Author: woosuk.ro <[email protected]>
AuthorDate: Tue Dec 30 13:30:25 2025 +0000

    IMPALA-14551: Fix hang on Unicode complex expressions in alias mapping
    
    When a complex expression containing Unicode letters is selected without an
    alias, SelectListItem.toColumnLabel() falls back to
    expr.toSql().toLowerCase(). ToSqlUtils.hiveNeedsQuotes() then runs HiveLexer
    over this lowercased, backticked toSql() string, and on ANTLR 3.3 the lexer
    can loop indefinitely.
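    The label fallback can be modelled with a small sketch. ColumnLabelSketch,
    toColumnLabel(Optional, String) and hasNonAsciiLetter are hypothetical names
    made up for illustration, not Impala's SelectListItem API, and the alias
    branch is simplified. It shows that lowercasing leaves Hangul and CJK
    identifiers untouched (those scripts have no case), so they survive into
    the string that is later handed to the Hive lexer.

```java
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical model of the column-label fallback; not Impala's
 * SelectListItem, just an illustration of the data flow.
 */
final class ColumnLabelSketch {
  /** Uses the alias if present, otherwise the lowercased SQL text of the expression. */
  static String toColumnLabel(Optional<String> alias, String exprSql) {
    // Locale.ROOT is an assumption of this sketch for deterministic output;
    // the commit message only says toLowerCase().
    return alias.orElseGet(() -> exprSql.toLowerCase(Locale.ROOT));
  }

  /** True if the label contains any letter outside ASCII (e.g. Hangul or CJK). */
  static boolean hasNonAsciiLetter(String label) {
    return label.codePoints().anyMatch(cp -> cp > 0x7F && Character.isLetter(cp));
  }

  public static void main(String[] args) {
    String sql = "SUM(`売上高`) OVER (ORDER BY `年月日`)";
    String label = toColumnLabel(Optional.empty(), sql);
    // ASCII keywords are lowercased; the CJK identifiers pass through unchanged.
    System.out.println(label);
    System.out.println(hasNonAsciiLetter(label));
  }
}
```

    Running main prints the lowercased label with the CJK identifiers intact,
    followed by true.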
    
    Root cause: ANTLR 3.3's Lexer.nextToken() does not call recover() when it
    catches a RecognitionException, so the lexer retries the same input
    position forever.
    
    Fix: Upgrade antlr-runtime from 3.3 to 3.5.3. In ANTLR 3.4+,
    Lexer.nextToken() calls recover() after a RecognitionException, and
    recover() calls input.consume() to skip past the offending character.
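    To make the root cause concrete, here is a toy lexer written for this note,
    not taken from ANTLR or HiveLexer. It treats every non-ASCII character as
    unrecognizable purely to force the error path. With callRecover set to
    false it reports nothing and leaves the position unchanged, mirroring the
    ANTLR 3.3 behaviour described above; with callRecover set to true it
    consumes one character, mirroring ANTLR 3.4+. A step limit stands in for
    the hang so the sketch always terminates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy lexer (hypothetical, not ANTLR code) contrasting two error paths:
 * callRecover == false behaves like ANTLR 3.3's Lexer.nextToken() (position
 * unchanged after an error); callRecover == true behaves like 3.4+ (recover()
 * consumes the offending character).
 */
final class LexerRecoverySketch {
  /** Stand-in for ANTLR's RecognitionException. */
  static final class RecognitionError extends Exception {
    RecognitionError(int index) {
      super("no viable alternative at index " + index);
    }
  }

  private final String input;
  private int pos = 0;

  LexerRecoverySketch(String input) {
    this.input = input;
  }

  private static boolean isIdentChar(char c) {
    return c < 0x80 && (Character.isLetterOrDigit(c) || c == '_');
  }

  /** Matches one ASCII identifier or one run of whitespace; anything else fails. */
  private String matchToken() throws RecognitionError {
    int start = pos;
    char c = input.charAt(pos);
    if (Character.isWhitespace(c)) {
      while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    } else if (isIdentChar(c)) {
      while (pos < input.length() && isIdentChar(input.charAt(pos))) pos++;
    } else {
      // Non-ASCII characters are rejected here only to force the error path.
      throw new RecognitionError(pos);
    }
    return input.substring(start, pos);
  }

  /** Equivalent of input.consume(): skip exactly one character. */
  private void recover() {
    pos++;
  }

  /**
   * Returns the tokens, or null if maxSteps iterations pass without reaching
   * the end of the input (the bounded stand-in for an infinite loop).
   */
  List<String> tokenize(boolean callRecover, int maxSteps) {
    List<String> tokens = new ArrayList<>();
    for (int step = 0; step < maxSteps && pos < input.length(); step++) {
      try {
        tokens.add(matchToken());
      } catch (RecognitionError e) {
        // A real lexer would call reportError(e) here.
        if (callRecover) recover(); // without this, pos never advances
      }
    }
    return pos >= input.length() ? tokens : null;
  }

  public static void main(String[] args) {
    System.out.println(new LexerRecoverySketch("abc 매출액 x").tokenize(false, 1000));
    System.out.println(new LexerRecoverySketch("abc 매출액 x").tokenize(true, 1000));
  }
}
```

    With callRecover false, the first Hangul letter stalls the loop and
    tokenize returns null; with callRecover true, a fresh instance skips the
    three Hangul letters and returns [abc,  ,  , x].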
    
    Note: This issue does NOT occur with CDP Hive (which includes HIVE-19064),
    but affects Apache Hive 3.1.3 without the patch. This fix ensures
    compatibility with unpatched Hive versions.
    
    Testing:
    - Added testHiveNeedsQuotesUnicodeComplexExpression to ToSqlUtilsTest with a
      5-second timeout, so a regression fails instead of hanging.
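    The timeout on the new test turns a hang into a test failure. The sketch
    below shows a comparable watchdog using java.util.concurrent;
    TimeoutGuardSketch and finishesWithin are hypothetical names, and this is
    not JUnit's own implementation of @Test(timeout = ...). A thread stuck in a
    busy loop ignores interruption, so the worker is a daemon thread that
    cannot keep the JVM alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical watchdog helper, comparable in spirit to @Test(timeout = 5000). */
final class TimeoutGuardSketch {
  /** Returns true if task completes within timeoutMs, false if it is still running. */
  static boolean finishesWithin(Runnable task, long timeoutMs) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timeout-guard-worker");
      t.setDaemon(true); // a worker stuck in a busy loop must not keep the JVM alive
      return t;
    });
    try {
      Future<?> result = executor.submit(task);
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // this is where a hanging lexer would be reported
    } catch (ExecutionException e) {
      throw new IllegalStateException("task failed", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } finally {
      executor.shutdownNow(); // interrupts the worker; a pure busy loop ignores this
    }
  }

  public static void main(String[] args) {
    System.out.println(finishesWithin(() -> { }, 1000));
    System.out.println(finishesWithin(() -> { while (true) { } }, 200));
  }
}
```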
    
    Change-Id: I175de0c3cd958a03e5ca02590a8b84ca6e674f3d
    Reviewed-on: http://gerrit.cloudera.org:8080/23812
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 fe/pom.xml                                               |  2 +-
 .../java/org/apache/impala/analysis/ToSqlUtilsTest.java  | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fe/pom.xml b/fe/pom.xml
index 043c6fa3c..ae02f0437 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -244,7 +244,7 @@ under the License.
     <dependency>
       <groupId>org.antlr</groupId>
       <artifactId>antlr-runtime</artifactId>
-      <version>3.3</version>
+      <version>3.5.3</version>
     </dependency>
 
     <dependency>
diff --git a/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java b/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
index ca8ca4f4b..dea11be30 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
@@ -197,6 +197,22 @@ public class ToSqlUtilsTest extends FrontendTestBase {
     // version for that.
   }
 
+  // IMPALA-14551: Unicode complex expressions could cause HiveLexer to hang.
+  @Test(timeout = 5000)
+  public void testHiveNeedsQuotesUnicodeComplexExpression() {
+    // Use the lowercased, backticked alias string that toSql() would generate to
+    // make sure HiveLexer doesn't hang.
+    // Korean complex expressions
+    assertTrue(ToSqlUtils.hiveNeedsQuotes(
+        "`매출액` - lag(`매출액`) over (partition by `x` order by `y`)"));
+    // Japanese complex expressions
+    assertTrue(ToSqlUtils.hiveNeedsQuotes(
+        "sum(`売上高`) over (order by `年月日` rows unbounded preceding)"));
+    // Chinese complex expressions
+    assertTrue(ToSqlUtils.hiveNeedsQuotes(
+        "avg(`销售额`) over (order by `日期` rows between 3 preceding and current row)"));
+  }
+
   @Test
   public void testGetIdentSql() {
     // Hive & Impala keyword
