This is an automated email from the ASF dual-hosted git repository.
csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 81f267388 IMPALA-889: Add trim() function matching ANSI SQL definition
81f267388 is described below
commit 81f2673883f65aa71f682bf9fda6dd73888e75a8
Author: Mihaly Szjatinya <[email protected]>
AuthorDate: Sun Dec 1 20:23:28 2024 +0100
IMPALA-889: Add trim() function matching ANSI SQL definition
As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://github.com/apache/impala/commit/543fa73f3a846
f0e4527514c993cb0985912b06c.
Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);
"where": Case-insensitive trim direction. Valid options are "leading",
"trailing", and "both". "leading" means trimming characters from the
start; "trailing" means trimming characters from the end; "both" means
trimming characters from both sides. For Syntax #2, since no "where"
is specified, the option "both" is implied by default.
"charset": Case-sensitive characters to be removed. This argument is
regarded as a character set going to be removed. The occurrence order
of each character doesn't matter and duplicated instances of the same
character will be ignored. NULL argument implies " " (standard space)
by default. Empty argument ("" or '') makes TRIM return the string
untouched. For Syntax #1, since no "charset" is specified, it trims
" " (standard space) by default.
"string": Case-sensitive target string to trim. This argument can be
NULL.
The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().
UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.
Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.
2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.
3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although generally a keyword would allow easier
and better parsing, on the negative side it restricts token's usage in
general context. However, leading/trailing/both, being previously
saved as reserved words, are now added as keywords to make possible
their usage with no escaping.
Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Csaba Ringhofer <[email protected]>
---
be/src/exprs/expr-test.cc | 57 ++++++
docs/topics/impala_string_functions.xml | 27 ++-
fe/src/main/cup/sql-parser.cup | 50 +++++-
.../org/apache/impala/analysis/TrimFromExpr.java | 193 +++++++++++++++++++++
fe/src/main/jflex/sql-scanner.flex | 3 +
.../apache/impala/analysis/AnalyzeExprsTest.java | 61 ++++++-
.../java/org/apache/impala/analysis/ToSqlTest.java | 7 +
.../queries/QueryTest/utf8-string-functions.test | 182 +++++++++++++++++++
8 files changed, 571 insertions(+), 9 deletions(-)
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index f987eed9e..f186fe1bd 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -4765,6 +4765,63 @@ TEST_P(ExprTest, StringFunctions) {
TestStringValue("btrim('æeioü','æü')", "eio");
TestStringValue("btrim('\\\\abcdefg\\\\', 'ag\\\\')", "bcdef");
+ /// Test cases for TRIM UDF
+ // TRIM(where FROM string): trim space by default;
+ TestStringValue("trim(leading FROM ' 1 2 3 4 5 ')", "1 2 3 4 5 ");
+ TestStringValue("trim(leAdiNG FrOm ' 212 ')", "212 ");
+ TestStringValue("trim(leading from 'blackhole')", "blackhole");
+ TestStringValue("trim(trailing FroM ' 13 1 3 4 6 1 24 ')", " 13 1 3 4 6
1 24");
+ TestStringValue("trim(tRAIlinG fROm ' 1a 9b 4c 5d ')", " 1a 9b 4c 5d");
+ TestStringValue("trim(trailing from ' abcdefg ')", " abcdefg");
+ TestStringValue("trim(bOTh frOm ' ab c d e f 1234 ')", "ab c d e f 1234");
+ TestStringValue("trim(both from ' aaaaaaaaa ')", "aaaaaaaaa");
+ TestStringValue("trim(both from 'pulsar')", "pulsar");
+
+ // TRIM(string FROM string): trim from both sides by default;
+ TestStringValue("trim('1234' FroM '23471364134612413434')", "713641346");
+ TestStringValue("trim('abc' from 'abacdefg')", "defg");
+ TestStringValue("trim('xyz' fROm 'abcdabcdabc')", "abcdabcdabc");
+ TestStringValue("trim('aaaaabbbbbccccccccffffffffg' from 'abcdefg')", "de");
+ TestStringValue("trim('' fROM ' helloworld ')", " helloworld ");
+ TestStringValue("trim(' ' from ' helloworld ')", "helloworld");
+ TestStringValue("trim(NULL FROM ' helloworld ')", " helloworld ");
+ TestStringValue("trim('both' FROM 'boneth')", "ne");
+
+ // TRIM(where string FROM string): regular test cases
+ // leading/trailing
+ TestStringValue("trim(leading 'rt' froM 'rrrrssssstttttt')", "ssssstttttt");
+ TestStringValue("trim(trailing '1234' FroM '23471364134612413434')",
"234713641346");
+ TestStringValue("trim(TRAILING 'a0' from '0ac2aa0aa00000a')", "0ac2");
+ TestStringValue("trim(trailing 'rt' FROm 'rrrrssssstttttt')", "rrrrsssss");
+ TestStringValue("trim(tRAIlinG 'rt' FROM 'rrrrssssstttttt')", "rrrrsssss");
+ // both
+ TestStringValue("trim(both 'aaaaaaaaaaaaaaaaaaaaabg' from 'abcdefg')",
"cdef");
+ TestStringValue("trim(bOTh 'a' from 'aaaaaaaaa')", "");
+ // Empty source strings return empty results
+ TestStringValue("trim(leading 'abc' from '')", "");
+ TestStringValue("trim(trailing 'xyzabcrst' from '')", "");
+ TestStringValue("trim(both 'aeiou' from '')", "");
+ // Null source strings return null results
+ TestIsNull("trim(leading '' from NULL)", TYPE_STRING);
+ TestIsNull("trim(trailing '' from NULL)", TYPE_STRING);
+ TestIsNull("trim(both 'aeiou' from NULL)", TYPE_STRING);
+ TestIsNull("trim(BoTh 'abc' from NULL)", TYPE_STRING);
+ // Special cases for string_expr nonterminal
+ TestStringValue("trim(both 'ao' from 'aaYYBBoo')", "YYBB");
+ TestStringValue("trim(both from ' YYBB ')", "YYBB");
+ TestStringValue("trim(+'aaoo' from 'aaYYBBoo')", "YYBB");
+ TestStringValue("trim(upper('a') from 'AAYYBB')", "YYBB");
+ TestStringValue(
+ "trim(replace('hello world', 'world', 'earth') from 'aaYYBBoo')",
+ "YYBB");
+ TestStringValue("trim(left('12ao5BY',5) from 'aaYYBBoo')", "YYBB");
+ TestStringValue("trim(right('BY54ao1',5) from 'aaYYBBoo')", "YYBB");
+ TestStringValue("trim(if((1=2) ,NULL, 'ao') from 'aaYYBBoo')", "YYBB");
+ TestStringValue("trim(cast(12.45 as string) from '1YYBB2')", "YYBB");
+ TestStringValue(
+ "trim(case when 1 = 2 then NULL else 'ao' end from 'aaYYBBoo')",
+ "YYBB");
+
TestStringValue("space(0)", "");
TestStringValue("space(-1)", "");
TestStringValue("space(cast(1 as bigint))", " ");
diff --git a/docs/topics/impala_string_functions.xml
b/docs/topics/impala_string_functions.xml
index 5772911ee..59ae62b69 100644
--- a/docs/topics/impala_string_functions.xml
+++ b/docs/topics/impala_string_functions.xml
@@ -1805,7 +1805,7 @@ select replace('hello world','xyz','abc');
<dlentry id="trim">
<dt>
- TRIM(STRING a)
+ TRIM(STRING a), TRIM({{LEADING | TRAILING | BOTH} [STRING
chars_to_trim] | STRING chars_to_trim} FROM STRING a)
</dt>
<dd>
@@ -1819,6 +1819,31 @@ select replace('hello world','xyz','abc');
besides spaces, see <codeph>BTRIM()</codeph>.
</p>
+ <p>
+ TRIM-FROM syntax is a SQL-standardized wrapper around
+ <codeph>LTRIM</codeph> / <codeph>RTRIM</codeph> /
<codeph>BTRIM</codeph>.
+ Depending on the first parameter <codeph>LEADING</codeph> /
+ <codeph>TRAILING</codeph> / <codeph>BOTH</codeph>, wrapper
resolves itself to
+ the corresponding underlying function. Default value is
<codeph>BOTH</codeph>.
+ If present, the <codeph>chars_to_trim</codeph> parameter is passed
forward to
+ the underlying function. Thus, given syntax may come in three
different forms:
+ </p>
+
+ <p>
+ Syntax #1: <codeph>TRIM(<where> FROM
<string>)</codeph> -->
+ <codeph>{L|R|B}TRIM(string)</codeph>;
+ </p>
+
+ <p>
+ Syntax #2: <codeph>TRIM(<charset> FROM
<string>)</codeph> -->
+ <codeph>BTRIM(string, charset)</codeph>;
+ </p>
+
+ <p>
+ Syntax #3: <codeph>TRIM(<where> <charset> FROM
<string>)</codeph>
+ --><codeph>{L|R|B}TRIM(string, charset)</codeph>.
+ </p>
+
<p>
<b>Return type:</b> <codeph>STRING</codeph>
</p>
diff --git a/fe/src/main/cup/sql-parser.cup b/fe/src/main/cup/sql-parser.cup
index 34e4b771a..ab417fd79 100755
--- a/fe/src/main/cup/sql-parser.cup
+++ b/fe/src/main/cup/sql-parser.cup
@@ -303,7 +303,7 @@ parser code {:
terminal
KW_ADD, KW_AGGREGATE, KW_ALL, KW_ALTER, KW_ANALYTIC, KW_AND, KW_ANTI,
KW_API_VERSION,
KW_ARRAY, KW_AS, KW_ASC, KW_AUTHORIZATION, KW_AVRO, KW_BETWEEN, KW_BIGINT,
KW_BINARY,
- KW_BLOCKSIZE, KW_BOOLEAN, KW_BY, KW_CACHED, KW_CASCADE, KW_CASE, KW_CAST,
KW_CHANGE,
+ KW_BLOCKSIZE, KW_BOOLEAN, KW_BOTH, KW_BY, KW_CACHED, KW_CASCADE, KW_CASE,
KW_CAST, KW_CHANGE,
KW_CHAR, KW_CLASS, KW_CLOSE_FN, KW_COLUMN, KW_COLUMNS, KW_COMMENT,
KW_COMPRESSION,
KW_COMPUTE, KW_CONSTRAINT, KW_CONVERT, KW_COPY, KW_CREATE, KW_CROSS,
KW_CUBE, KW_CURRENT,
KW_DATA, KW_DATABASE, KW_DATABASES, KW_DATE, KW_DATETIME, KW_DECIMAL,
KW_DEFAULT, KW_DELETE,
@@ -315,8 +315,8 @@ terminal
KW_GROUP, KW_GROUPING, KW_HASH, KW_HUDIPARQUET, KW_IGNORE, KW_HAVING,
KW_ICEBERG, KW_IF,
KW_ILIKE, KW_IN, KW_INCREMENTAL, KW_INIT_FN, KW_INNER, KW_INPATH, KW_INSERT,
KW_INT,
KW_INTERMEDIATE, KW_INTERSECT, KW_INTERVAL, KW_INTO, KW_INVALIDATE,
KW_IREGEXP, KW_IS,
- KW_JDBC, KW_JOIN, KW_JSONFILE, KW_KUDU, KW_LAST, KW_LEFT, KW_LEXICAL,
KW_LIKE, KW_LIMIT,
- KW_LINES, KW_LOAD, KW_LOCATION, KW_LOGICAL_OR, KW_MANAGED_LOCATION, KW_MAP,
KW_MATCHED,
+ KW_JDBC, KW_JOIN, KW_JSONFILE, KW_KUDU, KW_LAST, KW_LEADING, KW_LEFT,
KW_LEXICAL, KW_LIKE,
+ KW_LIMIT, KW_LINES, KW_LOAD, KW_LOCATION, KW_LOGICAL_OR,
KW_MANAGED_LOCATION, KW_MAP, KW_MATCHED,
KW_MERGE, KW_MERGE_FN, KW_METADATA, KW_MINUS, KW_NON, KW_NORELY, KW_NOT,
KW_NOVALIDATE, KW_NULL, KW_NULLS, KW_OF, KW_OFFSET, KW_ON, KW_OPTIMIZE,
KW_OR,
KW_ORC, KW_ORDER, KW_OUTER,
@@ -330,7 +330,7 @@ terminal
KW_STORAGE_HANDLER_URI, KW_STORED, KW_STRAIGHT_JOIN, KW_STRING, KW_STRUCT,
KW_SYMBOL,
KW_SYSTEM_TIME, KW_SYSTEM_VERSION, KW_TABLE, KW_TABLES, KW_TABLESAMPLE,
KW_TBLPROPERTIES,
KW_TERMINATED, KW_TEXTFILE, KW_THEN, KW_TIMESTAMP, KW_TINYINT, KW_TRUNCATE,
KW_STATS,
- KW_TO, KW_TRUE, KW_UDF, KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UNIQUE,
KW_UNKNOWN,
+ KW_TO, KW_TRAILING, KW_TRUE, KW_UDF, KW_UNBOUNDED, KW_UNCACHED, KW_UNION,
KW_UNIQUE, KW_UNKNOWN,
KW_UNNEST, KW_UNSET, KW_UPDATE, KW_UPDATE_FN, KW_UPSERT, KW_USE, KW_USING,
KW_VALIDATE,
KW_VALUES, KW_VARCHAR, KW_VIEW, KW_VIEWS, KW_WHEN, KW_WHERE, KW_WITH,
KW_ZORDER;
@@ -491,6 +491,7 @@ nonterminal PartitionKeyValue partition_key_value;
nonterminal PartitionKeyValue static_partition_key_value;
nonterminal Qualifier opt_set_op_qualifier;
nonterminal SetOperator set_op;
+nonterminal TrimFromExpr.TrimOption trim_option;
// For ALTER DATABASE.
nonterminal AlterDbStmt alter_db_stmt;
@@ -3841,6 +3842,15 @@ non_pred_expr ::=
{: RESULT = s; :}
;
+trim_option ::=
+ KW_BOTH
+ {: RESULT = TrimFromExpr.TrimOption.BOTH; :}
+ | KW_LEADING
+ {: RESULT = TrimFromExpr.TrimOption.LEADING; :}
+ | KW_TRAILING
+ {: RESULT = TrimFromExpr.TrimOption.TRAILING; :}
+ ;
+
function_call_expr ::=
function_name:fn_name LPAREN RPAREN
{:
@@ -3849,9 +3859,29 @@ function_call_expr ::=
:}
| function_name:fn_name LPAREN function_params:params RPAREN
{: RESULT = FunctionCallExpr.createExpr(fn_name, params,
parser.getQueryOptions()); :}
- // Below is a special case for EXTRACT. Idents are used to avoid adding new
keywords.
- | function_name:fn_name LPAREN ident_or_default:u KW_FROM expr:t RPAREN
- {: RESULT = new ExtractFromExpr(fn_name, u, t); :}
+ // Below are special cases for function calls with KW_FROM inside. Idents
are used to
+ // avoid adding new keywords.
+ // Common syntax for "EXTRACT(unit FROM TIMESTAMP/DATE ts)"
+ // and "TRIM(where FROM string)":
+ | function_name:fn_name LPAREN ident_or_default:i KW_FROM expr:e RPAREN
+ {:
+ if (fn_name.toString().toLowerCase().endsWith("extract")) {
+ RESULT = new ExtractFromExpr(fn_name, i, e);
+ } else {
+ // Special case for "TRIM(<charset> FROM <string>)", where charset is a
simple
+ // slot reference.
+ RESULT = new TrimFromExpr(fn_name, i, e);
+ }
+ :}
+ // "TRIM(<where> FROM <string>)":
+ | function_name:fn_name LPAREN trim_option:w KW_FROM expr:s RPAREN
+ {: RESULT = new TrimFromExpr(fn_name, w, s); :}
+ // Syntax: "TRIM(<charset> FROM <string>)":
+ | function_name:fn_name LPAREN expr:c KW_FROM expr:s RPAREN
+ {: RESULT = new TrimFromExpr(fn_name, c, s); :}
+ // Syntax: "TRIM(<where> <charset> FROM <string>)":
+ | function_name:fn_name LPAREN trim_option:w expr:c KW_FROM expr:s RPAREN
+ {: RESULT = new TrimFromExpr(fn_name, w, c, s); :}
;
// TODO: allow an arbitrary expr here instead of agg/fn call, and check during
analysis?
@@ -4315,6 +4345,8 @@ word ::=
{: RESULT = r.toString(); :}
| KW_BOOLEAN:r
{: RESULT = r.toString(); :}
+ | KW_BOTH:r
+ {: RESULT = r.toString(); :}
| KW_BUCKETS:r
{: RESULT = r.toString(); :}
| KW_BY:r
@@ -4509,6 +4541,8 @@ word ::=
{: RESULT = r.toString(); :}
| KW_LAST:r
{: RESULT = r.toString(); :}
+ | KW_LEADING:r
+ {: RESULT = r.toString(); :}
| KW_LEFT:r
{: RESULT = r.toString(); :}
| KW_LEXICAL:r
@@ -4697,6 +4731,8 @@ word ::=
{: RESULT = r.toString(); :}
| KW_TINYINT:r
{: RESULT = r.toString(); :}
+ | KW_TRAILING:r
+ {: RESULT = r.toString(); :}
| KW_TRUNCATE:r
{: RESULT = r.toString(); :}
| KW_STATS:r
diff --git a/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
b/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
new file mode 100644
index 000000000..c2bc99b6b
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
@@ -0,0 +1,193 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.analysis;
+
+import com.google.common.collect.Lists;
+import com.google.common.base.Enums;
+
+import org.apache.impala.catalog.BuiltinsDb;
+import org.apache.impala.catalog.Type;
+import org.apache.impala.common.AnalysisException;
+
+import java.util.stream.Collectors;
+import java.util.Arrays;
+
+/**
+ * Representation of the following TRIM expressions:
+ * TRIM(<Where> FROM <String_to_be_trimmed>)
+ * TRIM(<Characters> FROM <String_to_be_trimmed>)
+ * TRIM(<Where> <Characters> FROM <String_to_be_trimmed>)
+ * Such forms of TRIM are not handled by FunctionCallExpr.
+ *
+ * Given class serves as a syntax wrapper around FunctionCallExpr, which
instantiates
+ * a corresponding *trim function with regular syntax based on the arguments.
+ */
+public class TrimFromExpr extends FunctionCallExpr {
+ private final FunctionName trimFromFnName_;
+ private final TrimOption where_;
+ private final Expr charset_;
+ private final Expr srcExpr_;
+ private SlotRef slotRef_ = null;
+
+ // Trim option set.
+ public enum TrimOption {
+ LEADING,
+ TRAILING,
+ BOTH
+ }
+
+ private static FunctionName baseFnName(FunctionName fnName, TrimOption
where) {
+ String fn = fnName.toString();
+ if (where == null) {
+ // Default to BOTH if where is null
+ where = TrimOption.BOTH;
+ }
+ switch (where) {
+ case LEADING:
+ fn = fn.replaceAll("trim", "ltrim");
+ break;
+ case TRAILING:
+ fn = fn.replaceAll("trim", "rtrim");
+ break;
+ case BOTH:
+ default:
+ fn = fn.replaceAll("trim", "btrim");
+ break;
+ }
+ return new FunctionName(fn);
+ }
+
+ // For "TRIM(<where> FROM <string>)"
+ public TrimFromExpr(FunctionName fnName, TrimOption where, Expr srcexpr) {
+ this(fnName, where, null, srcexpr);
+ }
+
+ // For "TRIM(<charset> FROM <string>)"
+ public TrimFromExpr(FunctionName fnName, Expr charset, Expr srcexpr) {
+ this(fnName, null, charset, srcexpr);
+ }
+
+ // Special case for "TRIM(<charset> FROM <string>)": charset is a simple
slot reference.
+ public TrimFromExpr(FunctionName fnName, String slot, Expr srcexpr) {
+ this(fnName, null, null, srcexpr);
+ slotRef_ = new SlotRef(Arrays.asList(slot));
+ }
+
+ // For generic "TRIM(<where> <charset> FROM <string>)"
+ public TrimFromExpr(FunctionName fnName, TrimOption where, Expr charset,
Expr srcexpr) {
+ super(baseFnName(fnName, where),
+ charset == null ? Lists.newArrayList(srcexpr) :
+ Lists.newArrayList(srcexpr, charset));
+ trimFromFnName_ = fnName;
+ where_ = where;
+ charset_ = charset;
+ srcExpr_ = srcexpr;
+ type_ = Type.STRING;
+ }
+
+ /**
+ * Copy c'tor used in clone().
+ */
+ protected TrimFromExpr(TrimFromExpr other) {
+ super(other);
+ trimFromFnName_ = other.trimFromFnName_;
+ where_ = other.where_;
+ charset_ = other.charset_;
+ srcExpr_ = other.srcExpr_;
+ slotRef_ = other.slotRef_;
+ }
+
+ @Override
+ protected void analyzeImpl(Analyzer analyzer) throws AnalysisException {
+ // Do these sanity checks before calling the super class to get the
expected error
+ // messages if trim() is used in an invalid way.
+ trimFromFnName_.analyze(analyzer);
+ if (!trimFromFnName_.getFunction().equals("trim")
+ && !trimFromFnName_.getFunction().equals("utf8_trim")) {
+ throw new AnalysisException("Function "
+ + trimFromFnName_.getFunction().toUpperCase()
+ + " does not accept the keyword FROM.");
+ }
+ if ((trimFromFnName_.getDb() != null)
+ && !trimFromFnName_.getDb().equals(BuiltinsDb.NAME)) {
+ throw new AnalysisException("Function "
+ + trimFromFnName_.toString() + " conflicts "
+ + "with the TRIM builtin.");
+ }
+
+ super.analyzeImpl(analyzer);
+
+ if (slotRef_ != null) {
+ try {
+ slotRef_.analyze(analyzer);
+ } catch (AnalysisException e) {
+ // In case 'where' parameter cannot be interpreted as a slot, trim
option error
+ // is more informative.
+ throw new AnalysisException("Trim option '" + slotRef_.toSql()
+ + "' in expression '" + toSql() + "' is invalid. Expected one of: "
+ + Arrays.stream(TrimOption.values()).map(option -> option.name())
+ .collect(Collectors.joining(", "))
+ + ". Note: TRIM-FROM syntax also accepts a column name identifier
as a "
+ + "charset argument.");
+ }
+ if (super.getParams().size() == 1) {
+ super.getParams().exprs().add(slotRef_);
+ super.children_.add(slotRef_);
+ }
+ }
+ }
+
+ @Override
+ protected String getFunctionNotFoundError(Type[] argTypes) {
+ StringBuilder errMsg = new StringBuilder();
+ if (charset_ != null && charset_.getType() != Type.STRING) {
+ errMsg.append("Expression '" + charset_.toSql() + "' has a return type
of "
+ + charset_.getType().toSql() + " but a STRING is
required.");
+ }
+ if (slotRef_ != null && slotRef_.getType() != Type.STRING) {
+ errMsg.append("Expression '" + slotRef_.toSql() + "' has a return type
of "
+ + slotRef_.getType().toSql() + " but a STRING is
required.");
+ }
+ if (srcExpr_ != null && srcExpr_.getType() != Type.STRING) {
+ if (errMsg.length() > 0) errMsg.append(" ");
+ errMsg.append("Expression '" + srcExpr_.toSql() + "' has a return type
of "
+ + srcExpr_.getType().toSql() + " but a STRING is
required.");
+ }
+ return errMsg.toString();
+ }
+
+ @Override
+ public String toSqlImpl(ToSqlOptions options) {
+ StringBuilder strBuilder = new StringBuilder();
+ strBuilder.append(trimFromFnName_ + "(");
+ if (where_ != null) {
+ strBuilder.append(where_.name() + " ");
+ }
+ if (slotRef_ != null) {
+ strBuilder.append(slotRef_.toSql(options) + " ");
+ }
+ if (charset_ != null) {
+ strBuilder.append(charset_.toSql(options) + " ");
+ }
+ strBuilder.append("FROM " + srcExpr_.toSql(options) + ")");
+ return strBuilder.toString();
+ }
+
+ @Override
+ public Expr clone() { return new TrimFromExpr(this); }
+}
diff --git a/fe/src/main/jflex/sql-scanner.flex
b/fe/src/main/jflex/sql-scanner.flex
index 10856e2a3..bfbc3c72f 100644
--- a/fe/src/main/jflex/sql-scanner.flex
+++ b/fe/src/main/jflex/sql-scanner.flex
@@ -81,6 +81,7 @@ import org.apache.impala.thrift.TReservedWordsVersion;
keywordMap.put("bigint", SqlParserSymbols.KW_BIGINT);
keywordMap.put("binary", SqlParserSymbols.KW_BINARY);
keywordMap.put("block_size", SqlParserSymbols.KW_BLOCKSIZE);
+ keywordMap.put("both", SqlParserSymbols.KW_BOTH);
keywordMap.put("boolean", SqlParserSymbols.KW_BOOLEAN);
keywordMap.put("buckets", SqlParserSymbols.KW_BUCKETS);
keywordMap.put("by", SqlParserSymbols.KW_BY);
@@ -180,6 +181,7 @@ import org.apache.impala.thrift.TReservedWordsVersion;
keywordMap.put("jsonfile", SqlParserSymbols.KW_JSONFILE);
keywordMap.put("kudu", SqlParserSymbols.KW_KUDU);
keywordMap.put("last", SqlParserSymbols.KW_LAST);
+ keywordMap.put("leading", SqlParserSymbols.KW_LEADING);
keywordMap.put("left", SqlParserSymbols.KW_LEFT);
keywordMap.put("lexical", SqlParserSymbols.KW_LEXICAL);
keywordMap.put("like", SqlParserSymbols.KW_LIKE);
@@ -276,6 +278,7 @@ import org.apache.impala.thrift.TReservedWordsVersion;
keywordMap.put("then", SqlParserSymbols.KW_THEN);
keywordMap.put("timestamp", SqlParserSymbols.KW_TIMESTAMP);
keywordMap.put("tinyint", SqlParserSymbols.KW_TINYINT);
+ keywordMap.put("trailing", SqlParserSymbols.KW_TRAILING);
keywordMap.put("to", SqlParserSymbols.KW_TO);
keywordMap.put("true", SqlParserSymbols.KW_TRUE);
keywordMap.put("truncate", SqlParserSymbols.KW_TRUNCATE);
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
b/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
index a8a577202..5c11f4b38 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
@@ -1902,7 +1902,7 @@ public class AnalyzeExprsTest extends AnalyzerTest {
"functional_orc_def.complextypes_structs",
"No matching function with signature: lower(STRUCT<b:BOOLEAN>).");
- // Special cases for FROM in function call
+ // Special cases for FROM in EXTRACT function call
AnalyzesOk("select extract(year from now())");
AnalyzesOk("select extract(year from cast(now() as date))");
AnalyzesOk("select extract(year from date_col) from functional.date_tbl");
@@ -1924,6 +1924,65 @@ public class AnalyzeExprsTest extends AnalyzerTest {
AnalysisError("select date_part(year from now())",
"Function DATE_PART does not accept the keyword FROM");
+ // Special cases for FROM in TRIM in function call
+ // TRIM(where FROM string): trim space by default
+ AnalyzesOk("select trim(trailing FROM ' &@&127+ &@ ')");
+ AnalyzesOk("select tRIm(trailing FROM ' &@&127+ &@ ')");
+ AnalyzesOk("select utf8_trim(trailing FROM ' &@&127+ &@ ')");
+ AnalysisError("select trim(foo from ' &@&127+ &@ ')",
+ "Trim option 'foo' in expression 'trim(foo FROM ' &@&127+ &@
')' is " +
+ "invalid. Expected one of: LEADING, TRAILING, BOTH.");
+ AnalysisError("select utf8_trim(foo from ' &@&127+ &@ ')",
+ "Trim option 'foo' in expression 'utf8_trim(foo FROM ' &@&127+ &@
')' " +
+ "is invalid. Expected one of: LEADING, TRAILING, BOTH.");
+ AnalysisError("select trim(leading from 0)",
+ "Expression '0' has a return type of TINYINT but a STRING is
required.");
+ AnalysisError("select date_part(both from ' xyz ')",
+ "Function DATE_PART does not accept the keyword FROM.");
+
+ // TRIM(string FROM string): trim from both sides by default
+ AnalyzesOk("select trim('+-*' from '&$^$)*(^*^++--*')");
+ AnalyzesOk("select trim(NULL from '&$^$)*(^*^++--*')");
+ AnalyzesOk("select trim('' from '&$^$)*(^*^++--*')");
+ AnalysisError("select trim('cosmos' from 10)",
+ "Expression '10' has a return type of TINYINT but a STRING is
required.");
+ AnalysisError("select trim(100 from ' universe ');",
+ "Expression '100' has a return type of TINYINT but a STRING is
required.");
+ AnalysisError("select trim(3.1415926 from 2.718281828);",
+ "Expression '3.1415926' has a return type of DECIMAL(8,7) but a STRING
is " +
+ "required. Expression '2.718281828' has a return type of DECIMAL(10,9)
but a " +
+ "STRING is required.");
+ AnalysisError("select trim(int_col from 'abc') from functional.alltypes",
+ "Expression 'int_col' has a return type of INT but a STRING is
required.");
+
+ // TRIM(where string FROM string): regular test cases
+ AnalyzesOk("select trim(trailing 'a0-' from 'c2aa0a+&$%-a00000-a')");
+ AnalyzesOk("select trim(leAdiNG 'rt' From 'rrrrssssstttttt')");
+ AnalyzesOk("select trim(tRAIlinG 'rt' FROM 'rrrrssssstttttt')");
+ AnalysisError("select trim(xyz ' ' from now())",
+ "Syntax error in line 1:\n" +
+ "select trim(xyz ' ' from now())\n" +
+ " ^\n" +
+ "Encountered: STRING LITERAL\n" +
+ "Expected: AND, BETWEEN, DIV, FROM, IGNORE, ILIKE, IN, IREGEXP, IS,
LIKE, ||, " +
+ "NOT, OR, REGEXP, RLIKE, COMMA\n\n");
+ AnalysisError("select trim(both ' ' from now())",
+ "Expression 'now()' has a return type of TIMESTAMP but a STRING is
required.");
+ AnalysisError("select trim(xyz ' ' from ' Quantum ')",
+ "Syntax error in line 1:\n" +
+ "select trim(xyz ' ' from ' Quantum ')\n" +
+ " ^\n" +
+ "Encountered: STRING LITERAL\n" +
+ "Expected: AND, BETWEEN, DIV, FROM, IGNORE, ILIKE, IN, IREGEXP, IS,
LIKE, ||, " +
+ "NOT, OR, REGEXP, RLIKE, COMMA\n\n");
+ AnalysisError("select trim(both 6.022140857E23 from ' Avogadro constant
')",
+ "Expression '6.022140857E+23' has a return type of DECIMAL(24,0) but a
STRING " +
+ "is required.");
+ AnalysisError("select trim(both 1.6E-19 from 6.62606896E-34)",
+ "Expression '1.6E-19' has a return type of DECIMAL(20,20) but a STRING
is " +
+ "required. Expression '6.62606896E-34' has a return type of DOUBLE but
a " +
+ "STRING is required.");
+
// IGNORE NULLS may only be used with first_value/last_value
AnalysisError("select lower('FOO' ignore nulls)",
"Function LOWER does not accept the keyword IGNORE NULLS.");
diff --git a/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
b/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
index d17465f81..4ef81b33e 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
@@ -1507,6 +1507,13 @@ public class ToSqlTest extends FrontendTestBase {
// FunctionCallExpr.
testToSql("select pi(), (pi()), trim('a'), (trim('a'))",
"SELECT pi(), (pi()), trim('a'), (trim('a'))");
+ // TrimFromExpr.
+ testToSql("select trim('aa' from 'abc'), (trim('aa' from 'abc')), " +
+ "trim(leading 'aa' from 'abc'), trim(string_col from 'string') " +
+ "from functional.alltypes",
+ "SELECT trim('aa' FROM 'abc'), (trim('aa' FROM 'abc')), " +
+ "trim(LEADING 'aa' FROM 'abc'), trim(string_col FROM 'string') " +
+ "FROM functional.alltypes");
// LiteralExpr.
testToSql("select 10, (10), 20.0, (20.0), NULL, (NULL), 'abc', ('abc')",
"SELECT 10, (10), 20.0, (20.0), NULL, (NULL), 'abc', ('abc')");
diff --git
a/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
b/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
index 7a4690113..d506de4c1 100644
---
a/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
+++
b/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
@@ -329,6 +329,23 @@ INT,STRING,STRING
====
---- QUERY
set utf8_mode=true;
+select id, name, trim(leading substr(name, 1, 1) from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','三'
+2,'李四','四'
+3,'王五','五'
+4,'李小龙','小龙'
+5,'Alice','lice'
+6,'陈Bob','Bob'
+7,'Бopиc','opиc'
+8,'Jörg','örg'
+9,'ひなた','なた'
+10,'서연','연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
select id, name, rtrim(name, substr(name, 2)) from utf8_str_tiny;
---- RESULTS: RAW_STRING
1,'张三','张'
@@ -346,6 +363,23 @@ INT,STRING,STRING
====
---- QUERY
set utf8_mode=true;
+select id, name, trim(trailing substr(name, 2) from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','张'
+2,'李四','李'
+3,'王五','王'
+4,'李小龙','李'
+5,'Alice','A'
+6,'陈Bob','陈'
+7,'Бopиc','Б'
+8,'Jörg','J'
+9,'ひなた','ひ'
+10,'서연','서'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
select id, name, ltrim(name, '张李王小A陈БJひ서') from utf8_str_tiny;
---- RESULTS: RAW_STRING
1,'张三','三'
@@ -363,6 +397,23 @@ INT,STRING,STRING
====
---- QUERY
set utf8_mode=true;
+select id, name, trim(leading '张李王小A陈БJひ서' from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','三'
+2,'李四','四'
+3,'王五','五'
+4,'李小龙','龙'
+5,'Alice','lice'
+6,'陈Bob','Bob'
+7,'Бopиc','opиc'
+8,'Jörg','örg'
+9,'ひなた','なた'
+10,'서연','연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
select id, name, rtrim(name, '三四五小龙') from utf8_str_tiny;
---- RESULTS: RAW_STRING
1,'张三','张'
@@ -380,6 +431,23 @@ INT,STRING,STRING
====
---- QUERY
set utf8_mode=true;
+select id, name, trim(trailing '三四五小龙' from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','张'
+2,'李四','李'
+3,'王五','王'
+4,'李小龙','李'
+5,'Alice','Alice'
+6,'陈Bob','陈Bob'
+7,'Бopиc','Бopиc'
+8,'Jörg','Jörg'
+9,'ひなた','ひなた'
+10,'서연','서연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
select id, name, btrim(name, '!?。,:;‘’“”≠≥≤∞ε∑∫√') from utf8_str_tiny;
---- RESULTS: RAW_STRING
1,'张三','张三'
@@ -395,3 +463,117 @@ select id, name, btrim(name, '!?。,:;‘’“”≠≥≤∞ε∑∫
---- TYPES
INT,STRING,STRING
====
+---- QUERY
+set utf8_mode=true;
+select id, name, trim('!?。,:;‘’“”≠≥≤∞ε∑∫√' from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','张三'
+2,'李四','李四'
+3,'王五','王五'
+4,'李小龙','李小龙'
+5,'Alice','Alice'
+6,'陈Bob','陈Bob'
+7,'Бopиc','Бopиc'
+8,'Jörg','Jörg'
+9,'ひなた','ひなた'
+10,'서연','서연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=false;
+select id, name, utf8_trim('!?。,:;‘’“”≠≥≤∞ε∑∫√' from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','张三'
+2,'李四','李四'
+3,'王五','王五'
+4,'李小龙','李小龙'
+5,'Alice','Alice'
+6,'陈Bob','陈Bob'
+7,'Бopиc','Бopиc'
+8,'Jörg','Jörg'
+9,'ひなた','ひなた'
+10,'서연','서연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=false;
+select id, name, utf8_trim('!?。,:;‘’“”≠≥≤∞ε∑∫√' from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三','张三'
+2,'李四','李四'
+3,'王五','王五'
+4,'李小龙','李小龙'
+5,'Alice','Alice'
+6,'陈Bob','陈Bob'
+7,'Бopиc','Бopиc'
+8,'Jörg','Jörg'
+9,'ひなた','ひなた'
+10,'서연','서연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
+select id, unnest(arr2) as item, trim(unnest(arr2) from "aaYYBBoo") from
complextypes_arrays order by id, item;
+---- RESULTS: RAW_STRING
+1,'five','aaYYBBoo'
+1,'four','aaYYBB'
+1,'one','aaYYBB'
+1,'three','aaYYBBoo'
+1,'two','aaYYBB'
+2,'five','aaYYBBoo'
+2,'one','aaYYBB'
+2,'three','aaYYBBoo'
+2,'two','aaYYBB'
+2,'NULL','aaYYBBoo'
+3,'ten','aaYYBBoo'
+4,'eight','aaYYBBoo'
+4,'nine','aaYYBBoo'
+4,'ten','aaYYBBoo'
+5,'eleven','aaYYBBoo'
+5,'ten','aaYYBBoo'
+5,'thirteen','aaYYBBoo'
+5,'twelve','aaYYBBoo'
+6,'str1','aaYYBBoo'
+6,'str2','aaYYBBoo'
+9,'str1','aaYYBBoo'
+9,'str2','aaYYBBoo'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
+select id, name, trim(first_value(name) over (order by id) from name) from
utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三',''
+2,'李四','李四'
+3,'王五','王五'
+4,'李小龙','李小龙'
+5,'Alice','Alice'
+6,'陈Bob','陈Bob'
+7,'Бopиc','Бopиc'
+8,'Jörg','Jörg'
+9,'ひなた','ひなた'
+10,'서연','서연'
+---- TYPES
+INT,STRING,STRING
+====
+---- QUERY
+set utf8_mode=true;
+select id, name, trim(name from name) from utf8_str_tiny;
+---- RESULTS: RAW_STRING
+1,'张三',''
+2,'李四',''
+3,'王五',''
+4,'李小龙',''
+5,'Alice',''
+6,'陈Bob',''
+7,'Бopиc',''
+8,'Jörg',''
+9,'ひなた',''
+10,'서연',''
+---- TYPES
+INT,STRING,STRING
+====