Mihaly Szjatinya has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/21825 )

Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition
......................................................................

IMPALA-889: Add trim() function matching ANSI SQL definition

As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM ( [[ LEADING | TRAILING | BOTH ] characters FROM ] expression ).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://github.com/apache/impala/commit/543fa73f3a846
f0e4527514c993cb0985912b06c.

Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);

"where": Case-insensitive trim direction. Valid options are "leading",
  "trailing", and "both". "leading" means trimming characters from the
  start; "trailing" means trimming characters from the end; "both" means
  trimming characters from both sides. For Syntax #2, since no "where"
  is specified, the option "both" is implied by default.

"charset": Case-sensitive characters to be removed. This argument is
  regarded as a character set going to be removed. The occurrence order
  of each character doesn't matter and duplicated instances of the same
  character will be ignored. NULL argument implies " " (standard space)
  by default. Empty argument ("" or '') makes TRIM return the string
  untouched. For Syntax #1, since no "charset" is specified, it trims
  " " (standard space) by default.

"string": Case-sensitive target string to trim. This argument can be
  NULL.

Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.

2. Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper
around FunctionCallExpr, which instantiates one of the regular LTRIM /
RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the
integrity of the "phantom" TRIM-FROM built-in function.

3. No keywords. Following EXTRACT-FROM, no "TRIM" keyword was added to
the language. Although generally a keyword would allow easier and better
parsing, on the negative side it restricts token's usage in general
context. Arguably, this shouldn't be the case since built-in functions
are specifically removed from the reserved words list. However, despite
that logic in the code, having "TRIM" just as a keyword still results in
syntax error (e.g. in queries like "create table t1(trim int)" ).

Limitations:
1. Analysis vs Parsing. It seems a large part of the analysis in
analysis/*.Expr files can be performed at the parsing stage. In the case
of TRIM-FROM, this comes in part from the no-keyword design, as well as
from the lack of a straightforward verification mechanism for complex
cases. Although this analysis may be lacking.

2. Escaping "where". Since "leading", "trailing", and "both" are
inherited from earlier Impala versions as reserved words, these have to
be escaped in order to be used as identifiers.

3. Corner cases. In the current version, the following syntax results in
an analysis error:
select id, name, trim(name from "string") from table;
These work fine though:
select id, name, trim(table.name from "string") from table;
select id, name, trim(`leading` name from "string") from table;
select id, name, trim('charset' from name) from table;
Imho, it would make sense to get some general feedback on the design
before fixing this. Similarly, the behavior around NULL-values and empty
arguments needs to be thought through.

4. Design inconsistency. If the no-BE solution is accepted as more
preferable, effort should be made to see whether EXTRACT-FROM can be
refactored accordingly.

Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
---
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimFromExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
4 files changed, 359 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/21825/5
--
To view, visit http://gerrit.cloudera.org:8080/21825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Gerrit-Change-Number: 21825
Gerrit-PatchSet: 5
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>

Reply via email to