Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21825 )

Change subject: IMPALA-889: Add trim() function matching ANSI SQL definition
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21825/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21825/6//COMMIT_MSG@55
PS6, Line 55:  However, despite
            : that logic in the code, having "TRIM" just as a keyword still 
results in
            : syntax error (e.g. in queries like "create table t1(trim int)" ).
With the current state of the patch "create table t1(trim int)" would succeed, 
right? The sentence could be clearer about this, for example "having "TRIM" 
just as a keyword still result"->"having "TRIM" as a keyword would still result"


http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/cup/sql-parser.cup@3870
PS6, Line 3870: "select sum(count(id))".
I don't get it, how does this affect sum(count(id))? All the new causes contain 
KW_FROM, which should distinguish it from sum/count


http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/21825/6/fe/src/main/jflex/sql-scanner.flex@421
PS6, Line 421: "leading", "trailing", "both"
I still think that it would be better to keep these as reserved and add them as 
keywords.

My problem with removing them from reservedWords is that this will lead to 
allowing statements like "create table t1(trailing int);". This is currently 
not allowed in Impala and Hive also doesn't allow it. If we allow it with the 
patch, that means that users can write queries that contain 
trealing/leading/both without backticks, and later disallowing this would break 
these queries. An example why we would disallow this in the future is moving to 
the calcite based planner, which probably use similar parsing rules as Hive. So 
I prefer to be as restrictive as possible (without breaking existing queries of 
course).


http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
File 
testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test:

http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@398
PS6, Line 398: ---- QUERY
I would prefer to reorganize the tests a bit:
For queries that are semantically equivalent to existing trim/rtrim/ltrim tests 
in this file, I would put the new test next to the original, e.g.
after
select id, name, rtrim(name, '?????') from utf8_str_tiny;
I would put
select id, name, trim(trailing name, '?????') from utf8_str_tiny;


For other tests I don't think that the utf8 handling is important, so for 
example
select id, name, trim(left('??????‘’“”????????',5) from name) from 
utf8_str_tiny;
doesn't seem to trim anything from the dataset. A simple test like
select id, name, trim(left('??????‘’“”????????',5) from "!a!";
seems more useful.

Also, tests where utf8 handling is not relevant could be moved to other files, 
for example expr-test.cc or exprs.test. Currently most trim tests are in 
expr-test.cc


http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@400
PS6, Line 400: leading
I don't see any test for `both` - can you add one test with charset, and one 
without charset?


http://gerrit.cloudera.org:8080/#/c/21825/6/testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test@700
PS6, Line 700: set utf8_mode=false;
The current tests always use utf8_mode=true, while this test uses utf8_trim, so 
the non-utf8 code path is not tested.



--
To view, visit http://gerrit.cloudera.org:8080/21825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Gerrit-Change-Number: 21825
Gerrit-PatchSet: 6
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Mon, 14 Oct 2024 12:27:13 +0000
Gerrit-HasComments: Yes

Reply via email to