Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21142 )

Change subject: IMPALA-12737: List columns in profile and query history
......................................................................


Patch Set 27:

(13 comments)

I have not thoroughly review this, but I want to give my initial review.

Please also address clang-tidy failure in 
https://jenkins.impala.io/job/clang-tidy-ub2004/4252/console

http://gerrit.cloudera.org:8080/#/c/21142/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21142/27//COMMIT_MSG@9
PS27, Line 9: Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
            : Columns", and "OrderBy Columns"
I think this patch has grown beyond just adding new columns in WM table.
Please consider containing the refactoring changes (WM versioning, 
initialization, result fetching, etc) into its own patch.


http://gerrit.cloudera.org:8080/#/c/21142/27//COMMIT_MSG@19
PS27, Line 19: The "aggregate_columns" field contains a de-duped list of all 
columns
             : in both the order by and having clauses.
Please mention Tests added/done for this patch.


http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/kudu/util/version_util.h
File be/src/kudu/util/version_util.h:

http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/kudu/util/version_util.h@42
PS27, Line 42:   // Compare two Version objects. Versions that contain an extra 
component sort before
             :   // (less than) versions that do not contain an extra 
component. No special handling is
             :   // done of the extra component.  Thus '-SNAPSHOT' sorts as 
greater than '-RELEASE'.
             :   // Example sort order (from least to greatest):
             :   //     1.0.0-RELEASE, 1.0.0-SNAPSHOT, 1.0.0, 1.0.1-SNAPSHOT.
             :   bool operator<(const Version& other) const;
             :   bool operator<=(const Version& other) const;
             :   bool operator>(const Version& other) const;
In general, I'm worried about making changes under be/src/kudu/* since we 
usually copy and replace codes under this dir straight from Kudu repository 
(see IMPALA-9335 and IMPALA-10931). We risk losing this changes if we do 
another Kudu source code rebase in the future.

Perhaps we can contribute this code to Upstream Kudu first?


http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/impala-server.cc@760
PS27, Line 760: MonotonicNanos
Use MonotonicMillis() instead?


http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/impala-server.cc@3235
PS27, Line 3235: VLOG(2) << "Internal server ready";
Seems redundant with LOG(INFO) below.


http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/internal-server.h
File be/src/service/internal-server.h:

http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/internal-server.h@149
PS27, Line 149: Intended for use as a convenience method when query results are 
small
Will be great if we can LOG(WARNING) if the query results turns out not small.


http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/internal-server.cc
File be/src/service/internal-server.cc:

http://gerrit.cloudera.org:8080/#/c/21142/27/be/src/service/internal-server.cc@305
PS27, Line 305: i=0; i<results_metadata->columns.size()
nit: add space between '=' and '<' operator,


http://gerrit.cloudera.org:8080/#/c/21142/26/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/21142/26/common/thrift/Frontend.thrift@705
PS26, Line 705:   // Columns referenced in a select list.
              :   21: optional list<string> select_columns
              :
              :   // Columns referenced in a where clause.
              :   22: optional list<string> where_columns
              :
              :   // Columns referenced in a join clause.
              :   23: optional list<string> join_columns
              :
              :   // Columns referenced in an aggregation.
              :   24: optional list<string> aggregate_columns
              :
              :   // Columns referenced in an order by clause.
              :   25: optional list<string> orderby_columns
> Perhaps we don't want that in the profile. I think in the query log table t
Ack


http://gerrit.cloudera.org:8080/#/c/21142/26/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/21142/26/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4488
PS26, Line 4488: mplified ? t
> Mostly avoiding allocating another Set. But I do agree it's an unusual patt
Ack


http://gerrit.cloudera.org:8080/#/c/21142/26/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/21142/26/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@383
PS26, Line 383: selectList_.getItems().stream().filter(elem -> !elem.isStar())
              :           .forEach(item -> 
item.getExpr().collect(SlotRef.class, slotRefs));
              :       analyzer_.addSelectColumns(Stream.concat(
> Done
Done


http://gerrit.cloudera.org:8080/#/c/21142/26/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@387
PS26, Line 387:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/21142/27/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/21142/27/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@380
PS27, Line 380: // Register all columns referenced in this statement, starting 
with select clause.
              :       // Joins will be added during planning.
Can you add comment explaining about how column alias is treated? Is the alias 
will be registered, or the real column name?
An example of this is SELECT over UNION of different column names.


http://gerrit.cloudera.org:8080/#/c/21142/26/tests/util/workload_management.py
File tests/util/workload_management.py:

http://gerrit.cloudera.org:8080/#/c/21142/26/tests/util/workload_management.py@30
PS26, Line 30:
             : def round_to_3(val):
             :   # The differences betwe
> Done
Done



--
To view, visit http://gerrit.cloudera.org:8080/21142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Gerrit-Change-Number: 21142
Gerrit-PatchSet: 27
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 30 Jul 2024 04:11:36 +0000
Gerrit-HasComments: Yes

Reply via email to