Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 )
Change subject: IMPALA-12426: Query History Table
......................................................................

Patch Set 37:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.h
File be/src/service/workload-management.h:

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.h@57
PS34, Line 57: ///
             : /// Parameters:
             : /// `rec` - `QueryStateExpanded` object, an
> Mention the default option chosen here.
I don't like having the actual query options in the comments because those sorts of comments tend to get out of sync with the code. I did add a comment explaining where to find the actual query options.

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.cc@197
PS34, Line 197:
> Can drop std:: here and below since already using namespace std?
Copy-pasta error. Done.

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.cc@211
PS34, Line 211: TQueryOptions opts;
              :
> This can be removed, and INSERT_QUERY_OPTS simply do the the right hand sid
Done.

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/service/workload-management.cc@443
PS34, Line 443:
> I think quote is unnecessary for logging here and below.
I like having the quotes because they make it easier for logging solutions to parse the table and query_id out of this log message as fields.

http://gerrit.cloudera.org:8080/#/c/20770/36/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/36/be/src/service/workload-management.cc@195
PS36, Line 195: });
> gflags can't have been set from the CLI during static initialization. Which
Good catch! I was not aware that static initialization happened before the flags were set.
I have fixed this issue and added a test to ensure the query_log_table_db and query_log_table_name flags actually change the completed queries db/table.

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/util/sql-util.cc
File be/src/util/sql-util.cc:

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/util/sql-util.cc@29
PS34, Line 29: ace i
> Can drop std:: here and below since already using namespace std?
Done.

http://gerrit.cloudera.org:8080/#/c/20770/34/be/src/util/sql-util.cc@36
PS34, Line 36: };
> line has trailing whitespace
Done.

http://gerrit.cloudera.org:8080/#/c/20770/35/be/src/util/sql-util.cc
File be/src/util/sql-util.cc:

http://gerrit.cloudera.org:8080/#/c/20770/35/be/src/util/sql-util.cc@57
PS35, Line 57: ret << c;
> A switch statement is a lot faster than hash map lookup. It wasn't that ver
Interesting. I would have expected similar performance, since a hash map lookup is O(1). Either way, I switched back to the switch statement, because the hash map was only needed for a different approach I initially tried. I also changed the function back to accepting a const string& instead of begin and end iterators, which makes both the calling code and this code a lot cleaner.

http://gerrit.cloudera.org:8080/#/c/20770/36/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/20770/36/common/thrift/metrics.json@3765
PS36, Line 3765: "key": "impala-server.completed-queries.written"
> Use dashes in query keys, not underscores.
Done. I initially found both dashes and underscores in existing metric names, so I just picked an approach and ran with it.
http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py@68
PS31, Line 68:
> nit: most of the loops of profile_lines could instead use
I was able to replace three of the places where the code looped through the query lines with this pattern.

http://gerrit.cloudera.org:8080/#/c/20770/33/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/33/tests/custom_cluster/test_query_log.py@411
PS33, Line 411:
> I've seem some cases in the live query test suite where this is hundreds (n
I hit the same situation and made this change.

http://gerrit.cloudera.org:8080/#/c/20770/34/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/34/tests/custom_cluster/test_query_log.py@827
PS34, Line 827: for i in range(query_count):
> Can this be simplifield by just looping 30 times?
The purpose of this test is to assert that the 15-second flush interval is implemented correctly. That said, I was able to simplify this test quite a bit based on your suggestion.

http://gerrit.cloudera.org:8080/#/c/20770/34/tests/custom_cluster/test_query_log.py@836
PS34, Line 836: client.close()
> Isn't it better if total_time is as close to 10s? So maybe just asserting <
Part of this test asserts that the queries are not written to the completed queries table too soon, so I like having the minimum as part of the assert. That being said, I was able to simplify this test quite a bit based on your suggestions, and this assertion is now gone.
--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 37
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Fri, 08 Mar 2024 21:09:20 +0000
Gerrit-HasComments: Yes
