Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23154 )

Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default
......................................................................


Patch Set 13:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@16
PS13, Line 16: this change,
I see 3 kind of changes in the patch, and it may be useful to split them to 
separate commits:

1. changing text/json output by adding total + the info that the profile is 
aggregated
2. changing tests to expect the new profile format
3. changing the default

2 is purely an internal test change, while 1 and 3 are actual product changes 
(maybe 1 is not, as aggregated profiles were considered experimental)

The tests could switch to using aggregated profiles without changing the 
default for the flag - there are a few flags where the test cluster overrides 
the default value. (I am not sure whether splitting 2 and 3 is useful)

A reason why splitting 1 and 2+3 would be useful is simplifying backporting - I 
can imagine 1. to be backported if it turns out that someone already uses 
aggregated profiles to avoid producing different outputs, while backporting 2 
would be a much harder due to its size.

Another thing worth splitting is that lot of test changes seem to be only 
needed because the test writer didn't use aggregate(SUM, ...), while it would 
have made the tests less fragile and allow running the same test with 
aggregated and legacy profiles. These rewrites could be merged before the rest 
of the patch.


http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@20
PS13, Line 20: The following tests are to be modified to support the usage of
             : aggregated runtime profile -
It would be useful the list of common reasons why the change was needed.


http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc@2320
PS13, Line 2320: total
This seems like actually changing the text profile format (and also json in 
another function). Can you mention the format changes in commit message?


http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test
File testdata/workloads/functional-query/queries/QueryTest/binary-type.test:

http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@167
PS13, Line 167: !row_regex:.*TotalKuduScanRoundTrips: total=1.*
This changes the semantics of the regexp, it is still correct though because 
this specific table only has a single file, so there will be always just one 
instance scanning it.

I think that all 3 should be rewritten to expect 0 as the sum of 
TotalKuduScanRoundTrips (note that I wrote this test, and now I see that my 
solution for checking this counter was not that good)


http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
File 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test:

http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test@544
PS13, Line 544: row_regex: Partition: year=2010 
/day=__HIVE_DEFAULT_PARTITION__\nNumModifiedRows: 1000
I see several tests rewritten to use row_regex - what is the reason for this? 
Can you mention it in the commit message?


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@601
PS13, Line 601: # Special syntax for basic aggregation over fields in the 
runtime profile.
This is the same as line 597


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@624
PS13, Line 624: class CounterType(Enum):
              :     AGGREGATED = 1 # Aggregated / Averaged
              :     UNAGGREGATED = 2
I see several UNAGGREGATED in test files. What does this mean? I assumed that 
all tests use aggregated profiles from this patch


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@647
PS13, Line 647:     if re.search(r"[^\\]\([^\?]", pattern):
              :       assert False, ("Capturing groups are not allowed(use 
non-capturing groups)"
              :                       + "\nInvalid row regex specification: %s" 
% row_string)
Is there a reason for not allowing this besides being non useful?


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@650
PS13, Line 650:     # Remove .* and .*? from beginning and ending
              :     pattern = re.sub(r"^\.\*\??|\.\*\??$|^\.\+\??|\.\+\??$", 
"", pattern)
This may make the change even bigger, but it would make more sense to me to 
throw an error in this case, which would warn the contributor to change the 
test. Silently ignoring it can lead to people continuing writing such regexps.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@664
PS13, Line 664: aggregaton
typo


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@698
PS13, Line 698:   print(expected)
this looks like a debugging print()


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@700
PS13, Line 700:   if "Profile Type: AGGREGATED" in actual:
              :     agg_profile = True
Does to code actually support non-aggregated profiles? Does it need to support 
it?


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@723
PS13, Line 723:   print(united_regex)
this looks like a debugging print()



--
To view, visit http://gerrit.cloudera.org:8080/23154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8
Gerrit-Change-Number: 23154
Gerrit-PatchSet: 13
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Sat, 11 Oct 2025 16:43:44 +0000
Gerrit-HasComments: Yes

Reply via email to