Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21665 )

Change subject: IMPALA-12865: enable_reload_events breaks enable_skipping_older_events by pushing lastRefreshEventId too high
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21665/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21665/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7176
PS3, Line 7176:             l.getFullName());
> Yeah, you are right. We need to prioritize correctness over performance.
Instead of setting 'lastRefreshEventId_' back to -1 in this case, I think we can keep the original value if it's not -1.

In HdfsTable#setLastRefreshEventId(), we only update 'lastRefreshEventId_' if the given 'eventId' is larger:
https://github.com/apache/impala/blob/7369ebb8ba02edfedcef071029b7bcd62157f452/fe/src/main/java/org/apache/impala/catalog/Table.java#L1115-L1117

However, in HdfsPartition$Builder#setLastRefreshEventId(), we are missing the same check and we can add it:
https://github.com/apache/impala/blob/7369ebb8ba02edfedcef071029b7bcd62157f452/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1382


http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py@1201
PS6, Line 1201: 2000
Can we set this to 4000? In my local env, 2000 is not enough to fail the test before the fix. I think run_stmt_in_hive() is slow.


http://gerrit.cloudera.org:8080/#/c/21665/6/tests/custom_cluster/test_events_custom_configs.py@1212
PS6, Line 1212:       self.client.execute_async("refresh {} partition(year=2024)".format(tbl))
Can we get the handle and make sure the statement actually finishes? E.g.

  handle = self.client.execute_async(...)

and after run_stmt_in_hive(), add

  self.wait_for_state(handle, self.client.QUERY_STATES['FINISHED'], timeout=4)
  assert self.client.get_state(handle) == self.client.QUERY_STATES['FINISHED']



--
To view, visit http://gerrit.cloudera.org:8080/21665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90039da77ec561c5aede44456f88c6650582815b
Gerrit-Change-Number: 21665
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Fri, 08 Nov 2024 08:53:45 +0000
Gerrit-HasComments: Yes
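
For reference, the check proposed for HdfsPartition$Builder#setLastRefreshEventId() in the first comment could look roughly like the sketch below. This is only an illustration of the "update only if the given eventId is larger" idea, assuming the field starts at -1; the class PartitionBuilderSketch and its getter are hypothetical stand-ins, not Impala's actual code.

```java
// Hypothetical stand-in for HdfsPartition.Builder; not Impala's actual class.
public class PartitionBuilderSketch {
  // Assumption: -1 means no refresh event has been recorded yet.
  private long lastRefreshEventId_ = -1L;

  // Only move the id forward, so an older or unset (-1) event id
  // never overwrites a newer value that is already stored.
  public PartitionBuilderSketch setLastRefreshEventId(long eventId) {
    if (eventId > lastRefreshEventId_) {
      lastRefreshEventId_ = eventId;
    }
    return this;
  }

  public long getLastRefreshEventId() {
    return lastRefreshEventId_;
  }

  public static void main(String[] args) {
    PartitionBuilderSketch b = new PartitionBuilderSketch();
    // The -1 and 42 calls are ignored because 100 is already stored.
    b.setLastRefreshEventId(100L).setLastRefreshEventId(-1L).setLastRefreshEventId(42L);
    System.out.println(b.getLastRefreshEventId()); // prints 100
  }
}
```

With a guard like this, a caller that passes -1 keeps the original value, which matches the behavior suggested above of keeping the existing id rather than resetting it.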
