Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21729 )
Change subject: IMPALA-13185: Include runtime filter source in key ...................................................................... Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@386 PS9, Line 386: // Required property; RuntimeFilterId is irrelevant to tuple cache results. : tFilter.setFilter_id(0); : // The target plan is already serialized as part of the calling context. targets_ : // may reference multiple target nodes; the others are irrelevent for tuple : // caching. : tFilter.setTargets(new ArrayList<>()); : // Target PlanNodeId is global and not useful for tuple caching. : tFilter.setPlanid_to_target_ndx(new HashMap<>()); : // The source plan is handled in PlanNode.computeTupleCacheInfo. For this case, we no longer include the TRuntimeFilterTargetDesc. I think that we will need to include TRuntimeFilterTargetDesc to get the target_expr. See the comment in TestTupleCache.java about a test case that would need this. http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java: http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@228 PS9, Line 228: for (TFileSplitGeneratorSpec splitSpec: orig.split_specs) { : spec.addToSplit_specs(splitSpec); : } TFileSplitGeneratorSpec also has partition_id, so we can zero that out here as well. http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@69 PS5, Line 69: public void testRuntimeFilterCacheKeys() { > That seems like I should add straight_join to every example? I think using the aliases to annotate which is the probe side vs build side is useful to anyone reading this. Getting deterministic join order is doable by using tables of different sizes. It's a bit subtle. I prefer straight_join, because it makes it clear that the test cases need a specific join order. I think we should take the plunge and use straight_join here. http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@81 PS5, Line 81: verifyDifferentCacheKeys(basicJoin, "select a.id from functional.alltypes a, " + : "functional.alltypes b where a.id = b.id"); > Done, I think. What I mean here is a predicate that gets assigned to the build side's scan node, so it would need to be a condition purely on that table. e.g. where a.id = b.id and b.id < 100 should have no overlap with the same query with a.id = b.id. http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@139 PS5, Line 139: /** > Added a slightly simplified version that eliminates a hash join. About whether tests belong in this function: It's vague, but I would keep this function mainly focused on cases where some difference gets ignored to let things match. For the test case I mentioned, the thing we're testing is that slot id translation is happening properly for the source expression and target expression for the runtime filter. Any test case that covers that is fine. For the statement I listed, the probe side is not eligible, but it has two cache locations on the build side. There is a runtime filter from build1 to build2 (or vice versa, I forget). Adding conditions on the probe side shifts the slot ids for the two cache locations on the build side. http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@371 PS5, Line 371: * fragment) and return it for > I think I've added that in test_tuple_cache.py. Ack http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@78 PS9, Line 78: verifyOverlappingCacheKeys(basicJoin, "select a.id from functional.alltypes a, " + : "functional.alltypestiny b where a.id + 1 = b.id"); The source expression / target expression piece of runtime filters is intricate. Here is a case that currently fails: verifyOverlappingCacheKeys( "select a.id from functional.alltypes a, " + "functional.alltypestiny b where a.id + 1 = b.id", "select a.id from functional.alltypes a, " + "functional.alltypestiny b where a.id + 2 = b.id"); This fails because they keys are all identical (thanks for adding that check). The difference between "a.id + 1" and "a.id + 2" is probably in TRuntimeFilterTargetDesc's target_expr. http://gerrit.cloudera.org:8080/#/c/21729/9/tests/custom_cluster/test_tuple_cache.py File tests/custom_cluster/test_tuple_cache.py: http://gerrit.cloudera.org:8080/#/c/21729/9/tests/custom_cluster/test_tuple_cache.py@237 PS9, Line 237: query = "select straight_join a.id from functional.alltypes a, functional.alltypes" \ : " b, {0} c where a.id = b.id and a.id = c.age order by a.id".format(fq_table) Nit: Can we add a comment here describing the runtime filters that are in this query? i.e. The scan of A receives runtime filters from both of the hash joins, so it depends on the contents of B and C. The scan of B receives a runtime fitler from the hash join with table C, so it depends on the contents of C. -- To view, visit http://gerrit.cloudera.org:8080/21729 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0077964be5acdb588d76251a6a39e57a0f42bb5a Gerrit-Change-Number: 21729 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Tue, 01 Oct 2024 21:15:21 +0000 Gerrit-HasComments: Yes
