Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21729 )

Change subject: IMPALA-13185: Include runtime filter source in key
......................................................................


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@386
PS9, Line 386:         // Required property; RuntimeFilterId is irrelevant to 
tuple cache results.
             :         tFilter.setFilter_id(0);
             :         // The target plan is already serialized as part of the 
calling context. targets_
             :         // may reference multiple target nodes; the others are 
irrelevent for tuple
             :         // caching.
             :         tFilter.setTargets(new ArrayList<>());
             :         // Target PlanNodeId is global and not useful for tuple 
caching.
             :         tFilter.setPlanid_to_target_ndx(new HashMap<>());
             :         // The source plan is handled in 
PlanNode.computeTupleCacheInfo.
For this case, we no longer include the TRuntimeFilterTargetDesc. I think that 
we will need to include TRuntimeFilterTargetDesc to get the target_expr. See 
the comment in TestTupleCache.java about a test case that would need this.


http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java:

http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java@228
PS9, Line 228:         for (TFileSplitGeneratorSpec splitSpec: 
orig.split_specs) {
             :           spec.addToSplit_specs(splitSpec);
             :         }
TFileSplitGeneratorSpec also has partition_id, so we can zero that out here as 
well.


http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java
File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java:

http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@69
PS5, Line 69:   public void testRuntimeFilterCacheKeys() {
> That seems like I should add straight_join to every example?
I think using the aliases to annotate which is the probe side vs build side is 
useful to anyone reading this.

Getting deterministic join order is doable by using tables of different sizes. 
It's a bit subtle. I prefer straight_join, because it makes it clear that the 
test cases need a specific join order.

I think we should take the plunge and use straight_join here.


http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@81
PS5, Line 81:     verifyDifferentCacheKeys(basicJoin, "select a.id from 
functional.alltypes a, " +
            :         "functional.alltypes b where a.id = b.id");
> Done, I think.
What I mean here is a predicate that gets assigned to the build side's scan 
node, so it would need to be a condition purely on that table. e.g. where a.id 
= b.id and b.id < 100 should have no overlap with the same query with a.id = 
b.id.


http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@139
PS5, Line 139:   /**
> Added a slightly simplified version that eliminates a hash join.
About whether tests belong in this function: It's vague, but I would keep this 
function mainly focused on cases where some difference gets ignored to let 
things match.

For the test case I mentioned, the thing we're testing is that slot id 
translation is happening properly for the source expression and target 
expression for the runtime filter. Any test case that covers that is fine.

For the statement I listed, the probe side is not eligible, but it has two 
cache locations on the build side. There is a runtime filter from build1 to 
build2 (or vice versa, I forget). Adding conditions on the probe side shifts 
the slot ids for the two cache locations on the build side.


http://gerrit.cloudera.org:8080/#/c/21729/5/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@371
PS5, Line 371:    * fragment) and return it for
> I think I've added that in test_tuple_cache.py.
Ack


http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java
File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java:

http://gerrit.cloudera.org:8080/#/c/21729/9/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@78
PS9, Line 78:     verifyOverlappingCacheKeys(basicJoin, "select a.id from 
functional.alltypes a, " +
            :         "functional.alltypestiny b where a.id + 1 = b.id");
The source expression / target expression piece of runtime filters is 
intricate. Here is a case that currently fails:

    verifyOverlappingCacheKeys(
        "select a.id from functional.alltypes a, " +
        "functional.alltypestiny b where a.id + 1 = b.id",
        "select a.id from functional.alltypes a, " +
        "functional.alltypestiny b where a.id + 2 = b.id");

This fails because they keys are all identical (thanks for adding that check). 
The difference between "a.id + 1" and "a.id + 2" is probably in 
TRuntimeFilterTargetDesc's target_expr.


http://gerrit.cloudera.org:8080/#/c/21729/9/tests/custom_cluster/test_tuple_cache.py
File tests/custom_cluster/test_tuple_cache.py:

http://gerrit.cloudera.org:8080/#/c/21729/9/tests/custom_cluster/test_tuple_cache.py@237
PS9, Line 237:     query = "select straight_join a.id from functional.alltypes 
a, functional.alltypes" \
             :         " b, {0} c where a.id = b.id and a.id = c.age order by 
a.id".format(fq_table)
Nit: Can we add a comment here describing the runtime filters that are in this 
query? i.e. The scan of A receives runtime filters from both of the hash joins, 
so it depends on the contents of B and C. The scan of B receives a runtime 
fitler from the hash join with table C, so it depends on the contents of C.



--
To view, visit http://gerrit.cloudera.org:8080/21729
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0077964be5acdb588d76251a6a39e57a0f42bb5a
Gerrit-Change-Number: 21729
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 01 Oct 2024 21:15:21 +0000
Gerrit-HasComments: Yes

Reply via email to