On Sat, Aug 24, 2019 at 2:03 PM Kohei KaiGai <kai...@heterodb.com> wrote:
>
> On Sat, Aug 24, 2019 at 7:02 Thomas Munro <thomas.mu...@gmail.com> wrote:
> >
> > On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai <kai...@heterodb.com> wrote:
> > > We can consider that the table join ptable X t1 above is equivalent to:
> > >   (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > >   = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > > It returns an equivalent result; however, the rows are already reduced
> > > by the HashJoin in each individual leaf of the Append, so the CPU cycles
> > > consumed by the Append node can be cheaper.
> > >
> > > On the other hand, it has a downside: t1 must be read 3 times, and the
> > > hash table must also be built 3 times. That increases the expected cost,
> > > so the planner may not choose the asymmetric partition-wise join plan.
> >
> > What if you include the partition constraint as a filter on t1?  So you get:
> >
> >   ptable X t1 =
> >     (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
> >     (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
> >     (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
> >     (ptable_p3 X (σ hash(dist)%4=3 (t1)))
> >
> > Pros:
> > 1. The hash tables will not contain unnecessary junk.
> > 2. You'll get the right answer if t1 is on the outer side of an outer join.
> > 3. If this runs underneath a Parallel Append and t1 is big enough, then
> >    the workers will hopefully cooperate and do a synchronised scan of t1.
> > 4. The filter might enable a selective and efficient plan, like an index scan.
> >
> > Cons:
> > 1. The filter might not enable a selective and efficient plan, and
> >    therefore cause extra work.
> >
> > (It's a little weird in this example because we don't usually see hash
> > functions in WHERE clauses, but that could just as easily be dist
> > BETWEEN 1 AND 99 or any other partition constraint.)
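For concreteness, Thomas's rewrite could be spelled out in SQL roughly as
below. This is only a sketch, not a tested query: it assumes a hypothetical
schema in which ptable is hash-partitioned on dist with modulus 4 into leaves
ptable_p0..ptable_p3 (the table and column names ptable, dist, t1, and foo
are from the thread; everything else is made up), and it uses the built-in
satisfies_hash_partition() (PostgreSQL 11+) as a concrete stand-in for the
σ hash(dist)%4=n filters.

    -- Assumed schema, for illustration only:
    --   CREATE TABLE ptable (dist int, payload text) PARTITION BY HASH (dist);
    --   CREATE TABLE ptable_p0 PARTITION OF ptable
    --       FOR VALUES WITH (MODULUS 4, REMAINDER 0);  -- likewise p1..p3
    --   CREATE TABLE t1 (foo int, label text);

    -- ptable X t1, decomposed per leaf, with the partition constraint
    -- pushed down onto t1 so that each inner scan (and hence each hash
    -- table) contains only the rows its sibling join can actually match:
    SELECT p.*, s.* FROM ptable_p0 p
      JOIN (SELECT * FROM t1 WHERE satisfies_hash_partition(
              'ptable'::regclass, 4, 0, foo)) s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p1 p
      JOIN (SELECT * FROM t1 WHERE satisfies_hash_partition(
              'ptable'::regclass, 4, 1, foo)) s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p2 p
      JOIN (SELECT * FROM t1 WHERE satisfies_hash_partition(
              'ptable'::regclass, 4, 2, foo)) s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p3 p
      JOIN (SELECT * FROM t1 WHERE satisfies_hash_partition(
              'ptable'::regclass, 4, 3, foo)) s ON p.dist = s.foo;

As KaiGai replies below, this only works when the join is an equality join
on the partition key, since that is what makes the pushed-down filter
equivalent to the partition constraint.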
>
> It requires that the join key include the partition key, and that the join
> be an equality join, doesn't it?
> If ptable and t1 are joined using ptable.dist = t1.foo, we can indeed
> distribute t1 to each leaf table with "WHERE hash(foo)%4 = xxx" according
> to the partition bounds.
>
> In case some of the partition leaves are pruned, it is clearly beneficial,
> because the rows that would only be referenced by the pruned child
> relations are a waste of memory.
>
> On the other hand, if no leaves are pruned, it eventually consumes almost
> the same amount of memory to load the inner relation as we would need if
> we could extend the Hash node to share the hash table with sibling join
> nodes.
>
> > > One idea I have is that sibling HashJoins share a hash table that was
> > > built once by any of the sibling Hash plans. Right now, it is not
> > > implemented yet.
> >
> > Yeah, I've thought a little bit about that in the context of Parallel
> > Repartition.  I'm interested in combining intra-node partitioning (where
> > a single plan node repartitions data among workers on the fly) with
> > inter-node partitioning (like PWJ, where partitions are handled by
> > different parts of the plan, considered at planning time); you finish up
> > needing to have nodes in the plan that 'receive' tuples for each
> > partition, to match up with the PWJ plan structure.  That's not entirely
> > unlike CTE references, and not entirely unlike your idea of somehow
> > sharing the same hash table.  I ran into a number of problems while
> > thinking about that, which I should write about in another thread.
> >
> Hmm. Do you intend that the inner path may behave differently according to
> the partition bounds of the outer path it is joined to?
> Let me investigate its pros & cons.
>
> The reasons why I think the idea of sharing the same hash table is
> reasonable in this scenario are:
> 1. We can easily extend the idea for parallel optimization. A hash table
>    on a DSM segment, once built, can be shared by all the siblings in all
>    the parallel workers.
> 2. We can save memory regardless of the join keys and partition keys,
>    even if these are not involved in the query.
>
> On the other hand, below are the downsides. Potentially, combined use of
> your idea may help with these cases:
> 3. The distributed inner relation cannot be on the outer side of an
>    OUTER JOIN.
> 4. The hash table contains rows that would be referenced only by pruned
>    partition leaves.
>
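As a point of reference for the sharable hash table idea: the closest
SQL-level analogue available today is probably a CTE referenced by every
sibling join, as in the hypothetical sketch below (same assumed schema as
above). A multiply-referenced CTE is evaluated once into a tuplestore, so it
shares the scan of t1; the proposal goes a step further and shares the built
hash table itself, e.g. on a DSM segment visible to every sibling and every
parallel worker.

    -- Read t1 once; every sibling join consumes the same CTE output.
    WITH t1_once AS (
        SELECT * FROM t1
    )
    SELECT p.*, s.* FROM ptable_p0 p JOIN t1_once s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p1 p JOIN t1_once s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p2 p JOIN t1_once s ON p.dist = s.foo
    UNION ALL
    SELECT p.*, s.* FROM ptable_p3 p JOIN t1_once s ON p.dist = s.foo;
    -- Note: each join above still builds its own hash over the CTE
    -- output; the proposed executor change would build that hash once
    -- and let all sibling HashJoins (and all workers) probe it.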
+ many, for the sharable hash table of the inner side of the join. IMHO,
this could be the most interesting and captivating part of this feature.
But it might be a complicated piece; is that still in the plan?

Regards,
Amul