Yeah, actually, I have tested the multiple join on TPCDS dataset (1g) on UCLA‘s AsterixDB cluster (16 nodes). Two different join orders as following:
SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk)) JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk; SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) JOIN catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk); *Good join order just needs 2.613 sec while the bad one needs 35.123 sec. * *I am also building bloom filter and sampler in Hyracks level to collect stats to try optimize the query on hyracks.* *Bests,* *Mingda* On Mon, Oct 31, 2016 at 1:22 PM, Ildar Absalyamov < [email protected]> wrote: > As Yingyi pointed out we don't reorder joins because the framework for > stats and cardinalities is not there yet. > However what we can do in an meantime is to provide an interface for the > statistical information, needed for join reordering, independent of the way > the stats were collected (either sampling-based or LSM-based) and work out > details of the cost model. > > 2016-10-31 11:30 GMT-07:00 mingda li <[email protected]>: > > > Hi Yingyi, > > > > I see. Thanks for your reply:-) > > > > Bests, > > Mingda > > > > > > On Mon, Oct 31, 2016 at 11:23 AM, Yingyi Bu <[email protected]> wrote: > > > > > Mingda, > > > > > > I'm not sure how much re-ordering can be done at the Hyracks > level, > > > i.e., the runtime level. > > > In the optimizer (the asterixdb/algebricks level), we don't have > > > re-ordering for joins, because: > > > --- the cost model has not been added yet. I'm not sure about the > > > timeline for this. @Ildar? > > > --- respecting user-specified join orders is important for certain > > > cases, for example, to get stable/predictable performance (zero > surprise) > > > for applications. > > > > > > In the runtime, we have a role-reversal optimization in hybrid > hash > > > join, which is a safe optimization that is not based on estimations. > You > > > can look at OptimizedHybridHashJoin. > > > > > > Best, > > > Yingyi > > > > > > > > > On Mon, Oct 31, 2016 at 11:16 AM, mingda li <[email protected]> > > > wrote: > > > > > > > Dear all, > > > > > > > > Hi, I am working on multiple join on Hyracks level. I am not sure if > I > > do > > > > the multiple join on AsterixDB, whether it will optimize the query by > > > > changing the join order or just execute according to how we write the > > > > query. I think this may not be done in Algebricks level based on rule > > but > > > > not sure. > > > > > > > > Bests, > > > > Mingda > > > > > > > > > > > > > -- > Best regards, > Ildar >
