Yeah, actually, I have tested the multiple join on TPCDS dataset (1g) on
UCLA‘s AsterixDB cluster (16 nodes).
Two different join orders as following:

SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN catalog_returns
cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk =
cr1.cr_item_sk))  JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;

SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN inventory i1 ON
cs1.cs_item_sk = i1.inv_item_sk) JOIN catalog_returns cr1 ON
(cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk =
cr1.cr_item_sk);


*Good join order just needs 2.613  sec while the bad one needs 35.123 sec. *


*I am also building bloom filter and sampler in Hyracks level to collect
stats to try optimize the query on hyracks.*


*Bests,*

*Mingda*

On Mon, Oct 31, 2016 at 1:22 PM, Ildar Absalyamov <
[email protected]> wrote:

> As Yingyi pointed out we don't reorder joins because the framework for
> stats and cardinalities is not there yet.
> However what we can do in an meantime is to provide an interface for the
> statistical information, needed for join reordering, independent of the way
> the stats were collected (either sampling-based or LSM-based) and work out
> details of the cost model.
>
> 2016-10-31 11:30 GMT-07:00 mingda li <[email protected]>:
>
> > Hi Yingyi,
> >
> > I see. Thanks for your reply:-)
> >
> > Bests,
> > Mingda
> >
> >
> > On Mon, Oct 31, 2016 at 11:23 AM, Yingyi Bu <[email protected]> wrote:
> >
> > > Mingda,
> > >
> > >      I'm not sure how much re-ordering can be done at the Hyracks
> level,
> > > i.e., the runtime level.
> > >      In the optimizer (the asterixdb/algebricks level), we don't have
> > > re-ordering for  joins, because:
> > >      --- the cost model has not been added yet.  I'm not sure about the
> > > timeline for this. @Ildar?
> > >      --- respecting user-specified join orders is important for certain
> > > cases, for example, to get stable/predictable performance (zero
> surprise)
> > > for applications.
> > >
> > >      In the runtime, we have a role-reversal optimization in hybrid
> hash
> > > join, which is a safe optimization that is not based on estimations.
> You
> > > can look at OptimizedHybridHashJoin.
> > >
> > > Best,
> > > Yingyi
> > >
> > >
> > > On Mon, Oct 31, 2016 at 11:16 AM, mingda li <[email protected]>
> > > wrote:
> > >
> > > > Dear all,
> > > >
> > > > Hi, I am working on multiple join on Hyracks level. I am not sure if
> I
> > do
> > > > the multiple join on AsterixDB, whether it will optimize the query by
> > > > changing the join order or just execute according to how we write the
> > > > query. I think this may not be done in Algebricks level based on rule
> > but
> > > > not sure.
> > > >
> > > > Bests,
> > > > Mingda
> > > >
> > >
> >
>
>
>
> --
> Best regards,
> Ildar
>

Reply via email to