Re: How to efficiently query a large table with multiple dimensional table?

ashokkumar rajendran Sat, 12 Mar 2016 22:33:07 -0800

Any input on this? Does it have something to do with SQL engine parser /
optimizer? Please help.


Regards
Ashok

On Fri, Mar 11, 2016 at 3:22 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:

> Hi All,
>
> I have a large table with few billions of rows and have a very small table
> with 4 dimensional values. I would like to get rows that match any of these
> dimensions. For example,
>
> Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR
> A.dimension2 = B.dimension2 OR A.dimension3 = B.dimension3 OR A.dimension4
> = B.dimension4.
>
> The query plan takes this as nestedLoop and executes for very long time.
>
> If I execute this as Union queries, it takes around 1.5mins for each
> dimension.
>
> Select field1, field2 from A, B where A.dimension1 = B.dimension1
> UNION ALL
> Select field1, field2 from A, B where A.dimension2 = B.dimension2
> UNION ALL
> Select field1, field2 from A, B where  A.dimension3 = B.dimension3
> UNION ALL
> Select field1, field2 from A, B where  A.dimension4 = B.dimension4.
>
> This is obviously not an optimal solution as it makes multiple scanning at
> same table. Is there any other optimal solution for it?
>
> I tried to do this with plain Spark (without SQL) using broadcast map join
> but the performance was bad than this.
>
>
> Regards
> Ashok
>

Re: How to efficiently query a large table with multiple dimensional table?

Reply via email to