Any input on this? Does it have something to do with SQL engine parser / optimizer? Please help.
Regards Ashok On Fri, Mar 11, 2016 at 3:22 PM, ashokkumar rajendran < ashokkumar.rajend...@gmail.com> wrote: > Hi All, > > I have a large table with few billions of rows and have a very small table > with 4 dimensional values. I would like to get rows that match any of these > dimensions. For example, > > Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR > A.dimension2 = B.dimension2 OR A.dimension3 = B.dimension3 OR A.dimension4 > = B.dimension4. > > The query plan takes this as nestedLoop and executes for very long time. > > If I execute this as Union queries, it takes around 1.5mins for each > dimension. > > Select field1, field2 from A, B where A.dimension1 = B.dimension1 > UNION ALL > Select field1, field2 from A, B where A.dimension2 = B.dimension2 > UNION ALL > Select field1, field2 from A, B where A.dimension3 = B.dimension3 > UNION ALL > Select field1, field2 from A, B where A.dimension4 = B.dimension4. > > This is obviously not an optimal solution as it makes multiple scanning at > same table. Is there any other optimal solution for it? > > I tried to do this with plain Spark (without SQL) using broadcast map join > but the performance was bad than this. > > > Regards > Ashok >