Re: Spark SQL joins taking too long

Ted Yu Tue, 26 Jan 2016 14:14:30 -0800

What's the type of shape column ?

Can you disclose what SomeUDF does (by showing the code) ?


Cheers

On Tue, Jan 26, 2016 at 12:41 PM, raghukiran <raghuki...@gmail.com> wrote:

> Hi,
>
> I create two tables, one counties with just one row (it actually has 2k
> rows, but I used only one) and another hospitals, which has 6k rows. The
> join command I use is as follows, which takes way too long to run and has
> never finished successfully (even after nearly 10mins). The following is
> what I have:
>
> DataFrame df1 = ...
> df1.registerTempTable("hospitals");
> DataFrame df2 = ...
> df2.registerTempTable("counties"); //has only one row right now
> DataFrame joinDf = sqlCtx.sql("SELECT h.name, c.name FROM hospitals h JOIN
> counties c ON SomeUDF(c.shape, h.location)");
> long count = joinDf.count(); //this takes too long!
>
> //whereas the following which is the exact equivalent of the above gets
> done
> very quickly!
> DataFrame joinDf = sqlCtx.sql("SELECT h.name FROM hospitals WHERE
> SomeUDF('c.shape as string', h.location)");
> long count = joinDf.count(); //gives me the correct answer of 8
>
> Any suggestions on what I can do to optimize and debug this piece of code?
>
> Regards,
> Raghu
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-joins-taking-too-long-tp26078.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Re: Spark SQL joins taking too long

Reply via email to