You can use the Spark DataFrame `when`/`otherwise` functions to replace the SQL CASE statement.
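A minimal sketch of `when`/`otherwise` for the CASE expression, assuming a DataFrame `df` with the `approval_id` column from the quoted query (the simpler two-condition form here is for illustration only):

```scala
import org.apache.spark.sql.functions.{when, col, lit}

// Equivalent of: CASE WHEN approval_id = 2 THEN 'Yes' ELSE 'No' END AS is_current
val result = df.withColumn("is_current",
  when(col("approval_id") === 2, lit("Yes")).otherwise(lit("No")))
```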
This piece will need to be calculated first: `select student_id from tbl_student where candidate_id = c.candidate_id and approval_id = 2 and academic_start_date is null`. Take the count of that DataFrame after joining the tbl_student and tbl_network DataFrames on the condition above. Overall, you can join all three tables first and then apply the rest of the query to the resulting DataFrame.

On Sat, Sep 22, 2018 at 1:08 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Dear Spark Users,
>
> I came across a slightly weird MSSQL query to replace with Spark, and I have
> no clue how to do it in an efficient way with Scala + SparkSQL. Can someone
> please throw some light on it? I can create a view of the DataFrame and run it as
> *spark.sql*(query), but I would like to do it the Scala + Spark way.
>
> Sample:
>
> *select a.student_id, a.candidate_id, a.student_name, a.student_standard,*
> *       a.student_city, b.teacher_name, a.student_status, a.approval_id,*
> *       case*
> *         when a.approval_id = 2*
> *          and (a.academic_start_date is null*
> *               and not exists (select student_id from tbl_student*
> *                               where candidate_id = c.candidate_id*
> *                                 and approval_id = 2*
> *                                 and academic_start_date is null))*
> *         then 'Yes' else 'No'*
> *       end as is_current*
> *from tbl_student a*
> *inner join tbl_teacher b on a.candidate_id = b.candidate_id*
> *inner join tbl_network c on c.candidate_id = a.candidate_id*
>
> Thank you.
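The approach above can be sketched in Scala roughly as follows. This assumes DataFrames `studentDf`, `teacherDf`, and `networkDf` have already been loaded from the three tables (all variable names here are hypothetical); the correlated NOT EXISTS is expressed as a left outer join against the pre-computed subquery DataFrame, with a null check standing in for "no match found":

```scala
import org.apache.spark.sql.functions.{when, col, lit}

// The subquery's filter, computed once up front.
// distinct() avoids row multiplication in the later join.
val pending = studentDf
  .filter(col("approval_id") === 2 && col("academic_start_date").isNull)
  .select(col("candidate_id").as("p_candidate_id"))
  .distinct()

// Join all three tables first, as suggested above.
val joined = studentDf.as("a")
  .join(teacherDf.as("b"), col("a.candidate_id") === col("b.candidate_id"))
  .join(networkDf.as("c"), col("c.candidate_id") === col("a.candidate_id"))

// NOT EXISTS(...) -> left outer join; a null p_candidate_id means no match.
val result = joined
  .join(pending, col("a.candidate_id") === col("p_candidate_id"), "left_outer")
  .withColumn("is_current",
    when(col("a.approval_id") === 2 &&
         col("a.academic_start_date").isNull &&
         col("p_candidate_id").isNull, lit("Yes"))
      .otherwise(lit("No")))
  .select(col("a.student_id"), col("a.candidate_id"), col("a.student_name"),
          col("a.student_standard"), col("a.student_city"),
          col("b.teacher_name"), col("a.student_status"),
          col("a.approval_id"), col("is_current"))
```

If you only need to drop the matching rows rather than flag them, the `"left_outer"` join plus null check can be replaced by a `"left_anti"` join.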