Hi,
Why not group by first then join?
BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’.
Source code from Spark 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)
}
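For illustration, a minimal sketch (with made-up sample data) showing that distinct() and a groupBy over all columns yield the same deduplicated rows; note that groupBy alone only gives a RelationalGroupedDataset, so a throwaway aggregate is needed to get a DataFrame back:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("distinct-vs-groupBy").getOrCreate()
import spark.implicits._

// Made-up sample data with one duplicate row.
val df = Seq(("s1", "A"), ("s1", "A"), ("s2", "B")).toDF("student_id", "student_group")

// distinct() returns a DataFrame directly.
val viaDistinct = df.distinct()

// groupBy over all columns returns a RelationalGroupedDataset; a throwaway
// count() (dropped afterwards) is needed to get back to a DataFrame.
val viaGroupBy = df.groupBy(df.columns.map(df(_)): _*).count().drop("count")

viaDistinct.show()  // two distinct rows
viaGroupBy.show()   // the same two rows

Both end up as the Aggregate node shown in the source above.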
From: Chetan Khatri [mailto:[email protected]]
Sent: May 30, 2018 2:52
To: Irving Duran <[email protected]>
Cc: Georg Heiler <[email protected]>; user <[email protected]>
Subject: Re: GroupBy in Spark / Scala without Agg functions
Georg, sorry for the dumb question. Help me to understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy without agg
in SQL?
On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri
<[email protected]> wrote:
I don't want any aggregation; I just want to know whether there is a better
approach than applying distinct across all columns.
On Wed, May 30, 2018 at 12:16 AM, Irving Duran
<[email protected]> wrote:
Unless you want to get a count, yes.
Thank You,
Irving Duran
On Tue, May 29, 2018 at 1:44 PM Chetan Khatri
<[email protected]> wrote:
Georg, I just want to double-check: someone wrote an MSSQL Server script where
it groups by all columns. What is the best alternative way to do distinct across all
columns?
On Wed, May 30, 2018 at 12:08 AM, Georg Heiler
<[email protected]> wrote:
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?
Chetan Khatri <[email protected]> wrote on Tue., May 29, 2018 at 20:21:
All,
I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
without an agg function:
Pseudocode:
select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
I tried doing this in Spark, but I am not able to get a DataFrame as the return value.
How could this kind of thing be done in Spark?
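For reference, a minimal sketch of one way to get the same result with a join followed by distinct(), which does return a DataFrame (the DataFrame names student and generalRegister are assumptions, not taken from actual code):

import org.apache.spark.sql.DataFrame

// Join the two assumed DataFrames, project the columns from the SQL above,
// and deduplicate; distinct() returns a DataFrame, whereas groupBy without
// an aggregate only returns a RelationalGroupedDataset.
def studentRegister(student: DataFrame, generalRegister: DataFrame): DataFrame =
  student
    .join(generalRegister, student("student_id") === generalRegister("student_id"))
    .select(student("student_id"), student("student_name"), student("student_std"),
            student("student_group"), student("student_dob"))
    .distinct()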
Thanks