Till 1.3, you have to prepare the DF appropriately def setupCondition(t): if t[1] > 100: v = 1 else: v = 0 return Row(col1=t[0],col2=t[1],col3=t[2],col4=v)
d1=[[1001,100,50],[1001,200,100],[1002,100,99]] d1RDD = sc.parallelize(d1).map(setupCondition) d1DF = ssc.createDataFrame(d1RDD) d1DF.printSchema() d1DF.show() res = d1DF.groupBy("col1").agg({'col3':'min','col4':'sum'}) print "\n\n" res.show() root |-- col1: long (nullable = true) |-- col2: long (nullable = true) |-- col3: long (nullable = true) |-- col4: long (nullable = true) col1 col2 col3 col4 1001 100 50 0 1001 200 100 1 1002 100 99 0 col1 SUM(col4) MIN(col3) 1001 1 50 1002 0 99 Good news is since 1.4, DF will have methods like when,otherwise (and a LOT more)....cant wait to get my hands on 1.4 :) On Wed, May 27, 2015 at 5:12 PM, Masf <masfwo...@gmail.com> wrote: > Yes. I think that your sql solution will work but I was looking for a > solution with DataFrame API. I had thought to use UDF such as: > > val myFunc = udf {(input: Int) => {if (input > 100) 1 else 0}} > > Although I'd like to know if it's possible to do it directly in the > aggregation inserting a lambda function or something else. > > Thanks!!!! > > Regards. > Miguel. > > > On Wed, May 27, 2015 at 1:06 AM, ayan guha <guha.a...@gmail.com> wrote: > >> For this, I can give you a SQL solution: >> >> joined data.registerTempTable('j') >> >> Res=ssc.sql('select col1,col2, count(1) counter, min(col3) minimum, >> sum(case when endrscp>100 then 1 else 0 end test from j' >> >> Let me know if this works. >> On 26 May 2015 23:47, "Masf" <masfwo...@gmail.com> wrote: >> >>> Hi >>> I don't know how it works. For example: >>> >>> val result = joinedData.groupBy("col1","col2").agg( >>> count(lit(1)).as("counter"), >>> min(col3).as("minimum"), >>> sum("case when endrscp> 100 then 1 else 0 end").as("test") >>> ) >>> >>> How can I do it? >>> >>> Thanks!!!! >>> Regards. >>> Miguel. >>> >>> On Tue, May 26, 2015 at 12:35 AM, ayan guha <guha.a...@gmail.com> wrote: >>> >>>> Case when col2>100 then 1 else col2 end >>>> On 26 May 2015 00:25, "Masf" <masfwo...@gmail.com> wrote: >>>> >>>>> Hi. >>>>> >>>>> In a dataframe, How can I execution a conditional sentence in a >>>>> aggregation. For example, Can I translate this SQL statement to >>>>> DataFrame?: >>>>> >>>>> SELECT name, SUM(IF table.col2 > 100 THEN 1 ELSE table.col1) >>>>> FROM table >>>>> GROUP BY name >>>>> >>>>> Thanks!!!! >>>>> -- >>>>> Regards. >>>>> Miguel >>>>> >>>> >>> >>> >>> -- >>> >>> >>> Saludos. >>> Miguel Ángel >>> >> > > > -- > > > Saludos. > Miguel Ángel > -- Best Regards, Ayan Guha