Up to Spark 1.3, you have to prepare the DataFrame appropriately first, i.e.
compute the condition as an extra column before aggregating:

from pyspark.sql import Row

# Precompute the condition as a 0/1 flag so that a plain sum can count it.
def setupCondition(t):
    if t[1] > 100:
        v = 1
    else:
        v = 0
    return Row(col1=t[0],col2=t[1],col3=t[2],col4=v)


# sc is the SparkContext, ssc is the SQLContext.
d1=[[1001,100,50],[1001,200,100],[1002,100,99]]
d1RDD = sc.parallelize(d1).map(setupCondition)
d1DF = ssc.createDataFrame(d1RDD)
d1DF.printSchema()
d1DF.show()
res = d1DF.groupBy("col1").agg({'col3':'min','col4':'sum'})
print("\n\n")
res.show()

root
 |-- col1: long (nullable = true)
 |-- col2: long (nullable = true)
 |-- col3: long (nullable = true)
 |-- col4: long (nullable = true)

col1 col2 col3 col4
1001 100  50   0
1001 200  100  1
1002 100  99   0



col1 SUM(col4) MIN(col3)
1001 1         50
1002 0         99
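
As a sanity check on those numbers, here is the same group-by done in plain
Python with no Spark involved. It is only a sketch of what the aggregation
computes; aggregate is a made-up helper name, not anything from Spark.

```python
from collections import defaultdict

def aggregate(rows):
    # rows are [col1, col2, col3]; returns {col1: (sum of flags, min of col3)}
    flags = defaultdict(int)
    minimum = {}
    for col1, col2, col3 in rows:
        # Same condition as setupCondition: flag is 1 when col2 > 100.
        flags[col1] += 1 if col2 > 100 else 0
        minimum[col1] = min(col3, minimum.get(col1, col3))
    return {k: (flags[k], minimum[k]) for k in flags}

d1 = [[1001,100,50],[1001,200,100],[1002,100,99]]
print(aggregate(d1))
```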

The good news is that from 1.4 on, DataFrames will have methods like when and
otherwise (and a LOT more)... can't wait to get my hands on 1.4 :)
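
For reference, a rough sketch of how the aggregation might look with
when/otherwise in 1.4. I have not run this against a 1.4 build, so treat it as
an assumption rather than tested code. conditionalAgg is just a name made up
for the example, and df is assumed to have the col1, col2, col3 columns used
above.

```python
def conditionalAgg(df):
    # Imported here so the definition itself loads even without Spark installed.
    from pyspark.sql import functions as F

    # 1 when col2 > 100, else 0; summing the flag counts the matching rows.
    flag = F.when(df["col2"] > 100, 1).otherwise(0)
    return df.groupBy("col1").agg(
        F.sum(flag).alias("test"),
        F.min("col3").alias("minimum")
    )
```

Called on a DataFrame built from the raw three-column rows, this should give
the same sums and minimums as the 1.3 version, without the extra map step.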




On Wed, May 27, 2015 at 5:12 PM, Masf <masfwo...@gmail.com> wrote:

> Yes. I think your SQL solution will work, but I was looking for a
> solution with the DataFrame API. I had thought of using a UDF such as:
>
> val myFunc = udf {(input: Int) => {if (input > 100) 1 else 0}}
>
> However, I'd like to know if it's possible to do it directly in the
> aggregation, by passing in a lambda function or something similar.
>
> Thanks!!!!
>
> Regards.
> Miguel.
>
>
> On Wed, May 27, 2015 at 1:06 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> For this, I can give you a SQL solution:
>>
>> joinedData.registerTempTable('j')
>>
>> Res=ssc.sql('select col1,col2, count(1) counter, min(col3) minimum, '
>>             'sum(case when endrscp>100 then 1 else 0 end) test from j '
>>             'group by col1,col2')
>>
>> Let me know if this works.
>> On 26 May 2015 23:47, "Masf" <masfwo...@gmail.com> wrote:
>>
>>> Hi
>>> I don't know how it works. For example:
>>>
>>> val result = joinedData.groupBy("col1","col2").agg(
>>>   count(lit(1)).as("counter"),
>>>   min(col3).as("minimum"),
>>>   sum("case when endrscp> 100 then 1 else 0 end").as("test")
>>> )
>>>
>>> How can I do it?
>>>
>>> Thanks!!!!
>>> Regards.
>>> Miguel.
>>>
>>> On Tue, May 26, 2015 at 12:35 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Case when col2>100 then 1 else col2 end
>>>> On 26 May 2015 00:25, "Masf" <masfwo...@gmail.com> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> In a DataFrame, how can I execute a conditional statement in an
>>>>> aggregation? For example, can I translate this SQL statement to the
>>>>> DataFrame API?
>>>>>
>>>>> SELECT name, SUM(IF table.col2 > 100 THEN 1 ELSE table.col1)
>>>>> FROM table
>>>>> GROUP BY name
>>>>>
>>>>> Thanks!!!!
>>>>> --
>>>>> Regards.
>>>>> Miguel
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> Saludos.
>>> Miguel Ángel
>>>
>>
>
>
> --
>
>
> Saludos.
> Miguel Ángel
>



-- 
Best Regards,
Ayan Guha
