Re: DataFrame Column Alias problem

SLiZn Liu Fri, 22 May 2015 00:31:24 -0700

Despite the odd usage, it does the trick. Thanks, Reynold!
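
A minimal sketch of the aliased count followed by the sort from the original question, for anyone who finds this thread later. Several things here are assumptions and not part of the thread: the SparkSession entry point (Spark 2.0+; on 1.3/1.4 you would use a SQLContext and its implicits instead), the local master, the sample rows, and the hypothetical names AliasSketch and countsByCol1. It relies on the grouping column being kept by default, as described for 1.4 below; on 1.3 col1 would also have to be passed to agg.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

// Hypothetical helper object, for illustration only.
object AliasSketch {
  // Count rows per value of col1, name the aggregate "c",
  // then sort by that alias in descending order.
  def countsByCol1(df: DataFrame): DataFrame =
    df.groupBy(col("col1"))
      .agg(count(col("col1")).as("c"))
      .sort(col("c").desc)

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ entry point, running locally.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for df.
    val df = Seq("aaa", "aaa", "bbb", "bbb", "bbb", "bbb").toDF("col1")

    countsByCol1(df).show()
    spark.stop()
  }
}
```

Because the aggregate carries the alias c, the sort no longer depends on the generated name COUNT(col1#347).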

On Fri, May 22, 2015 at 2:47 PM Reynold Xin <r...@databricks.com> wrote:


> In 1.4 it actually shows col1 by default.
>
> In 1.3, you can add "col1" to the output, i.e.
>
> df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()
>
>
> On Thu, May 21, 2015 at 11:22 PM, SLiZn Liu <sliznmail...@gmail.com>
> wrote:
>
>> However, this returns only the single column c, without the original
>> col1.
>> 
>>
>> On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha <sriharsha....@gmail.com>
>> wrote:
>>
>>> df.groupBy($"col1").agg(count($"col1").as("c")).show
>>>
>>> On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu <sliznmail...@gmail.com>
>>> wrote:
>>>
>>>> Hi Spark Users Group,
>>>>
>>>> I’m running a groupBy on my DataFrame *df* as follows, to get the
>>>> count for each value of col1:
>>>>
>>>> > df.groupBy("col1").agg("col1" -> "count").show // I'm not sure this is the right way to write it.
>>>> col1   COUNT(col1#347)
>>>> aaa    2
>>>> bbb    4
>>>> ccc    4
>>>> ...
>>>> and more...
>>>>
>>>> I’d like to sort by the resulting count with
>>>> .sort("COUNT(col1#347)"), but the name of the count column
>>>> obviously cannot be known in advance. Intuitively, one might look up
>>>> the column name by column index, as with R’s data frames, but
>>>> Spark doesn’t support that. I have Googled *spark agg alias* and so forth,
>>>> and checked DataFrame.as in the Spark API; neither helped. Am I
>>>> the only one who has ever got stuck on this, or have I missed
>>>> something?
>>>>
>>>> REGARDS,
>>>> Todd Leo
>>>> 
>>>>
>>>
>>>
>
