FYI.. the problem is that the column names Spark generates can't be
referenced in SQL or DataFrame operations (e.g. "SUM(cool_cnt#725)").
Any idea how to alias these final aggregate columns?

The syntax below doesn't make sense, but this is what I'd ideally want to
do:
.agg({"cool_cnt":"sum".alias("cool_cnt"),"*":"count".alias("cnt")})

On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo <elliottco...@gmail.com>
wrote:

> Hi Guys -
>
> I'm having trouble figuring out the semantics of using the alias function
> on the final sum and count aggregations.
>
> >>> cool_summary = reviews.select(reviews.user_id,
> cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"})
>
> >>> cool_summary
>
> DataFrame[user_id: string, SUM(cool_cnt#725): double, COUNT(1): bigint]
>
