FYI, the problem is that the column names Spark generates for the
aggregates (e.g. "SUM(cool_cnt#725)") can't be referenced in SQL or in
later DataFrame operations. Any idea how to alias these final aggregate
columns?
The syntax below isn't valid, but it's what I'd ideally want to write:
.agg({"cool_cnt":"sum".alias("cool_cnt"),"*":"count".alias("cnt")})
On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo <[email protected]>
wrote:
> Hi Guys -
>
> I'm having trouble figuring out the semantics for using the alias function
> on the final sum and count aggregations.
>
> >>> cool_summary = reviews.select(reviews.user_id,
> cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt":"sum","*":"count"})
>
> >>> cool_summary
>
> DataFrame[user_id: string, SUM(cool_cnt#725): double, COUNT(1): bigint]
>