FYI, the problem is that the column names Spark generates cannot be referenced in SQL or DataFrame operations (e.g. "SUM(cool_cnt#725)"). Any idea how to alias these final aggregate columns?
The syntax below doesn't make sense, but this is what I'd ideally want to do:

.agg({"cool_cnt": "sum".alias("cool_cnt"), "*": "count".alias("cnt")})

On Wed, Apr 15, 2015 at 7:23 PM, elliott cordo <elliottco...@gmail.com> wrote:

> Hi Guys -
>
> Having trouble figuring out the semantics for using the alias function on
> the final sum and count aggregations?
>
> >>> cool_summary = reviews.select(reviews.user_id,
> cool_cnt("votes.cool").alias("cool_cnt")).groupBy("user_id").agg({"cool_cnt": "sum", "*": "count"})
>
> >>> cool_summary
>
> DataFrame[user_id: string, SUM(cool_cnt#725): double, COUNT(1): bigint]