Dear list,
I am trying to compute both the sum and the count of the same column in a
single aggregation:
# Assumed import behind the `fn` alias used below
from pyspark.sql import functions as fn

user_id_books_clicks = (
    sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet')
    .groupby('user_id')
    .agg({'is_booking': 'count',
          'is_booking': 'sum'})
    .orderBy(fn.desc('count(user_id)'))
    .cache()
)
Written this way, the result contains only one aggregate, the last one:
sum(is_booking).
If I change the call to .agg({'user_id':'count', 'is_booking':'sum'}), I get
both aggregates. I am on Spark 1.6.1. Is this fixed in 2.x, or should I report
it in JIRA?
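
One thing that may be worth ruling out before filing anything: in plain Python, a dict literal that repeats a key keeps only the last value, so the dict passed to .agg() might already contain a single entry before Spark ever sees it. A minimal sketch with no Spark involved (the names spec and spec_distinct are made up for illustration):

```python
# The same dict literal as in the .agg() call above, evaluated without Spark.
spec = {'is_booking': 'count',
        'is_booking': 'sum'}

# A repeated key in a dict literal keeps only the last value.
print(len(spec))  # 1
print(spec)       # {'is_booking': 'sum'}

# With distinct keys, as in the second variant, both entries survive.
spec_distinct = {'user_id': 'count', 'is_booking': 'sum'}
print(len(spec_distinct))  # 2
```

If that is what is happening here, the missing count would come from the dict itself rather than from the aggregation, which would also explain why the variant with two different keys returns both columns.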