Re: Count distinct with groupBy usage

2014-07-15 Thread Raffael Marty
> All I'm attempting is to report number of unique visitors per page by date.

But the way you are doing it currently, you will get a count per second. You have to bucketize your dates by whatever time resolution you want.

-raffy
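
To illustrate the bucketizing idea, here is a minimal plain-Python sketch, not the original poster's code. The record layout (timestamp string, page, visitor id), the timestamp format, and the helper names `day_bucket` and `unique_visitors` are all assumptions made for the example. The point is only that the timestamp is truncated to the chosen resolution before grouping, so the distinct count is per day rather than per second.

```python
from collections import defaultdict
from datetime import datetime


def day_bucket(ts):
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp string to its date part."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")


def unique_visitors(events, bucket=day_bucket):
    """Count distinct visitors per (time bucket, page).

    `events` is an iterable of (timestamp, page, visitor_id) tuples;
    this layout is assumed for illustration.
    """
    visitors = defaultdict(set)
    for ts, page, visitor in events:
        # Group on the bucketed timestamp, not the raw one.
        visitors[(bucket(ts), page)].add(visitor)
    return {key: len(ids) for key, ids in visitors.items()}


# Hypothetical sample data: two hits from the same visitor one second apart.
events = [
    ("2014-07-14 09:00:01", "/home", "alice"),
    ("2014-07-14 09:00:02", "/home", "alice"),
    ("2014-07-14 17:30:00", "/home", "bob"),
    ("2014-07-15 08:15:00", "/home", "alice"),
]
counts = unique_visitors(events)
```

Swapping in a different `bucket` function (for example, one that truncates to the hour) changes the resolution without touching the grouping logic. The same truncation could be applied to the date column before a Spark group-by.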

SparkSQL - Partitioned Parquet

2014-07-06 Thread Raffael Marty
Does SparkSQL support partitioned Parquet tables? How do I save to a partitioned Parquet file from within Python? table.saveAsParquetFile("table.parquet") This call doesn't seem to support a partition argument. Or does my schemaRDD have to be set up in a specific way?