Hi Michael,
Thanks for sharing the tip. It will help with the write path of the partitioned
table.
Do you have a similar suggestion for reading the partitioned table back when
there are a million distinct values in the partition field (for example, user
id)? Last time I had trouble reading a p
See here for some workarounds:
https://issues.apache.org/jira/browse/SPARK-12546
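As a rough illustration of why the write path struggles, here is a toy Python model. It is not Spark code and not necessarily what SPARK-12546 describes. It assumes, hypothetically, that a writing task keeps one open output file per distinct partition value it encounters, and it compares the worst-case number of open writers per task in two cases: rows reaching tasks regardless of their key, and rows first routed by the partition key (roughly what repartitioning on the partition column before calling partitionBy would do). The names max_open_writers, round_robin and by_key are made up for this sketch.

```python
import random
from collections import defaultdict


def max_open_writers(rows, num_tasks, assign):
    """Toy model: return the largest number of distinct partition values
    seen by any single task. Assumption (hypothetical): a task keeps one
    open writer per distinct partition value it encounters."""
    seen = defaultdict(set)  # task id -> distinct partition values it saw
    for i, user_id in enumerate(rows):
        seen[assign(i, user_id, num_tasks)].add(user_id)
    return max(len(values) for values in seen.values())


def round_robin(i, user_id, num_tasks):
    # Rows land on tasks regardless of their key (like unshuffled input).
    return i % num_tasks


def by_key(i, user_id, num_tasks):
    # Rows are routed by the partition key, so each key goes to one task.
    return hash(user_id) % num_tasks


if __name__ == "__main__":
    random.seed(0)
    num_users, num_tasks = 10_000, 100
    rows = [random.randrange(num_users) for _ in range(200_000)]
    print("round robin:", max_open_writers(rows, num_tasks, round_robin))
    print("by key:     ", max_open_writers(rows, num_tasks, by_key))
```

With key-based routing each task only sees about num_users / num_tasks distinct values, while arbitrary routing lets every task see a large share of all user ids, which is where the memory pressure in this model comes from.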
On Thu, Jan 14, 2016 at 6:46 PM, Jerry Lam wrote:
> Hi Arkadiusz,
>
> The last time I used it, partitionBy was not designed to handle many
> distinct values. If you search the mailing list, I think there are a coupl
Hi Arkadiusz,
The last time I used it, partitionBy was not designed to handle many distinct
values. If you search the mailing list, I think you will find a couple of
other people who faced similar issues. For example, in my case, it didn't work
with over a million distinct user ids. It will require a lot of memor