Hi,
I'm running into a strange memory scaling issue when using the partitionBy
feature of DataFrameWriter.
I've generated a table (a CSV file) with 3 columns (A, B and C) and 32*32
distinct rows, totalling about 20 kB on disk. There are 32 distinct values
for column A and 32 distinct values for column B. A minimal sketch of the
setup follows.
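Here is a rough sketch of what I'm doing, assuming a local Spark session;
the original code predates SparkSession, and the object name and output
path here are hypothetical:

import org.apache.spark.sql.SparkSession

object PartitionByRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitionBy-memory-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // 32 * 32 = 1024 rows; A and B each take 32 distinct values.
    val df = (0 until 32 * 32)
      .map(i => (i / 32, i % 32, s"value-$i"))
      .toDF("A", "B", "C")

    // Partitioning by both columns produces 32 * 32 = 1024 output
    // directories, one per (A, B) combination; memory use appears to
    // scale with this partition count rather than with the data size.
    df.write
      .partitionBy("A", "B")
      .csv("/tmp/partition_by_repro")  // hypothetical output path

    spark.stop()
  }
}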
There is a JIRA ticket discussing the same problem (with more insights than
here): https://issues.apache.org/jira/browse/SPARK-8597