[DataFrame] partitionBy issues

2015-06-23 Thread vladio
Hi, I'm running into a strange memory scaling issue when using the partitionBy feature of DataFrameWriter. I've generated a table (a CSV file) with 3 columns (A, B and C) and 32*32 different entries, with a size on disk of about 20 KB. There are 32 distinct values for column A and 32 distinct values for column B.
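
The original code is not included in the thread, so the following is only a minimal sketch of how such a setup might be reproduced with the DataFrameWriter API. It assumes the write partitions by both A and B, uses the modern SparkSession entry point rather than the Spark 1.4-era SQLContext, and the output path and generated column values are made up for illustration:

import org.apache.spark.sql.SparkSession

object PartitionByRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitionBy-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // 32 * 32 = 1024 rows: 32 distinct values in column A, 32 in column B.
    val rows = for {
      a <- 0 until 32
      b <- 0 until 32
    } yield (a, b, s"value_${a}_$b")
    val df = rows.toDF("A", "B", "C")

    // partitionBy creates one output directory per distinct (A, B) combination,
    // i.e. 1024 directories for this tiny ~20 KB dataset.
    df.write
      .partitionBy("A", "B")
      .parquet("/tmp/partitionBy-repro")  // output path is an assumption

    spark.stop()
  }
}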

Re: [DataFrame] partitionBy issues

2015-06-30 Thread vladio
A JIRA ticket discussing the same problem (with more insights than here): https://issues.apache.org/jira/browse/SPARK-8597