Re: SPARK-8813 - combining small files in spark sql

2016-07-07 Thread Reynold Xin

When using native data sources (e.g. Parquet, ORC, JSON, ...), partitions are automatically merged so that they add up to a target size, configurable via spark.sql.files.maxPartitionBytes. spark.sql.files.openCostInBytes specifies the cost charged for each "file". That is, an empty file will be
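
The interaction between the two settings can be illustrated with a small, simplified model. This is only a sketch of the idea described above, not Spark's actual implementation: the function name `pack_files` is hypothetical, the 128 MB and 4 MB values are assumed settings for spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes, and the model ignores that large files may be split into several chunks.

```python
from typing import List

# Assumed settings, in bytes (illustrative values, adjust to your configuration).
MAX_PARTITION_BYTES = 128 * 1024 * 1024  # spark.sql.files.maxPartitionBytes
OPEN_COST_BYTES = 4 * 1024 * 1024        # spark.sql.files.openCostInBytes


def pack_files(file_sizes: List[int],
               max_partition_bytes: int = MAX_PARTITION_BYTES,
               open_cost_bytes: int = OPEN_COST_BYTES) -> List[List[int]]:
    """Greedily group file sizes into partitions (simplified model, not Spark's code)."""
    partitions: List[List[int]] = []
    current: List[int] = []
    current_cost = 0
    # Consider the largest files first.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition when adding this file would exceed the target.
        if current and current_cost + size > max_partition_bytes:
            partitions.append(current)
            current, current_cost = [], 0
        current.append(size)
        # Each file is charged its size plus the fixed per-file open cost.
        current_cost += size + open_cost_bytes
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    one_mb = 1024 * 1024
    parts = pack_files([one_mb] * 100)
    print(len(parts), [len(p) for p in parts])
```

Under these assumptions, 100 files of 1 MB each end up in 4 partitions rather than 1, because the per-file open cost makes many tiny files count as more than their raw size. Setting `open_cost_bytes=0` in the sketch packs all 100 files into a single partition.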

Re: SPARK-8813 - combining small files in spark sql

2016-07-07 Thread Sean Owen

-user

Reynold made the comment that he thinks this was resolved by another change; maybe he can comment.

On Thu, Jul 7, 2016 at 7:53 AM, Ajay Srivastava wrote:
> Hi,
>
> This jira https://issues.apache.org/jira/browse/SPARK-8813 is fixed in spark
> 2.0.
> But resolution is not mentioned there.
>