[ https://issues.apache.org/jira/browse/SPARK-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629171#comment-15629171 ]
Sean Owen commented on SPARK-18211:
-----------------------------------
Actually, I take that back. I wonder if this is the issue?
https://issues.apache.org/jira/browse/SPARK-18017
> Spark SQL ignores split.size
> ----------------------------
>
> Key: SPARK-18211
> URL: https://issues.apache.org/jira/browse/SPARK-18211
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: lostinoverflow
>
> I expect an RDD and a DataFrame read from the same file to have the same
> number of partitions (this worked in 1.6), but it looks like Spark SQL
> ignores the Hadoop configuration.
> {code}
> import org.apache.spark.sql.SparkSession
>
> object App {
>   // args(0): split size in bytes, args(1): input path
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder()
>       .master("local[*]")
>       .appName("split size")
>       .getOrCreate()
>
>     // Pin the minimum and maximum Hadoop split size to the same value.
>     spark.sparkContext.hadoopConfiguration.setInt("mapred.min.split.size",
>       args(0).toInt)
>     spark.sparkContext.hadoopConfiguration.setInt("mapred.max.split.size",
>       args(0).toInt)
>
>     // Partition count via the RDD API.
>     println(spark.sparkContext.textFile(args(1)).partitions.size)
>     // Partition count via the Dataset API; reported to ignore the settings above.
>     println(spark.read.textFile(args(1)).rdd.partitions.size)
>     spark.stop()
>   }
> }
> {code}
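
For context on why the two counts in the report can differ: as far as I understand it, file-based DataFrame sources in Spark 2.0 size their partitions from the SQL settings spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes rather than from the mapred.*.split.size keys. The block below is a rough sketch of that sizing rule, not code copied from Spark. The object name SplitSizeSketch, the helper maxSplitBytes, and the default parallelism of 8 are assumptions made for illustration. The 128 MB and 4 MB defaults are the documented defaults of the two settings.

```scala
// Hypothetical helper illustrating how a target split size could be derived
// for file-based DataFrame sources. This is a sketch, not Spark's own code.
object SplitSizeSketch {
  def maxSplitBytes(
      fileSizes: Seq[Long],                          // input file lengths in bytes
      maxPartitionBytes: Long = 128L * 1024 * 1024,  // spark.sql.files.maxPartitionBytes
      openCostInBytes: Long = 4L * 1024 * 1024,      // spark.sql.files.openCostInBytes
      defaultParallelism: Int = 8                    // assumed core count
  ): Long = {
    // Each file is padded with the estimated cost of opening it.
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / defaultParallelism
    // Never exceed the configured maximum, never go below the open cost.
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }
}
```

If this is indeed the cause, passing the desired size through .config("spark.sql.files.maxPartitionBytes", ...) on the SparkSession builder may bring the DataFrame partition count closer to the RDD one, though the two code paths are not guaranteed to agree exactly.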