[ 
https://issues.apache.org/jira/browse/SPARK-18211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629156#comment-15629156
 ] 

lostinoverflow commented on SPARK-18211:
----------------------------------------

Thank you for the prompt response. Do you suggest that something changed 
between 1.6 and 2.0? spark.read.text(args(1)).rdd.partitions.size used to work. 
And if it isn't supposed to work, do you have any hints as to which 
parameter (if any) may affect this now?
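
For reference, a guess I have not verified against the 2.0 source: the 
file-based SQL sources may now size partitions from 
spark.sql.files.maxPartitionBytes (a setting introduced in 2.0) instead of 
the Hadoop mapred.*.split.size properties. A minimal sketch of that 
assumption, reusing the same args as the original reproducer:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("split size")
  // assumption: this controls DataFrame file-split size in 2.0,
  // where the Hadoop split settings are ignored
  .config("spark.sql.files.maxPartitionBytes", args(0).toInt)
  .getOrCreate()

// if the assumption holds, this count should track args(0)
println(spark.read.textFile(args(1)).rdd.partitions.size)
{code}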

> Spark SQL ignores split.size
> ----------------------------
>
>                 Key: SPARK-18211
>                 URL: https://issues.apache.org/jira/browse/SPARK-18211
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: lostinoverflow
>
> I expect that RDD and DataFrame will have the same number of partitions 
> (worked in 1.6) but it looks like Spark SQL ignores Hadoop configuration.
> {code}
> import org.apache.spark.sql.SparkSession
> object App {
>   def main(args: Array[String]) {
>     val spark = SparkSession
>       .builder()
>       .master("local[*]")
>       .appName("split size")
>       .getOrCreate()
>     spark.sparkContext.hadoopConfiguration.setInt("mapred.min.split.size", args(0).toInt)
>     spark.sparkContext.hadoopConfiguration.setInt("mapred.max.split.size", args(0).toInt)
>     println(spark.sparkContext.textFile(args(1)).partitions.size)
>     println(spark.read.textFile(args(1)).rdd.partitions.size)
>     spark.stop()
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
