Hi Ted,
Any chance you could expand on the SQLConf parameters, with more explanation of
when and why to change these settings?
Not all of them are made clear in the descriptions.
Thanks!
Best,
Ovidiu
> On 31 May 2016, at 16:30, Ted Yu wrote:
>
> Maciej:
> You can refer to the doc in
> sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
If you don't mind using the newest version, you can try v2.0-preview.
http://spark.apache.org/news/spark-2.0.0-preview.html
There, you can control the number of input partitions without shuffles by using the
two parameters below:
spark.sql.files.maxPartitionBytes
spark.sql.files.openCostInBytes
( Not doc
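For reference, a minimal sketch of how these two settings could be applied in a 2.0-preview session (the SparkSession setup, file name, and values are assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-tuning").getOrCreate()

// Maximum number of bytes to pack into a single partition when reading files.
spark.conf.set("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)  // 128 MB

// Estimated cost, in bytes, of opening a file for reading; a higher value packs
// more small files into a single partition.
spark.conf.set("spark.sql.files.openCostInBytes", 4L * 1024 * 1024)      // 4 MB

val df = spark.read.text("perf_test1.csv")
println(df.rdd.getNumPartitions)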
Maciej:
You can refer to the doc in
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
for these parameters.
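As a side note, a Spark 2.0 shell (assumed here) can also list SQL configuration keys together with their values and descriptions via the SET -v command:

spark.sql("SET -v").show(1000, false)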
On Tue, May 31, 2016 at 7:27 AM, Takeshi Yamamuro wrote:
> If you don't mind using the newest version, you can try v2.0-preview.
> http://spark.apache.org/news/spark-2.0.0-preview.html
Thanks.
Under what conditions can the number of partitions be higher than minPartitions
when reading a text file? Should this be considered an infrequent situation?
To sum up: is there a more efficient way to ensure an exact number of
partitions than the following?
rdd = sc.textFile("perf_test1.csv", minPartitions=128).coalesce(128, true)
`coalesce` without shuffling can only produce fewer partitions than its parent
RDD.
As Ted said, you need to set shuffle to true, or you need to use
`RDD#repartition`.
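A quick sketch of that behavior (file name and target counts are assumptions for illustration):

val rdd = sc.textFile("perf_test1.csv", minPartitions = 128)
rdd.coalesce(64).getNumPartitions                   // narrow dependency: can only reduce the count
rdd.coalesce(256).getNumPartitions                  // stays at the parent's count; no increase without a shuffle
rdd.coalesce(256, shuffle = true).getNumPartitions  // 256, at the cost of a shuffle
rdd.repartition(256).getNumPartitions               // equivalent to coalesce(256, shuffle = true)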
// maropu
On Tue, May 31, 2016 at 11:02 PM, Ted Yu wrote:
> Value for shuffle is false by default.
>
> Have you tried setting it to true?
After setting shuffle to true I get the expected 128 partitions, but I'm
worried about the performance of such a solution, especially since I see that
some shuffling is done because the size of the partitions changes:
scala> sc.textFile("hdfs:///proj/dFAB_test/testdata/perf_test1.csv",
minPartitions=128).coalesce(128, true)
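One way to check what the shuffle does to the data distribution is to compare per-partition record counts; a sketch, with an assumed local file name:

val noShuffle   = sc.textFile("perf_test1.csv", minPartitions = 128)
val withShuffle = noShuffle.coalesce(128, shuffle = true)

// Count the records in each partition (consumes each partition's iterator once).
def partitionSizes(rdd: org.apache.spark.rdd.RDD[String]): Array[Int] =
  rdd.mapPartitions(it => Iterator(it.size)).collect()

println(partitionSizes(noShuffle).mkString(", "))
println(partitionSizes(withShuffle).mkString(", "))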
Value for shuffle is false by default.
Have you tried setting it to true?
Which Spark release are you using?
On Tue, May 31, 2016 at 6:13 AM, Maciej Sokołowski wrote:
> Hello Spark users and developers.
>
> I read a file and want to ensure that it has an exact number of partitions, for
> example 128.
Hello Spark users and developers.
I read a file and want to ensure that it has an exact number of partitions, for
example 128.
In the documentation I found:
def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]
But the argument here is the minimal number of partitions, so I use coalesce.
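A minimal sketch of the approach discussed in this thread (file name assumed): request a lower bound with minPartitions, then force the exact count with a shuffling coalesce or with repartition.

val rdd = sc.textFile("perf_test1.csv", minPartitions = 128).coalesce(128, shuffle = true)
// or, equivalently: sc.textFile("perf_test1.csv").repartition(128)
println(rdd.getNumPartitions)  // 128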