will involve shuffling.
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, April 26, 2016 2:44 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Number of partitions for binaryFiles
From what I understand, Spark code was written this way because you don't end up with very small partitions.
> Sent: Tuesday, April 26, 2016 1:22 PM
> To: Ulanov, Alexander
> Cc: dev@spark.apache.org
> Subject: Re: Number of partitions for binaryFiles
>
> Here is the body of StreamFileInputFormat#setMinPartitions :
>
>   def setMinPartitions(context: JobContext, minPartitions: Int) {
>     val totalLen =
>       listStatus(context).asScala.filterNot(_.isDirectory).map(_.getLen).sum
>     val maxSplitSize =
>       math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
>     super.setMaxSplitSize(maxSplitSize)
>   }
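For illustration (not part of the original thread), here is a standalone sketch of the arithmetic that setMinPartitions performs: sum the lengths of all non-directory files, then divide by the requested minimum partition count to get the maximum combined-split size. The object name and file sizes below are hypothetical.

```scala
object MinPartitionsSketch {
  // Mirrors the split-size arithmetic quoted above: total bytes across all
  // files, divided by the requested minimum number of partitions (floored
  // at 1 to avoid division by zero), rounded up to a whole number of bytes.
  def maxSplitSize(fileLengths: Seq[Long], minPartitions: Int): Long =
    math.ceil(fileLengths.sum / math.max(minPartitions, 1.0)).toLong

  def main(args: Array[String]): Unit = {
    // Hypothetical file sizes in bytes; with minPartitions = 4 the combined
    // splits are capped at ceil(3000 / 4) = 750 bytes each.
    val lengths = Seq(1000L, 1200L, 800L)
    println(maxSplitSize(lengths, 4)) // prints 750
  }
}
```

Because the cap is the total size divided by minPartitions, many small files get combined into roughly minPartitions splits rather than producing one tiny partition per file.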