Hi All,

The code in RangePartitioner:

      // This is the sample size we need to have roughly balanced output partitions, capped at 1M.
      val sampleSize = math.min(20.0 * partitions, 1e6)
      // Assume the input partitions are roughly balanced and over-sample a little bit.
      val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt
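To make the two formulas concrete, here is a small standalone Scala sketch that evaluates them for a couple of made-up inputs. `SampleSizeDemo` and `sampleSizes` are hypothetical names, and `inputPartitions` stands in for `rdd.partitions.length`. It only reproduces the arithmetic, not the sampling that RangePartitioner actually performs.

```scala
// Hypothetical helper that reproduces only the two formulas quoted above.
object SampleSizeDemo {
  // partitions: number of output partitions requested.
  // inputPartitions: stand-in for rdd.partitions.length (assumption).
  def sampleSizes(partitions: Int, inputPartitions: Int): (Double, Int) = {
    // Target total sample: 20 keys per output partition, capped at 1M.
    val sampleSize = math.min(20.0 * partitions, 1e6)
    // Over-sample by 3x and split evenly across the input partitions.
    val sampleSizePerPartition = math.ceil(3.0 * sampleSize / inputPartitions).toInt
    (sampleSize, sampleSizePerPartition)
  }

  def main(args: Array[String]): Unit = {
    println(sampleSizes(100, 50))     // (2000.0,120)
    println(sampleSizes(100000, 200)) // (1000000.0,15000): the 1M cap applies
  }
}
```

With 100 output partitions the target is 2,000 keys in total, and each of 50 input partitions is asked for 3 * 2000 / 50 = 120 keys. Once the requested partition count exceeds 50,000, the 1M cap takes over.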


The constants 20.0 and 3.0 are hard-coded. Why are they fixed values?

Do they come from a white paper or other research?


Regards

-Raintung Li
