We are currently migrating from Spark 1.1 to Spark 1.2, but found the
program runs 3x slower, with nothing else changed.
Note: our program on Spark 1.1 has successfully processed a whole year of
data and was quite stable.

The main script is as below:

sc.textFile(inputPath)
.flatMap(line => LogParser.parseLine(line))
.groupByKey(new HashPartitioner(numPartitions))
.mapPartitionsWithIndex(...)
.foreach(_ => {})

where LogParser is a singleton which may take some time to initialize and
is shared across the executors.
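For context, a minimal sketch of the singleton pattern described above (the actual LogParser internals are not shown in this post, so the parse logic here is purely hypothetical). A Scala `object` initializes lazily, once per executor JVM, on first access:

```scala
// Hypothetical sketch of a LogParser-style singleton.
// The expensive setup in the object body runs once per JVM,
// the first time any executor task touches the object.
object LogParser {
  // Placeholder for expensive one-time setup (e.g. loading parse rules).
  private val initializedAtMillis: Long = System.currentTimeMillis()

  // Hypothetical parser: extracts "key=value" tokens from a log line.
  def parseLine(line: String): Seq[(String, String)] = {
    line.split("\\s+").toSeq.collect {
      case kv if kv.contains('=') =>
        val Array(k, v) = kv.split("=", 2)
        (k, v)
    }
  }
}
```

If initialization is slow, it is paid once per executor, not per task; so by itself it should not explain a sustained 3x slowdown unless executors are being recreated.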

The flatMap stage is the one that is 3x slower.
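One way to narrow this down is to run the parse step alone, with no shuffle, on both Spark versions. This is a hedged diagnostic sketch (same `inputPath` and `LogParser` as above), not something from the original job:

```scala
// Diagnostic sketch: parse-only job, no groupByKey/shuffle.
// If this is also ~3x slower on 1.2, the regression is in reading/parsing
// or task execution itself, not in the shuffle write path.
sc.textFile(inputPath)
  .flatMap(line => LogParser.parseLine(line))
  .count()
```

If the parse-only run is equally fast on both versions, the slowdown is likely in the shuffle write side of the flatMap stage.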

We tried changing spark.shuffle.manager back to hash and
spark.shuffle.blockTransferService back to nio, but neither helped.
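For reference, this is how those two settings can be reverted to the 1.1 behaviour in the driver (a config sketch; they can equally be passed via --conf on spark-submit):

```scala
// Revert the Spark 1.2 shuffle defaults to their 1.1 values.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "hash")             // 1.2 default is "sort"
  .set("spark.shuffle.blockTransferService", "nio") // 1.2 default is "netty"
```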

Could somebody explain the possible causes, or suggest what we should test
or change to track it down?
