We are currently migrating from Spark 1.1 to Spark 1.2, but found the program runs 3x slower, with nothing else changed. Note: our program on Spark 1.1 has successfully processed a whole year of data and was quite stable.
The main script is as below:

    sc.textFile(inputPath)
      .flatMap(line => LogParser.parseLine(line))
      .groupByKey(new HashPartitioner(numPartitions))
      .mapPartitionsWithIndex(...)
      .foreach(_ => {})

where LogParser is a singleton that may take some time to initialize and is shared across each executor. The flatMap stage is the one that is 3x slower. We tried changing spark.shuffle.manager back to hash, and spark.shuffle.blockTransferService back to nio, but neither helped. Can somebody explain the possible causes, or suggest what we should test or change to track this down?
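For reference, this is how we reverted the two shuffle settings to their Spark 1.1 defaults when testing (a sketch of our spark-submit invocation; the other flags are elided):

```
spark-submit \
  --conf spark.shuffle.manager=hash \
  --conf spark.shuffle.blockTransferService=nio \
  ...
```

Spark 1.2 changed the defaults to the sort-based shuffle manager and the netty transfer service, which is why we suspected them first, but reverting both made no measurable difference in our runs.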