Maybe you mean a different spark-submit script? We also use the same spark-submit script, and thus the same memory, cores, etc. configuration.

2015-01-21 15:45 GMT+08:00 Sean Owen <so...@cloudera.com>:

> I don't know of any reason to think the singleton pattern doesn't work or
> works differently. I wonder if, for example, task scheduling is different
> in 1.2 and you have more partitions across more workers and so are loading
> more copies more slowly into your singletons.
>
> On Jan 21, 2015 7:13 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:
>
>> The LogParser instance is not serializable, and thus cannot be a
>> broadcast.
>>
>> What’s worse, it contains an LRU cache, which is essential to the
>> performance, and which we would like to share among all the tasks on the
>> same node.
>>
>> If that is the case, what’s the recommended way to share a variable among
>> all the tasks within the same executor?
>>
>>
>> 2015-01-21 15:04 GMT+08:00 Davies Liu <dav...@databricks.com>:
>>
>>> Maybe some change related to serializing the closure means LogParser is
>>> no longer a singleton, so it is initialized for every task.
>>>
>>> Could you change it to a Broadcast?
>>>
>>> On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com>
>>> wrote:
>>> > Currently we are migrating from Spark 1.1 to Spark 1.2, but found the
>>> > program 3x slower, with nothing else changed.
>>> > Note: our program on Spark 1.1 has successfully processed a whole year
>>> > of data, and is quite stable.
>>> >
>>> > The main script is as below:
>>> >
>>> > sc.textFile(inputPath)
>>> >   .flatMap(line => LogParser.parseLine(line))
>>> >   .groupByKey(new HashPartitioner(numPartitions))
>>> >   .mapPartitionsWithIndex(...)
>>> >   .foreach(_ => {})
>>> >
>>> > where LogParser is a singleton which may take some time to initialize
>>> > and is shared across the executor.
>>> >
>>> > The flatMap stage is 3x slower.
>>> >
>>> > We tried changing spark.shuffle.manager back to hash, and
>>> > spark.shuffle.blockTransferService back to nio, but it didn’t help.
>>> >
>>> > Could somebody explain possible causes, or what we should test or
>>> > change to find out?
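
For reference, the per-executor singleton pattern discussed in this thread can be sketched roughly as below. This is only an illustration under assumptions: the real LogParser is not shown anywhere in the thread, so the name LogParserSketch, the cache size, the initCount counter and the parsing logic are all hypothetical. The idea is that a Scala object referenced from a closure is not serialized with the task; it is resolved inside each executor JVM, and a lazy val in it is initialized at most once per JVM.

```scala
import java.util.{Collections, LinkedHashMap, Map => JMap}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the real LogParser, which the thread does not show.
object LogParserSketch {
  // Counts how many times the expensive setup ran in this JVM (diagnostic only).
  val initCount = new AtomicInteger(0)

  // Assumed cache capacity; the real value is unknown.
  private val MaxEntries = 10000

  // lazy val: built on first use, at most once per JVM, with thread-safe initialization.
  private lazy val cache: JMap[String, String] = {
    initCount.incrementAndGet()
    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    val lru = new LinkedHashMap[String, String](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[String, String]): Boolean =
        size() > MaxEntries
    }
    // Tasks in one executor run on several threads, so the map is wrapped.
    Collections.synchronizedMap(lru)
  }

  // Toy parser: "<host> <rest>" becomes (normalized host, rest).
  // Returns an empty Seq for malformed lines, so it can be used with flatMap.
  def parseLine(line: String): Seq[(String, String)] = {
    val parts = line.split(" ", 2)
    if (parts.length < 2) Seq.empty
    else {
      val key = parts(0)
      var normalized = cache.get(key)
      if (normalized == null) {
        normalized = key.toLowerCase // placeholder for an expensive lookup
        cache.put(key, normalized)
      }
      Seq(normalized -> parts(1))
    }
  }
}
```

It would be used the same way as in the original script, i.e. .flatMap(line => LogParserSketch.parseLine(line)). Logging initCount from inside a task on each executor might be one way to check whether the object is being re-initialized more often under 1.2 than under 1.1.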