If there's only one partition, by definition it will only be handled by one executor. Repartition to divide the work up. Note, however, that this will also result in multiple output files. If you absolutely need them combined into a single file, I suggest using the Unix/Linux 'cat' command to concatenate the files afterwards.
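For illustration, here is a minimal sketch of how the repartition call could fit into the job described below, assuming the DataStax spark-cassandra-connector Java API. The keyspace and table names, the output path, and the choice of 24 partitions are placeholders, not values from this thread:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import com.datastax.spark.connector.japi.CassandraRow;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    public class ExportToCsv {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cassandra-csv-export");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read the table; keyspace/table names are placeholders.
            JavaRDD<CassandraRow> rows =
                    javaFunctions(sc).cassandraTable("my_keyspace", "my_table");

            // Spread the rows across (for example) 24 partitions so every
            // executor gets a share of the map/save work, instead of a single
            // partition landing on one executor.
            JavaRDD<String> csv = rows
                    .repartition(24)
                    .map(new Function<CassandraRow, String>() {
                        @Override
                        public String call(CassandraRow row) throws Exception {
                            StringBuilder sb = new StringBuilder();
                            sb.append(row.getString(10)).append(",");
                            sb.append(row.getString(11)).append(",");
                            sb.append(row.getString(8)).append(",");
                            sb.append(row.getString(7));
                            return sb.toString();
                        }
                    });

            // Writes one part-NNNNN file per partition; concatenate them
            // afterwards (e.g. with 'cat') if a single file is required.
            csv.saveAsTextFile("/tmp/export");

            sc.stop();
        }
    }

With 24 partitions, the map and save work becomes 24 tasks that the scheduler can spread across all three executors.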
Rich

On Sep 22, 2015 9:20 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> Have you tried using repartition to spread the load?
>
> Cheers
>
> On Sep 22, 2015, at 4:22 AM, Chirag Dewan <chirag.de...@ericsson.com> wrote:
>
> > Hi,
> >
> > I am using Spark to access around 300m rows in Cassandra.
> >
> > My job is pretty simple: I am just mapping each row into CSV format and
> > saving it as a text file. My map method looks like this:
> >
> > public String call(CassandraRow row) throws Exception {
> >     StringBuilder sb = new StringBuilder();
> >     sb.append(row.getString(10));
> >     sb.append(",");
> >     sb.append(row.getString(11));
> >     sb.append(",");
> >     sb.append(row.getString(8));
> >     sb.append(",");
> >     sb.append(row.getString(7));
> >     return sb.toString();
> > }
> >
> > I have a 3-node cluster. I observe that the driver starts on Node A and
> > executors are spawned on all 3 nodes, but the executor on Node B or C is
> > doing all the tasks. It starts a saveAsTextFile job with 1 output
> > partition, stores the RDDs in memory, and also commits the file on the
> > local file system.
> >
> > This executor is using a lot of system memory and CPU while the others
> > are sitting idle.
> >
> > Am I doing something wrong? Is my RDD correctly partitioned?
> >
> > Thanks in advance.
> >
> > Chirag