Thanks, Ron. The problem is that the "parser" comes from another package and is not serializable.

In MapReduce, I could create the "parser" in the map setup() method. Now, in Spark, I want to create it once per worker and share it among all the tasks on the same worker node. I know different workers run on different machines, but they don't need to communicate with each other. A rough, untested sketch of the kind of thing I have in mind is at the bottom of this mail.

2014-08-04 10:51 GMT+08:00 Ron's Yahoo! <zlgonza...@yahoo.com>:

> I think you’re going to have to make it serializable by registering it
> with the Kryo registrator. I think multiple workers are running as separate
> VMs, so it might need to be able to serialize and deserialize broadcasted
> variables to the different executors.
>
> Thanks,
> Ron
>
> On Aug 3, 2014, at 6:38 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
>
> Could anybody help?
>
> I wonder if I asked a stupid question or I didn't make the question clear?
>
>
> 2014-07-31 21:47 GMT+08:00 Fengyun RAO <raofeng...@gmail.com>:
>
>> As shown here:
>> 2 - Why Is My Spark Job so Slow and Only Using a Single Thread?
>> <http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/>
>>
>> object JSONParser {
>>   def parse(raw: String): String = ...
>> }
>>
>> object MyFirstSparkJob {
>>   def main(args: Array[String]) {
>>     val sc = new SparkContext()
>>     val lines = sc.textFileStream("beacons.txt")
>>     lines.map(line => JSONParser.parse(line))
>>     lines.foreach(line => println(line))
>>     ssc.start()
>>   }
>> }
>>
>> It says the "parser instance is now a singleton created in the scope of our
>> driver program", which I thought was in the scope of the executor. Am I
>> wrong, or why?
>>
>> What if the parser is not serializable, and I want to share it among
>> tasks on the same worker node?
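
For reference, here is the rough sketch I mentioned above of the pattern I'm considering: wrap the non-serializable parser in a singleton object with a lazy val, so it is constructed the first time a task on a given executor JVM touches it and is then reused by every task in that JVM, and nothing has to be serialized from the driver. The Parser class and the input path below are placeholders, not the real package.

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for the real parser from the other package (not serializable).
class Parser {
  def parse(line: String): String = line.trim
}

// Singleton holder: the lazy val is initialized on first use inside each
// executor JVM and then shared by all tasks running in that JVM.
object ParserHolder {
  lazy val parser: Parser = new Parser
}

object SharedParserJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SharedParserJob"))

    // Only the String lines are shipped to the executors; the parser itself
    // is never serialized, because ParserHolder is accessed as a static
    // singleton on the executor side rather than captured in the closure.
    val parsed = sc.textFile("hdfs:///tmp/beacons.txt")   // placeholder path
      .map(line => ParserHolder.parser.parse(line))

    parsed.take(10).foreach(println)
    sc.stop()
  }
}

If one parser per JVM turns out to be wrong for some reason, I suppose an alternative would be rdd.mapPartitions { iter => val parser = new Parser; iter.map(parser.parse) }, which creates one parser per partition instead, still without any serialization of the parser.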