Thanks, Ron.

The problem is that the parser comes from another package, and that class is
not serializable.

In MapReduce, I could create the parser in the mapper's setup() method.

Now in Spark, I want to create it once per worker and share it among all
the tasks on the same worker node.

I know different workers run on different machines, but nothing needs to be
communicated between workers.
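
What I have in mind is something like the sketch below: the parser lives in
a Scala object behind a lazy val, so each executor JVM creates it once on
first use and it is never serialized. ThirdPartyParser here is just a
placeholder for the real class from the external package:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for the real non-serializable class from the external package.
class ThirdPartyParser {
  def parse(raw: String): String = raw
}

object ParserHolder {
  // A lazy val is initialized once per JVM (i.e. once per executor),
  // the first time a task touches it, and is never shipped from the driver.
  lazy val parser = new ThirdPartyParser
}

object MySparkJob {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("parser-per-executor"))
    val lines = sc.textFile("input.txt")
    // Tasks call into the executor-local singleton, so no parser instance
    // is captured in the closure or serialized.
    val parsed = lines.map(line => ParserHolder.parser.parse(line))
    parsed.saveAsTextFile("output.txt")
  }
}

All tasks that run in the same executor JVM would then share the single
instance, which is the setup()-style behavior I'm after.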



2014-08-04 10:51 GMT+08:00 Ron's Yahoo! <zlgonza...@yahoo.com>:

> I think you’re going to have to make it serializable by registering it
> with the Kryo registrator. I think multiple workers run as separate JVMs,
> so it might need to serialize and deserialize broadcast variables to ship
> them to the different executors.
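>
> Something along these lines, for example (MyRegistrator and
> ThirdPartyParser are just placeholder names for your registrator and
> parser classes):
>
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.SparkConf
> import org.apache.spark.serializer.KryoRegistrator
>
> class MyRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo) {
>     // Register the non-serializable parser class so Kryo can handle it.
>     kryo.register(classOf[ThirdPartyParser])
>   }
> }
>
> val conf = new SparkConf()
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .set("spark.kryo.registrator", "com.example.MyRegistrator")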
>
> Thanks,
> Ron
>
> On Aug 3, 2014, at 6:38 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
>
> Could anybody help?
>
> I wonder whether I asked a stupid question, or just didn't make the
> question clear?
>
>
> 2014-07-31 21:47 GMT+08:00 Fengyun RAO <raofeng...@gmail.com>:
>
>> As shown here:
>> 2 - Why Is My Spark Job so Slow and Only Using a Single Thread?
>> <http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/>
>>
>>
>> object JSONParser {
>>   def parse(raw: String): String = ...
>> }
>>
>> object MyFirstSparkJob {
>>   def main(args: Array[String]) {
>>     val sc = new SparkContext()
>>     val lines = sc.textFileStream("beacons.txt")
>>     lines.map(line => JSONParser.parse(line))
>>     lines.foreach(line => println(line))
>>     ssc.start()
>>   }
>> }
>>
>> It says "parser instance is now a singleton created in the scope of our
>> driver program", which I thought was in the scope of the executor. Am I
>> wrong, and if so, why?
>>
>> What if the parser is not serializable, and I want to share it among
>> tasks on the same worker node?
>>
>>
>
>
