Re: Broadcast error

2014-09-16 Thread Chengi Liu
Cool.. let me try that. In the meantime, any other suggestion(s) on things I can try? On Mon, Sep 15, 2014 at 9:59 AM, Davies Liu wrote: > I think 1.1 will be really helpful for you, it's all compatible > with 1.0, so it's > not hard to upgrade to 1.1. > > On Mon, Sep 15, 2014 at 2:35 AM, Chengi Liu

Re: Broadcast error

2014-09-15 Thread Davies Liu
I think 1.1 will be really helpful for you, it's all compatible with 1.0, so it's not hard to upgrade to 1.1. On Mon, Sep 15, 2014 at 2:35 AM, Chengi Liu wrote: > So.. same result with parallelize (matrix,1000) > with broadcast.. seems like I got a JVM core dump :-/ > 14/09/15 02:31:22 INFO Blo

Re: Broadcast error

2014-09-15 Thread Chengi Liu
So.. same result with parallelize (matrix,1000) with broadcast.. seems like I got a JVM core dump :-/ 14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager host:47978 with 19.2 GB RAM 14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager host:43360 with 19.2 GB RAM Unhandled

Re: Broadcast error

2014-09-15 Thread Akhil Das
Try: rdd = sc.broadcast(matrix) Or rdd = sc.parallelize(matrix,100) # Just increase the number of slices, give it a try. Thanks Best Regards On Mon, Sep 15, 2014 at 2:18 PM, Chengi Liu wrote: > Hi Akhil, > So with your config (specifically with set("spark.akka.frameSize ", > "1000")
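
For clarity, the two suggestions do different things: sc.broadcast(matrix) returns a Broadcast handle (read inside tasks via .value), not an RDD, while sc.parallelize(matrix, 100) returns an RDD whose data is spread across 100 smaller tasks. A minimal sketch, assuming matrix is an ordinary Python list of rows (names and sizes here are illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="broadcast-vs-slices")   # hypothetical app name
    matrix = [[0.0] * 100 for _ in range(1000)]        # stand-in for the real data

    bc = sc.broadcast(matrix)           # Broadcast handle; tasks read it as bc.value
    rdd = sc.parallelize(matrix, 100)   # RDD split into 100 slices, so 100 small tasks

    print(type(bc))   # pyspark.broadcast.Broadcast
    print(type(rdd))  # pyspark.rdd.RDD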

Re: Broadcast error

2014-09-15 Thread Chengi Liu
Hi Akhil, So with your config (specifically with set("spark.akka.frameSize ", "1000")) , I see the error: org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 0:0 was 401970046 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast va
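
A side note on the numbers here: 401970046 bytes is roughly 383 MB, yet the exception still reports the 10 MB default (10485760 bytes) as the limit, which suggests the frameSize override never took effect. spark.akka.frameSize is given in MB, so "1000" would have been plenty if it had been applied; if the property name really does contain the trailing space shown above, the key would not match and the default would stay in force. A hedged sketch with the exact key:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.executor.memory", "32G")
            .set("spark.akka.frameSize", "1000"))   # exact key, no trailing space; value in MB
    sc = SparkContext(conf=conf)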

Re: Broadcast error

2014-09-15 Thread Akhil Das
Can you give this a try: conf = SparkConf().set("spark.executor.memory", "32G").set("spark.akka.frameSize ", "1000").set("spark.broadcast.factory","org.apache.spark.broadcast.TorrentBroadcastFactory") sc = SparkContext(conf = conf) rdd = sc.parallelize(matrix,5) from pyspark.mllib.
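
The same suggestion written out as a standalone script (a sketch; matrix is a placeholder for the real data). All of these properties only take effect if they are on the SparkConf when the SparkContext is constructed:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.executor.memory", "32G")
            .set("spark.akka.frameSize", "1000")   # MB
            .set("spark.broadcast.factory",
                 "org.apache.spark.broadcast.TorrentBroadcastFactory"))
    sc = SparkContext(conf=conf)

    matrix = [[0.0] * 100 for _ in range(1000)]    # placeholder for the real data
    rdd = sc.parallelize(matrix, 5)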

Re: Broadcast error

2014-09-14 Thread Chengi Liu
And the thing is, the code runs just fine if I reduce the number of rows in my data? On Sun, Sep 14, 2014 at 8:45 PM, Chengi Liu wrote: > I am using Spark 1.0.2. > This is my work cluster.. so I can't set up a new version readily... > But right now, I am not using broadcast .. > > > conf = SparkConf().

Re: Broadcast error

2014-09-14 Thread Chengi Liu
I am using Spark 1.0.2. This is my work cluster.. so I can't set up a new version readily... But right now, I am not using broadcast .. conf = SparkConf().set("spark.executor.memory", "32G").set("spark.akka.frameSize", "1000") sc = SparkContext(conf = conf) rdd = sc.parallelize(matrix,5) from pysp
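
The preview cuts off at the mllib import. Based on the o26.trainKMeansModel call that shows up in the stack traces elsewhere in the thread, the rest of the driver presumably looks something like the sketch below (the k and maxIterations values are made up for illustration):

    from pyspark import SparkConf, SparkContext
    from pyspark.mllib.clustering import KMeans

    conf = (SparkConf()
            .set("spark.executor.memory", "32G")
            .set("spark.akka.frameSize", "1000"))
    sc = SparkContext(conf=conf)

    matrix = [[0.0] * 100 for _ in range(100000)]   # placeholder for the real data
    rdd = sc.parallelize(matrix, 5)

    # KMeans.train ends up in the JVM-side trainKMeansModel call seen in the traces
    model = KMeans.train(rdd, k=10, maxIterations=20)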

Re: Broadcast error

2014-09-14 Thread Davies Liu
Hey Chengi, What's the version of Spark you are using? There are big improvements to broadcast in 1.1, could you try it? On Sun, Sep 14, 2014 at 8:29 PM, Chengi Liu wrote: > Any suggestions.. I am really blocked on this one > > On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu wrote: >> >> And when

Re: Broadcast error

2014-09-14 Thread Chengi Liu
Any suggestions.. I am really blocked on this one. On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu wrote: > And when I use the spark-submit script, I get the following error: > > py4j.protocol.Py4JJavaError: An error occurred while calling > o26.trainKMeansModel. > : org.apache.spark.SparkException: Job a

Re: Broadcast error

2014-09-14 Thread Chengi Liu
And when I use the spark-submit script, I get the following error: py4j.protocol.Py4JJavaError: An error occurred while calling o26.trainKMeansModel. : org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up. at org.apache.spark.scheduler.DAGScheduler.

Re: Broadcast error

2014-09-14 Thread Chengi Liu
How? Example please.. Also, if I am running this in the pyspark shell, how do I configure spark.akka.frameSize? On Sun, Sep 14, 2014 at 7:43 AM, Akhil Das wrote: > When the data size is huge, you're better off using the TorrentBroadcastFactory. > > Thanks > Best Regards > > On Sun, Sep 14, 2014 at 2:5
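
To the pyspark-shell question: the frame size has to be in place before the SparkContext exists. Two common options are to put the property in conf/spark-defaults.conf before launching the shell, or to stop the shell's automatically created sc and rebuild it, roughly like this (the app name is illustrative):

    # inside the pyspark shell, where `sc` already exists
    from pyspark import SparkConf, SparkContext

    sc.stop()   # discard the default context so a new conf can take effect
    conf = (SparkConf()
            .setAppName("broadcast-test")               # hypothetical name
            .set("spark.akka.frameSize", "1000")        # MB
            .set("spark.broadcast.factory",
                 "org.apache.spark.broadcast.TorrentBroadcastFactory"))
    sc = SparkContext(conf=conf)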

Re: Broadcast error

2014-09-14 Thread Akhil Das
When the data size is huge, you're better off using the TorrentBroadcastFactory. Thanks Best Regards On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu wrote: > Specifically, the error I see when I try to operate on the RDD created by > the sc.parallelize method: org.apache.spark.SparkException: Job aborted due t
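
For background, in the Spark 1.0 line the default HttpBroadcastFactory serves the whole broadcast value from the driver to every executor, while TorrentBroadcastFactory splits it into blocks that executors also fetch from one another, which scales better for large objects. A minimal config sketch (blockSize shown with its documented default, in KB):

    from pyspark import SparkConf

    conf = (SparkConf()
            .set("spark.broadcast.factory",
                 "org.apache.spark.broadcast.TorrentBroadcastFactory")
            .set("spark.broadcast.blockSize", "4096"))   # piece size in KB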

Re: Broadcast error

2014-09-14 Thread Chengi Liu
Specifically, the error I see when I try to operate on the RDD created by the sc.parallelize method: org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large
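
The task here was roughly 11.5 MB, presumably mostly the slice of matrix that sc.parallelize bundles into it, which is over the 10 MB (10485760-byte) Akka frame default. The broadcast-variable route the message points to keeps the large object out of the tasks themselves; a minimal sketch of that pattern, with illustrative names and data:

    from pyspark import SparkContext

    sc = SparkContext(appName="broadcast-sketch")     # hypothetical
    matrix = [[1.0] * 100 for _ in range(20000)]      # stand-in for the large local data

    bc = sc.broadcast(matrix)   # one copy shipped per executor, not per task

    # tasks now only carry row indices; the data itself travels via the broadcast
    row_sums = (sc.parallelize(range(len(matrix)), 200)
                  .map(lambda i: sum(bc.value[i]))
                  .collect())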