Cool.. well, let me try that. Any other suggestions on things I can try?
On Mon, Sep 15, 2014 at 9:59 AM, Davies Liu wrote:
I think 1.1 will be really helpful for you; it's all compatible with 1.0, so
it's not hard to upgrade to 1.1.
On Mon, Sep 15, 2014 at 2:35 AM, Chengi Liu wrote:
So.. same result with parallelize(matrix, 1000) and with broadcast.. seems
like I got a JVM core dump :-/
14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager
host:47978 with 19.2 GB RAM
14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager
host:43360 with 19.2 GB RAM
Unhandled
Try:
rdd = sc.broadcast(matrix)
Or:
rdd = sc.parallelize(matrix, 100)  # just increase the number of slices, give it a try
Thanks
Best Regards
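The trade-off behind the second suggestion can be sketched without a cluster: the more slices you pass to parallelize, the smaller each pickled per-task payload becomes. A rough, hypothetical illustration (the matrix shape and the helper name are made up for the example):

```python
import pickle

# Hypothetical stand-in for the matrix in this thread: plain Python
# rows of floats.
matrix = [[float((i * j) % 97) for j in range(200)] for i in range(1000)]

def largest_task_bytes(data, num_slices):
    # Roughly what sc.parallelize(data, num_slices) ships per task:
    # each partition is pickled and sent to an executor, so the
    # biggest pickled slice bounds the per-task payload.
    chunk = (len(data) + num_slices - 1) // num_slices
    return max(len(pickle.dumps(data[i:i + chunk]))
               for i in range(0, len(data), chunk))

print(largest_task_bytes(matrix, 5))    # few slices: big tasks
print(largest_task_bytes(matrix, 100))  # many slices: much smaller tasks
```

With 100 slices each task carries about a twentieth of the rows that 5 slices would, so the serialized size per task drops well below any fixed frame limit.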
On Mon, Sep 15, 2014 at 2:18 PM, Chengi Liu wrote:
Hi Akhil,
So with your config (specifically with set("spark.akka.frameSize ",
"1000")), I see the error:
org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 0:0 was 401970046 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.
Can you give this a try:
conf = SparkConf().set("spark.executor.memory", "32G") \
    .set("spark.akka.frameSize ", "1000") \
    .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
sc = SparkContext(conf=conf)
rdd = sc.parallelize(matrix, 5)
from pyspark.mllib.
And the thing is, the code runs just fine if I reduce the number of rows in
my data?
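Plain arithmetic on the numbers in that error (not Spark code): the serialized task is about 383 MB, while the frame size the error reports is still the 10 MB default, even though "1000" was set. One possible explanation is the trailing space in the "spark.akka.frameSize " key in the config above, which would make it a different, unused property; that is a guess, not something the thread confirms.

```python
# The numbers from the error message above.
task_bytes = 401970046   # serialized task size
frame_bytes = 10485760   # reported spark.akka.frameSize (the 10 MB default)

task_mb = task_bytes / (1024 * 1024)
print(round(task_mb))            # 383, so frameSize would need to be >= 384 (it is set in MB)
print(task_bytes > frame_bytes)  # True: the task is ~38x the frame size
```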
On Sun, Sep 14, 2014 at 8:45 PM, Chengi Liu wrote:
I am using Spark 1.0.2.
This is my work cluster.. so I can't set up a new version readily...
But right now, I am not using broadcast..
conf = SparkConf().set("spark.executor.memory", "32G").set("spark.akka.frameSize", "1000")
sc = SparkContext(conf=conf)
rdd = sc.parallelize(matrix, 5)
from pysp
Hey Chengi,
What's the version of Spark you are using? There are big improvements to
broadcast in 1.1; could you try it?
On Sun, Sep 14, 2014 at 8:29 PM, Chengi Liu wrote:
Any suggestions.. I am really blocked on this one
On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu wrote:
And when I use the spark-submit script, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling
o26.trainKMeansModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: All
masters are unresponsive! Giving up.
at org.apache.spark.scheduler.DAGScheduler.
How? Example please..
Also, if I am running this in the pyspark shell, how do I configure
spark.akka.frameSize?
On Sun, Sep 14, 2014 at 7:43 AM, Akhil Das wrote:
When the data size is huge, you're better off using the TorrentBroadcastFactory.
Thanks
Best Regards
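A sketch of how the torrent factory might be wired in alongside the other settings from this thread (a minimal, untested outline; the property names are the Spark 1.x ones, and the matrix here is a placeholder for the caller's real data):

```python
from pyspark import SparkConf, SparkContext

matrix = [[0.0] * 10 for _ in range(10)]  # placeholder for the real data

# TorrentBroadcast distributes a large broadcast value in blocks,
# BitTorrent-style, instead of every executor fetching the whole
# blob from the driver in one go.
conf = (SparkConf()
        .set("spark.executor.memory", "32G")
        .set("spark.akka.frameSize", "1000")  # in MB; note no stray space in the key
        .set("spark.broadcast.factory",
             "org.apache.spark.broadcast.TorrentBroadcastFactory"))
sc = SparkContext(conf=conf)

matrix_bc = sc.broadcast(matrix)  # ship the big object once, not per task
```

Tasks would then read matrix_bc.value instead of closing over the matrix itself, which keeps each serialized task small.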
On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu wrote:
Specifically, the error I see when I try to operate on an RDD created by the
sc.parallelize method:
org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.