Thanks Jimmy.
I will have a try.
Thanks very much for all your help.
Best,
Peng
On Thu, Oct 30, 2014 at 8:19 PM, Jimmy wrote:
sampleRDD.cache()
Sent from my iPhone
On Oct 30, 2014, at 5:01 PM, peng xia wrote:
Hi Xiangrui,
Can you give me some code example about caching, as I am new to Spark.
Thanks,
Best,
Peng
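
[For reference, a minimal caching sketch in the Scala API, along the lines of Jimmy's `sampleRDD.cache()` suggestion — the file path, delimiter, and variable names here are made up for illustration:]

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("CacheExample")
val sc = new SparkContext(conf)

// Load and parse once, then cache, so that iterative algorithms
// reuse the parsed RDD instead of re-reading from disk each pass.
val rawData = sc.textFile("hdfs:///path/to/data.txt")  // hypothetical path
val parsed = rawData.map(line => line.split(',').map(_.toDouble))
parsed.cache()  // or parsed.persist(StorageLevel.MEMORY_ONLY)

parsed.count()  // first action materializes the cache
```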
On Thu, Oct 30, 2014 at 6:57 PM, Xiangrui Meng wrote:
Then caching should solve the problem. Otherwise, it is just loading
and parsing data from disk for each iteration. -Xiangrui
On Thu, Oct 30, 2014 at 11:44 AM, peng xia wrote:
Thanks for all your help.
I think I didn't cache the data. My previous cluster expired, and I
didn't have a chance to check the load balancing or the app manager.
Below is my code.
There are 18 features for each record and I am using the Scala API.
import org.apache.spark.SparkConf
import org.apache.s
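
[The code is cut off above; for context, a hedged sketch of what SVM training with caching might look like in the Spark 1.0 MLlib Scala API — the path, parsing, and parameter values are assumptions, not Peng's actual code:]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val conf = new SparkConf().setAppName("SVMExample")
val sc = new SparkContext(conf)

// Hypothetical input layout: label in the first column,
// the 18 features in the remaining columns.
val data = sc.textFile("hdfs:///path/to/records.csv").map { line =>
  val parts = line.split(',').map(_.toDouble)
  LabeledPoint(parts.head, Vectors.dense(parts.tail))
}

// Cache before training: SVMWithSGD is iterative, so without this
// every iteration re-reads and re-parses the file from disk.
data.cache()

val numIterations = 100
val model = SVMWithSGD.train(data, numIterations)
```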
Did you cache the data and check the load balancing? How many
features? Which API are you using, Scala, Java, or Python? -Xiangrui
On Thu, Oct 30, 2014 at 9:13 AM, Jimmy wrote:
Watch the app manager; it should tell you what's running and taking a while...
My guess is it's a "distinct" function on the data.
J
Sent from my iPhone
On Oct 30, 2014, at 8:22 AM, peng xia wrote:
Hi,
Previously, we applied the SVM algorithm in MLlib to 5 million records (600
MB); it took more than 25 minutes to finish.
The Spark version we are using is 1.0, and we were running this program on a
4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
The 5 million records only have two d