It is because the frame size is not set correctly in the executor backend; see SPARK-1112. We are going to fix it in v1.0.1. Did you try treeAggregate?
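In case it helps, here is a rough sketch of what the tree-style aggregation looks like. The gradient function and numFeatures are placeholders, not code from this thread, and depending on the Spark version treeAggregate may be exposed through org.apache.spark.mllib.rdd.RDDFunctions rather than directly on RDD:

    import breeze.linalg.{DenseVector => BDV}
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Hypothetical per-record gradient; substitute the actual loss gradient here.
    def gradient(w: BDV[Double], p: LabeledPoint): BDV[Double] = ???

    def aggregateGradient(data: RDD[LabeledPoint], w: BDV[Double], numFeatures: Int): BDV[Double] =
      data.treeAggregate(BDV.zeros[Double](numFeatures))(
        seqOp = (acc, p) => acc += gradient(w, p),  // sum within each partition
        combOp = (a, b) => a += b,                  // merge partial sums pairwise on executors
        depth = 2)                                  // extra combine level so the driver does not merge everything itself

The point is that the partial dense-vector sums are combined in a tree on the executors, instead of all partitions being reduced sequentially on the driver.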
> On Jun 19, 2014, at 2:01 AM, Makoto Yui <yuin...@gmail.com> wrote:
>
> Xiangrui and Debasish,
>
> (2014/06/18 6:33), Debasish Das wrote:
>> I did run a pretty big sparse dataset (20M rows, 3M sparse features) and I
>> got 100 iterations of SGD running in 200 seconds... 10 executors, each
>> with 16 GB memory...
>
> I could figure out what the problem was: "spark.akka.frameSize" was too large.
> By setting spark.akka.frameSize=10, it worked for the news20 dataset.
>
> The execution was slow for the larger KDD Cup 2012, Track 2 dataset (235M+
> records with 16.7M+ (2^24) sparse features, about 33.6 GB) due to the
> sequential aggregation of dense vectors on a single driver node.
>
> Aggregation took about 7.6 minutes per iteration.
>
> Thanks,
> Makoto
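For reference, a minimal sketch of setting that frame size explicitly when constructing the SparkContext; the application name is made up, and 10 (MB) is the value Makoto reports working for news20:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("news20-logreg")        // hypothetical application name
      .set("spark.akka.frameSize", "10")  // in MB; the value that worked for the news20 dataset
    val sc = new SparkContext(conf)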