Re: Spark Akka/actor failures.

2014-08-14 Thread ldmtwo

The reason we are not using MLlib and Breeze is the lack of control over the data and performance. Once the covariance matrix has been computed, there isn't much more we can do with it. Many of the methods are private. For now, we need the max value and the corresponding pair of columns. Later, we may do
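
To make the immediate need above concrete (the max value and the pair of columns it belongs to), here is a minimal sketch in plain Scala. It is not code from the thread. It assumes the covariance matrix is symmetric, fits in memory as an `Array[Array[Double]]`, and that only off-diagonal entries (pairs of distinct columns) are of interest. `MaxCovariance` and `maxOffDiagonal` are hypothetical names chosen for illustration.

```scala
// Hypothetical helper, not from the thread: scan the upper triangle of a
// symmetric covariance matrix and return the largest off-diagonal entry
// together with its (i, j) column pair, where i < j.
object MaxCovariance {
  def maxOffDiagonal(cov: Array[Array[Double]]): (Double, (Int, Int)) = {
    // Assumption: the matrix is square and has at least two columns.
    require(cov.length >= 2 && cov.forall(_.length == cov.length),
      "expected a square matrix with at least two columns")
    var best = Double.NegativeInfinity
    var pair = (0, 1)
    // Symmetry means cov(i)(j) == cov(j)(i), so j > i covers every pair once.
    for (i <- cov.indices; j <- (i + 1) until cov.length) {
      if (cov(i)(j) > best) {
        best = cov(i)(j)
        pair = (i, j)
      }
    }
    (best, pair)
  }
}
```

Scanning only the upper triangle halves the work and skips the diagonal, which holds variances rather than covariances between distinct columns. For a matrix too large for one machine, the same comparison could be applied per partition and then reduced, but that is outside this sketch.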

Re: Spark Akka/actor failures.

2014-08-14 Thread Xiangrui Meng

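Could you try mapping it to row-major form first? Your approach may generate multiple copies of the data. The code should look like this:
~~~
// flatMap rather than map, so that each (i, (j, v)) pair becomes its own
// record and groupByKey() can group the entries by row index i.
val rows = rdd.flatMap { case (j, values) =>
  values.view.zipWithIndex.map { case (v, i) => (i, (j, v)) }
}.groupByKey().map { case (i, entries) =>
  Vectors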