The reason we are not using MLlib and Breeze is the lack of control over the
data and performance. Once the covariance matrix is computed, there isn't
much more we can do with it, because many of the methods are private. For now, we need
the max value and the corresponding pair of columns. Later, we may do
Could you try to map it to row-major first? Your approach may
generate multiple copies of the data. The code should look like this:
~~~
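// Emit one (rowIndex, (colIndex, value)) pair per entry, then regroup by row.
// flatMap flattens the per-column pairs into a single pair RDD for groupByKey.
val rows = rdd.flatMap { case (j, values) =>
  values.view.zipWithIndex.map { case (v, i) =>
    (i, (j, v))
  }
}.groupByKey().map { case (i, entries) =>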
Vectors