The following code snippet from mllib was used as a template for creating a
broadcast variable:
    val bcIdf = dataset.context.broadcast(idf)
    dataset.mapPartitions { iter =>
      val thisIdf = bcIdf.value
The new code follows that model:
    import org.apache.spark.mllib.linalg.{Vector => MVector}
    ..
    assert(crows.isInstanceOf[Array[MVector]])
    val bcRows = sc.broadcast(crows)
    val GU = mat.rows.zipWithIndex.mapPartitions { case dataIter =>
      // In the debugger, bcRows.value is seen to be of type Array[Byte] .. ??
      val arrayVect = bcRows.value
That last line throws at runtime:
    java.lang.ClassCastException: [B cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
So the compiler knows that the return type of the broadcast's "value" method
is Array[MVector], which is what it should be. However, the actual runtime
type is Array[Byte]. Any insights on this?
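
For reference, the same kind of mismatch can be reproduced without Spark. This is only a sketch under stated assumptions: Holder and ErasureDemo are hypothetical names, Holder is not Spark's Broadcast class, and nothing here explains where the Array[Byte] comes from in the job above. It only shows that, because of type erasure, the declared type and the runtime type of a generic value can disagree without any compile error, and that the failure surfaces only where the value is first used at its declared type.

```scala
// Hypothetical stand-in for a generic wrapper; this is NOT Spark's Broadcast.
final class Holder[T](raw: Any) {
  // T is erased at runtime, so this cast is unchecked and does not fail here.
  def value: T = raw.asInstanceOf[T]
}

object ErasureDemo {
  def main(args: Array[String]): Unit = {
    // The payload is really a byte array, like the value seen in the debugger.
    val payload: Any = Array[Byte](1, 2, 3)

    // This compiles: the compiler trusts the declared type parameter.
    val holder = new Holder[Array[String]](payload)

    try {
      // The checked cast is inserted here, at the use site.
      val strings: Array[String] = holder.value
      println(strings.length)
    } catch {
      case e: ClassCastException =>
        println("ClassCastException: " + e.getMessage)
    }
  }
}
```

Running ErasureDemo.main takes the catch branch, since a byte array cannot be cast to an array of String. The exact message text depends on the JVM version.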