I tried turning on the extended debug info. The Scala output is a little
opaque (lots of "- field (class "$iwC$$iwC$$iwC$$iwC$$iwC$$iwC", name:
"$iw", type: "class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC""), but it seems
like, as expected, somehow the full array of OLSMultipleLinearRegression
objects i
Hey Sandy,
Try using the -Dsun.io.serialization.extendedDebugInfo=true flag on the JVM to
print the contents of the objects. In addition, something else that helps is to
do the following:
{
  val _arr = arr
  models.map(... _arr ...)
}
Basically, copy the global variable into a local one. The
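A minimal sketch of why the local copy can matter, using plain Java serialization rather than Spark's closure serializer. The names `ClosureSizeSketch`, `Session`, `unrelated`, and `serializedSize` are hypothetical; `Session` only stands in for whatever enclosing object holds the top-level vals. A closure that reads a field captures the whole enclosing object, while a closure that reads a local copy captures only that value:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSizeSketch {
  // Hypothetical stand-in for the enclosing object that holds top-level vals.
  class Session extends Serializable {
    val arr: Array[Double] = Array.fill(10)(1.0)            // small array the closure needs
    val unrelated: Array[Double] = Array.fill(100000)(1.0)  // large state the closure never touches

    // Reads the field, so the closure captures `this` (the whole Session).
    def viaField: Int => Double = i => arr(i)

    // Copies the field into a local first, so the closure captures only that array.
    def viaLocal: Int => Double = {
      val _arr = arr
      i => _arr(i)
    }
  }

  // Size in bytes of an object under plain Java serialization.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val s = new Session
    println(s"via field: ${serializedSize(s.viaField)} bytes")
    println(s"via local: ${serializedSize(s.viaLocal)} bytes")
  }
}
```

Under these assumptions the field-based closure should serialize to a much larger size, because `unrelated` is written out along with the rest of `Session`.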
Sandy,
On Mon, Nov 10, 2014 at 6:01 PM, Sandy Ryza wrote:
>
> The result array is 1867 x 5. Serialized, it is 80k bytes, which seems
> about right:
> scala> SparkEnv.get.closureSerializer.newInstance().serialize(arr)
> res17: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=80027
> cap=
I'm experiencing some strange behavior with closure serialization that is
totally mind-boggling to me. It appears that two arrays of equal size take
up vastly different amounts of space inside closures if they're generated in
different ways.
The basic flow of my app is to run a bunch of tiny regre