Hm, this means these vectors carry around a second representation, which is
probably too costly from a memory perspective. Can the caller not just
construct these as needed? While constructing it takes time, that still seems
like a win, and it doesn't impact the rest of the code.
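For illustration, a rough sketch of that "construct as needed" alternative (the helper object and conversion loop below are just an assumption, not existing Spark code):

import breeze.linalg.{HashVector => BHV}
import org.apache.spark.ml.linalg.SparseVector

// Hypothetical helper: the caller builds a breeze HashVector from a
// SparseVector only when it needs one, instead of the vector caching
// a second representation internally.
object BreezeConversions {
  def toHashVector(v: SparseVector): BHV[Double] = {
    val hv = BHV.zeros[Double](v.size)
    v.foreachActive { (i, value) => hv(i) = value }
    hv
  }
}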
Can someone please suggest an approach? Thanks.
On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, wrote:
> Hello Dear Spark User / Dev,
>
> I would like to pass a Python user-defined function to a Spark job developed
> in Scala, and have the return value of that function returned to the DF /
> Dataset API.
>
> Can so
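Not sure this is exactly what you are after, but one possible sketch: if the Python UDF has already been registered against the same SparkSession under a name (for example via spark.udf.register on the PySpark side), the Scala side can refer to it by name with callUDF. The UDF name "my_py_udf" and the columns below are made up for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{callUDF, col}

object CallPythonUdfByName {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("call-udf-by-name").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // "my_py_udf" must already be registered on this session (e.g. from the
    // PySpark side); callUDF only resolves it by name at analysis time.
    val result = df.withColumn("transformed", callUDF("my_py_udf", col("value")))
    result.show()

    spark.stop()
  }
}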
Hi Sean, I think the simplest way is to return a *breeze.linalg.HashVector*
when *org.apache.spark.ml.linalg.SparseVector#asBreeze* is called, and to
store that vector in a lazy val, because constructing a
*breeze.linalg.HashVector* has some extra performance cost.
The code would look roughly like this:
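(A minimal sketch only, using a simplified stand-in for the real SparseVector rather than the actual Spark source; the point is just the lazy val caching the HashVector.)

import breeze.linalg.{HashVector => BHV, Vector => BV}

// Simplified SparseVector-like class: asBreeze returns a breeze HashVector,
// cached in a lazy val so the (somewhat expensive) construction happens at
// most once per vector.
class SparseVector(val size: Int, val indices: Array[Int], val values: Array[Double]) {

  private lazy val asHash: BHV[Double] = {
    val hv = BHV.zeros[Double](size)
    var k = 0
    while (k < indices.length) {
      hv(indices(k)) = values(k)
      k += 1
    }
    hv
  }

  // Callers pay the construction cost only on first use; later calls reuse it.
  def asBreeze: BV[Double] = asHash
}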