Re: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-04 Thread Sean Owen
Hm, this means these vectors carry around a second representation, which is probably too costly from a memory perspective. Can the caller not just construct these as needed? While constructing it takes time, it seems like that's still a win, and it doesn't impact the rest of the code.
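Here is a minimal sketch of the caller-side approach, to make the trade-off concrete. `SimpleSparseVector`, `Node`, `Leaf`, `Split` and `toLookup` are hypothetical names for illustration, not Spark APIs. The vector answers random access by binary search over its sorted indices, which costs O(log nnz) per lookup and is the cost this thread is about. The caller builds a throwaway hash-based view once per prediction, so the vector itself never carries a second representation.

```scala
import java.util.Arrays
import scala.collection.mutable

// Hypothetical stand-in for a sparse vector: sorted indices plus parallel values,
// with random access by binary search (O(log nnz) per lookup).
final case class SimpleSparseVector(size: Int, indices: Array[Int], values: Array[Double]) {
  def apply(i: Int): Double = {
    require(i >= 0 && i < size, s"index $i out of range")
    val j = Arrays.binarySearch(indices, i)
    if (j >= 0) values(j) else 0.0
  }
}

// Hypothetical decision-tree node types, only for illustration.
sealed trait Node
final case class Leaf(prediction: Double) extends Node
final case class Split(feature: Int, threshold: Double, left: Node, right: Node) extends Node

object CallerSideLookup {
  // Caller builds a hash-based view on demand: O(nnz) time and memory,
  // discarded after the prediction instead of being stored on the vector.
  def toLookup(v: SimpleSparseVector): Int => Double = {
    val m = mutable.HashMap.empty[Int, Double]
    var k = 0
    while (k < v.indices.length) { m.update(v.indices(k), v.values(k)); k += 1 }
    (i: Int) => m.getOrElse(i, 0.0)
  }

  // Walk the tree using whichever feature accessor the caller supplies.
  @annotation.tailrec
  def predict(node: Node, feature: Int => Double): Double = node match {
    case Leaf(p)           => p
    case Split(f, t, l, r) => predict(if (feature(f) <= t) l else r, feature)
  }

  def main(args: Array[String]): Unit = {
    val v = SimpleSparseVector(10, Array(1, 4, 7), Array(0.5, 2.0, -1.0))
    val tree = Split(4, 1.0, Leaf(0.0), Split(7, 0.0, Leaf(1.0), Leaf(2.0)))
    val viaBinarySearch = predict(tree, v.apply)   // binary search on every lookup
    val viaHashLookup = predict(tree, toLookup(v)) // one O(nnz) build, then hash lookups
    println(s"$viaBinarySearch $viaHashLookup")    // prints "1.0 1.0"
  }
}
```

Both traversals reach the same leaf. Whether building the map per row pays off depends on how many feature lookups a prediction makes relative to nnz, which would need to be measured rather than assumed.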

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-04 Thread Chetan Khatri
Can someone please advise? Thanks.

On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote:
> Hello Dear Spark User / Dev,
>
> I would like to pass a Python user-defined function to a Spark job developed
> using Scala, and have the return value of that function returned to the
> DataFrame / Dataset API.
>
> Can so

Re: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-04 Thread Vincent Wang
Hi Sean, I think the simplest way is to return a *breeze.linalg.HashVector* when *org.apache.spark.ml.linalg.SparseVector#asBreeze* is called, and to use a lazy value to store that vector, because constructing a *breeze.linalg.HashVector* has some extra performance cost. The code would look like cl
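Here is a rough sketch of the lazy-value idea. It uses a hypothetical `CachingSparseVector` rather than Spark's actual `SparseVector`, and a `scala.collection.mutable.HashMap` stands in for `breeze.linalg.HashVector` so the example needs no dependencies. The hash index is built only on the first hashed lookup. Once built, though, it stays attached to the vector, which is exactly the memory cost Sean raises.

```scala
import java.util.Arrays
import scala.collection.mutable

// Hypothetical sparse vector that lazily caches a hash-based index, illustrating
// the "lazy value" idea. This is not Spark's SparseVector or breeze.linalg.HashVector.
final class CachingSparseVector(val size: Int, val indices: Array[Int], val values: Array[Double])
    extends Serializable {

  // Built on first use only, then kept for the vector's lifetime (the extra memory
  // under discussion). @transient keeps the cache out of the serialized form.
  @transient private lazy val hashed: mutable.HashMap[Int, Double] = {
    val m = mutable.HashMap.empty[Int, Double]
    var k = 0
    while (k < indices.length) { m.update(indices(k), values(k)); k += 1 }
    m
  }

  // O(log nnz) lookup via binary search over the sorted indices.
  def apply(i: Int): Double = {
    val j = Arrays.binarySearch(indices, i)
    if (j >= 0) values(j) else 0.0
  }

  // Expected O(1) lookup once the cache exists; the first call pays the O(nnz) build.
  def applyHashed(i: Int): Double = hashed.getOrElse(i, 0.0)
}

object LazyCacheDemo {
  def main(args: Array[String]): Unit = {
    val v = new CachingSparseVector(10, Array(1, 4, 7), Array(0.5, 2.0, -1.0))
    // Both accessors agree; only applyHashed triggers construction of the cache.
    val same = (0 until v.size).forall(i => v(i) == v.applyHashed(i))
    println(same) // prints "true"
  }
}
```

The design choice here is laziness: vectors that are never used for random access pay nothing. Vectors that are used keep the cache alive until they are garbage-collected, which is the trade-off against building the view in the caller.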