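sc.parallelize(model.weights.toArray, blocks).top(k) should get you that, right? For logistic regression you probably want both the large positive and the large negative weights, so rank by absolute value (map through math.abs, or pass an ordering on abs) and then pick top(k). Zip the weights with their indices first if you also need to know which features they belong to.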
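Here is a rough sketch of the idea on a plain local array, which should be enough for a few thousand weights. The object name TopFeatures, the helpers topKByAbs and aboveThreshold, and the sample weights are made up for illustration; in practice the array would come from model.weights.toArray.

```scala
object TopFeatures {
  // Return the k (featureIndex, weight) pairs with the largest absolute weight,
  // largest first. The sign is kept so negative weights stay visible.
  def topKByAbs(weights: Array[Double], k: Int): Array[(Int, Double)] =
    weights.zipWithIndex
      .map { case (w, i) => (i, w) }
      .sortWith((a, b) => math.abs(a._2) > math.abs(b._2))
      .take(k)

  // Return every (featureIndex, weight) pair whose absolute weight
  // exceeds the threshold, in original index order.
  def aboveThreshold(weights: Array[Double], threshold: Double): Array[(Int, Double)] =
    weights.zipWithIndex
      .collect { case (w, i) if math.abs(w) > threshold => (i, w) }

  def main(args: Array[String]): Unit = {
    // Hypothetical weights standing in for model.weights.toArray
    val weights = Array(0.1, -2.5, 0.0, 1.7, -0.3)

    topKByAbs(weights, 2).foreach { case (i, w) => println(s"feature $i: $w") }
    aboveThreshold(weights, 0.2).foreach { case (i, w) => println(s"feature $i: $w") }
  }
}
```

If you would rather stay on an RDD, the same ranking should work by passing an explicit ordering to top, something like sc.parallelize(weights.zipWithIndex).top(k)(Ordering.by[(Double, Int), Double](p => math.abs(p._1))). Keep in mind that weight magnitude is only a fair measure of importance if the features are on comparable scales.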

On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak <ssti...@live.com> wrote:
> Hi All,
>
> I am able to run LinearRegressionWithSGD on a small sample dataset (~60MB
> Libsvm file of sparse data) with 6700 features.
>
> val model = LinearRegressionWithSGD.train(examples, numIterations)
>
> At the end I get a model that
>
> model.weights.size
> res6: Int = 6699
>
> I am assuming each entry in the model is weight for the corresponding
> feature/index. However, if I want to get the top10 most important
> features or all features with weights higher than certain threshold, is
> that functionality available out-of-box? I can implement that on my own,
> but seems like a common feature that most of the people will need when they
> are working on high-dimensional dataset.