If you can examine your data matrix and know that fewer than about 1/6 of the
values are non-zero (so more than 5/6 are zeros), then it's probably worth
using sparse vectors.  (1/6 is a rough rule of thumb, not a hard cutoff.)

There is support for L1 and L2 regularization.  You can look at the guide
here:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
and the API docs linked from the menu.
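As a rough, non-Spark illustration of what L1/L2 regularization adds to each SGD step (a minimal plain-Python sketch of the L2 case; function names and parameters are mine, not Spark's API), the objective becomes log-loss plus (reg_param / 2) * ||w||^2, so the gradient gains a reg_param * w term:

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, lr=0.1, reg_param=0.01):
    """One SGD update for L2-regularized logistic regression on a
    single example with features x and label y in {0, 1}."""
    margin = sum(wi * xi for wi, xi in zip(w, x))
    err = sigmoid(margin) - y  # gradient factor from the log-loss part
    # L2 penalty contributes reg_param * w to the gradient:
    return [wi - lr * (err * xi + reg_param * wi)
            for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = sgd_step(w, [1.0, 2.0], 1.0)
print(w)  # weights move toward classifying this point as positive
```

Larger reg_param values shrink the weights toward zero; an L1 penalty would instead add reg_param * sign(w) and tends to zero out weights entirely, which is the behavior the guide describes for the two regularizers.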

On Fri, Apr 3, 2015 at 1:24 PM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> Hi All,
> I am building a logistic regression model for matching person data. Say
> two person objects are given with their attributes, and we need to compute
> a similarity score. That is, on one side you have 10 million records and
> on the other side we have 1 record, and we need to tell which one matches
> with the highest score among them.
>
> I am storing the scores of the similarity algos in a dense matrix and
> treating these as features. I will apply many similarity algos to one
> attribute.
>
> Should I use sparse or dense vectors? What happens with dense vectors when
> a score is null or when some of the attributes are missing?
>
> Is there any support for regularized logistic regression? Currently I am
> using LogisticRegressionWithSGD.
>
> Regards
> jeetendra
>
