Re: Mllib Logistic Regression performance relative to Mahout

2016-03-02 Thread raj.kumar
Thanks Yashwanth. Our features are a mixture of categorical and numeric features. I convert the categorical features into numeric features with standard techniques such as one-hot encoding. -Raj
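
For readers unfamiliar with the technique mentioned in this message, here is a minimal sketch of one-hot encoding in plain Python. It is not the poster's code and does not use Spark; the helper names (fit_categories, one_hot) and the sample rows are hypothetical, and mapping unseen categories to an all-zero vector is an assumption rather than anything stated in the thread.

```python
from typing import Dict, List, Sequence


def fit_categories(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to a column index, in sorted order."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[float]:
    """Return a vector with a single 1.0 in the category's column."""
    vec = [0.0] * len(index)
    if value in index:  # assumption: unseen categories stay all-zero
        vec[index[value]] = 1.0
    return vec


# Hypothetical rows: (categorical feature, numeric feature)
rows = [("red", 1.5), ("green", 0.2), ("red", 3.0)]

# Learn the category-to-column mapping from the data.
color_index = fit_categories([color for color, _ in rows])

# Concatenate the one-hot columns with the numeric feature.
features = [one_hot(color, color_index) + [x] for color, x in rows]
```

Each categorical value becomes one column per distinct category, so a feature with many distinct values produces wide, mostly-zero vectors. That is presumably why sparse vector types come up elsewhere in these threads.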

Dataset Encoders for SparseVector

2016-02-04 Thread raj.kumar
Hi, I have a DataFrame df with a column "feature" of type SparseVector that results from the ml library's VectorAssembler class. I'd like to get a Dataset of SparseVectors from this column, but when I call df.as[SparseVector], Scala complains that it doesn't know of an encoder for SparseVector

Saving and Loading Dataframes

2016-02-25 Thread raj.kumar
Hi, I am using mllib. I use the ml vectorization tools to create the vectorized input dataframe for the ml/mllib machine-learning models with schema:
root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)
To avoid repeated vectorization, I am trying to save and load t

Mllib Logistic Regression performance relative to Mahout

2016-02-26 Thread raj.kumar
Hi, We are trying to port over some code that uses Mahout Logistic Regression to Mllib Logistic Regression and our preliminary performance tests indicate a performance bottleneck. It is not clear to me if this is due to one of three factors:
o Comparing apples to oranges
o Inadequate tuning
o Ins