Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you.
On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang wrote:
> You can get 1.6.0-RC1 from
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
> currently, but it's not the final release version.
>
> 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath:
>> Thank you Yanbo,
>>

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Yanbo Liang
You can get 1.6.0-RC1 from http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ currently, but it's not the final release version.
2015-12-02 23:57 GMT+08:00 Vishnu Viswanath:
> Thank you Yanbo,
>
> It looks like this is available in 1.6 version only.
> Can you tell me how/when

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you Yanbo,
It looks like this is available in version 1.6 only. Can you tell me how/when I can download version 1.6?
Thanks and Regards,
Vishnu Viswanath
On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang wrote:
> You can set "handleInvalid" to "skip" which help you skip the labels which
> not

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Yanbo Liang
You can set "handleInvalid" to "skip", which helps you skip the labels that do not exist in the training dataset.
2015-12-02 14:31 GMT+08:00 Vishnu Viswanath:
> Hi Jeff,
>
> I went through the link you provided and I could understand how the fit()
> and transform() work.
> I tried to use the pipeline in
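To illustrate the difference between failing on an unseen label and skipping it, here is a minimal plain-Python sketch. It is not Spark code: `index_labels`, `mapping`, and the `handle_invalid` argument are hypothetical names chosen for this example, and `ValueError` stands in for the `SparkException` reported in this thread. In Spark itself, the corresponding setting is the "handleInvalid" parameter of StringIndexer mentioned above.

```python
def index_labels(labels, mapping, handle_invalid="error"):
    """Look up each label in a mapping learned from the training data.

    handle_invalid="error": raise on a label that is not in the mapping.
    handle_invalid="skip":  drop rows whose label is not in the mapping.
    """
    indexed = []
    for label in labels:
        if label in mapping:
            indexed.append((label, mapping[label]))
        elif handle_invalid == "skip":
            # Unseen label: leave this row out of the result.
            continue
        else:
            # Stand-in for the "Unseen label" failure seen in the thread.
            raise ValueError(f"Unseen label: {label}")
    return indexed


# Mapping assumed to have been learned from training data containing A and B.
mapping = {"A": 0.0, "B": 1.0}

# "D" never appeared in training, so it is dropped when skipping.
skipped = index_labels(["A", "D", "B"], mapping, handle_invalid="skip")
print(skipped)
```

Note that skipping removes the whole row, so the output can be shorter than the input. Whether that is acceptable depends on whether predictions are needed for every incoming record.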

Re: General question on using StringIndexer in SparkML

2015-12-01 Thread Vishnu Viswanath
Hi Jeff,
I went through the link you provided and I could understand how fit() and transform() work. I tried to use the pipeline in my code and I am getting an exception:
Caused by: org.apache.spark.SparkException: Unseen label:
The reason for this error as per my understanding is: For the colum

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
Thank you Jeff.
On Sun, Nov 29, 2015 at 7:36 PM, Jeff Zhang wrote:
> StringIndexer is an estimator which would train a model to be used both in
> training & prediction. So it is consistent between training & prediction.
>
> You may want to read this section of spark ml doc
> http://spark.apache.

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Jeff Zhang
StringIndexer is an estimator that trains a model to be used in both training and prediction, so the indexing is consistent between the two. You may want to read this section of the Spark ML doc: http://spark.apache.org/docs/latest/ml-guide.html#how-it-works
On Mon, Nov 30, 2015 at 12:52 AM,
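The estimator/model split described here can be sketched in plain Python. This is only an illustration of the pattern, not Spark's implementation: `LabelIndexer` and `LabelIndexerModel` are hypothetical classes written for this example, and the alphabetical tie-break is an assumption of the sketch.

```python
from collections import Counter


class LabelIndexerModel:
    """Holds the label -> index mapping learned from the training data."""

    def __init__(self, mapping):
        self.mapping = mapping

    def transform(self, labels):
        # Only looks labels up; it never recomputes the mapping.
        return [self.mapping[label] for label in labels]


class LabelIndexer:
    """Estimator: fit() learns the mapping and returns a model."""

    def fit(self, labels):
        counts = Counter(labels)
        # Most frequent label gets index 0; ties broken alphabetically
        # (a choice made for this sketch).
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        return LabelIndexerModel({label: float(i) for i, label in enumerate(ordered)})


# Fit once on the training data, where A is the most frequent label.
model = LabelIndexer().fit(["A", "A", "A", "B", "B", "C"])

# At prediction time B is the most frequent label, but the fitted model
# still maps it to the index learned during training.
prediction_indices = model.transform(["C", "C", "B", "B", "B", "A"])
print(prediction_indices)
```

The point of the design is that the fitted model, not a freshly fitted indexer, is applied to new data, so a given label always receives the same index.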

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
Thanks for the reply Yanbo. I understand that the model will be trained using the indexer map created during the training stage. But since I am getting a new set of data during prediction, and I have to do StringIndexing on the new data also, Right now I am using a new StringIndexer for this purp

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Yanbo Liang
Hi Vishnu,
The string-to-index map is generated at the model training step and used at the model prediction step. This means that the map will not change during prediction. You will use the original trained model when you do prediction.
2015-11-29 4:33 GMT+08:00 Vishnu Viswanath:
> Hi

General question on using StringIndexer in SparkML

2015-11-28 Thread Vishnu Viswanath
Hi All, I have a general question on using StringIndexer. StringIndexer gives an index to each label in the feature, starting from 0 (0 for the most frequent label). Suppose I am building a model, and I use StringIndexer for transforming one of my columns. e.g., suppose A was most frequent word followe
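As a rough plain-Python sketch of the frequency-based indexing described in this question (not Spark code; `fit_label_index` and the sample data are hypothetical, and the alphabetical tie-break is an assumption of the sketch):

```python
from collections import Counter


def fit_label_index(labels):
    """Map each distinct label to an index, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending count; ties are broken alphabetically
    # (a choice made for this sketch).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# A appears three times, B twice, C once.
training_labels = ["A", "B", "A", "C", "A", "B"]
label_index = fit_label_index(training_labels)
print(label_index)  # {'A': 0.0, 'B': 1.0, 'C': 2.0}
```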