Thank you Yanbo.

It looks like this is available only in version 1.6. Can you tell me how and when I can download version 1.6?

Thanks and Regards,
Vishnu Viswanath

On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang <yblia...@gmail.com> wrote:

> You can set "handleInvalid" to "skip", which helps you skip labels that
> do not exist in the training dataset.
>
> 2015-12-02 14:31 GMT+08:00 Vishnu Viswanath <vishnu.viswanat...@gmail.com>:
>
>> Hi Jeff,
>>
>> I went through the link you provided and I could understand how fit()
>> and transform() work.
>> I tried to use the pipeline in my code and I am getting the exception
>> Caused by: org.apache.spark.SparkException: Unseen label:
>>
>> The reason for this error, as per my understanding, is:
>> For the column on which I am doing StringIndexing, the test data has
>> values which were not there in the train data.
>> Since fit() is done only on the train data, the indexing is failing.
>>
>> Can you suggest what can be done in this situation?
>>
>> Thanks,
>>
>> On Mon, Nov 30, 2015 at 12:32 AM, Vishnu Viswanath <
>> vishnu.viswanat...@gmail.com> wrote:
>>
>>> Thank you Jeff.
>>>
>>> On Sun, Nov 29, 2015 at 7:36 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> StringIndexer is an estimator which would train a model to be used
>>>> both in training & prediction. So it is consistent between training &
>>>> prediction.
>>>>
>>>> You may want to read this section of the Spark ML doc:
>>>> http://spark.apache.org/docs/latest/ml-guide.html#how-it-works
>>>>
>>>> On Mon, Nov 30, 2015 at 12:52 AM, Vishnu Viswanath <
>>>> vishnu.viswanat...@gmail.com> wrote:
>>>>
>>>>> Thanks for the reply Yanbo.
>>>>>
>>>>> I understand that the model will be trained using the indexer map
>>>>> created during the training stage.
>>>>>
>>>>> But I am getting a new set of data during prediction, and I have to
>>>>> do StringIndexing on the new data also.
>>>>> Right now I am using a new StringIndexer for this purpose. Is there
>>>>> any way that I can reuse the indexer used in the training stage?
>>>>>
>>>>> Note: I have a pipeline with a StringIndexer in it, and I am fitting
>>>>> my train data with it and building the model. Later, when I get the
>>>>> new data for prediction, I am using the same pipeline to fit the
>>>>> data again and do the prediction.
>>>>>
>>>>> Thanks and Regards,
>>>>> Vishnu Viswanath
>>>>>
>>>>> On Sun, Nov 29, 2015 at 8:14 AM, Yanbo Liang <yblia...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Vishnu,
>>>>>>
>>>>>> The string-to-index map is generated at the model training step and
>>>>>> used at the model prediction step.
>>>>>> This means the map will not change during prediction. You will use
>>>>>> the original trained model when you do prediction.
>>>>>>
>>>>>> 2015-11-29 4:33 GMT+08:00 Vishnu Viswanath <
>>>>>> vishnu.viswanat...@gmail.com>:
>>>>>> > Hi All,
>>>>>> >
>>>>>> > I have a general question on using StringIndexer.
>>>>>> > StringIndexer gives an index to each label in the feature,
>>>>>> > starting from 0 (0 for the most frequent word).
>>>>>> >
>>>>>> > Suppose I am building a model, and I use StringIndexer for
>>>>>> > transforming one of my columns.
>>>>>> > E.g., suppose A was the most frequent word, followed by B and C.
>>>>>> >
>>>>>> > So the StringIndexer will generate
>>>>>> >
>>>>>> > A 0.0
>>>>>> > B 1.0
>>>>>> > C 2.0
>>>>>> >
>>>>>> > After building the model, I am going to do some prediction using
>>>>>> > this model, so I do the same transformation on the new data which
>>>>>> > I need to predict. Suppose the new dataset has C as the most
>>>>>> > frequent word, followed by B and A. Then the StringIndexer will
>>>>>> > assign the indexes as
>>>>>> >
>>>>>> > C 0.0
>>>>>> > B 1.0
>>>>>> > A 2.0
>>>>>> >
>>>>>> > These indexes are different from what we used for modeling. So
>>>>>> > won’t this give me a wrong prediction if I use StringIndexer?
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
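
For anyone reading this thread later, the two points above (the index map is fixed at fit time, and unseen labels can be skipped) can be illustrated with a small pure-Python sketch. This is not Spark code and not how Spark implements it: the names SimpleStringIndexerModel, fit_string_indexer and handle_invalid are hypothetical stand-ins for StringIndexer, fit() and "handleInvalid", and the alphabetical tie-breaking is an assumption of the sketch only.

```python
from collections import Counter


class SimpleStringIndexerModel:
    """Holds the label -> index map learned at fit time (hypothetical stand-in)."""

    def __init__(self, label_to_index, handle_invalid="error"):
        if handle_invalid not in ("error", "skip"):
            raise ValueError("handle_invalid must be 'error' or 'skip'")
        self.label_to_index = dict(label_to_index)
        self.handle_invalid = handle_invalid

    def transform(self, labels):
        """Index labels with the map learned from the training data only."""
        out = []
        for label in labels:
            if label in self.label_to_index:
                out.append((label, self.label_to_index[label]))
            elif self.handle_invalid == "skip":
                continue  # drop rows whose label was never seen during fit
            else:
                raise ValueError("Unseen label: %s" % label)
        return out


def fit_string_indexer(labels, handle_invalid="error"):
    """Assign 0.0 to the most frequent label, 1.0 to the next, and so on."""
    counts = Counter(labels)
    # Ties are broken alphabetically here; that is an assumption of this sketch.
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    mapping = {lab: float(i) for i, lab in enumerate(ordered)}
    return SimpleStringIndexerModel(mapping, handle_invalid)


# Training data: A is most frequent, then B, then C.
train = ["A", "A", "A", "B", "B", "C"]
# New data: C is most frequent, and D never appeared in training.
new_data = ["C", "C", "C", "B", "B", "A", "D"]

model = fit_string_indexer(train, handle_invalid="skip")
indexed = model.transform(new_data)  # reuse the fitted model; do not fit again

print(model.label_to_index)
print(indexed)
```

Because the fitted model is reused, C keeps index 2.0 in the new data even though it is now the most frequent label, and the row with D is dropped instead of raising "Unseen label". Fitting a fresh indexer on the new data would instead produce the mismatched indexes described in the original question.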