Hi,

I am new to Spark MLlib and machine learning. I have a CSV file with
around 100 thousand rows and 20 columns. Of these 20 columns, 10 contain
string values. The values in these columns are not necessarily unique.
They are categorical: each value is one of a small set, say 10 distinct
values per column. To start with, I was able to run the examples, in
particular the random forest algorithm, on my local Spark (1.5.1)
installation. However, I am stuck with my own dataset because of these
string columns, as the APIs take numerical values. Can anyone tell me how
I can map these categorical values (strings) to numbers and use them with
the random forest algorithm? Any example would be greatly appreciated.
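
To make the question concrete, here is a minimal plain-Python sketch (no
Spark involved) of the kind of mapping I have in mind. The function names
build_index and encode_rows and the three sample rows are made up for
illustration; they are not part of any Spark API. It assigns each distinct
string in a categorical column an integer index, most frequent value
first, and records how many distinct values each categorical column has.

```python
from collections import Counter


def build_index(values):
    """Map each distinct string to an integer index, most frequent first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx for idx, value in enumerate(ordered)}


def encode_rows(rows, categorical_cols):
    """Replace strings in the categorical columns with their indices.

    Returns the encoded rows (all floats), a dict of column position to
    number of distinct categories, and the per-column string-to-index maps.
    """
    indexes = {c: build_index(row[c] for row in rows) for c in categorical_cols}
    encoded = [
        [float(indexes[i][v]) if i in indexes else float(v)
         for i, v in enumerate(row)]
        for row in rows
    ]
    arity = {c: len(indexes[c]) for c in categorical_cols}
    return encoded, arity, indexes


# Hypothetical sample data: columns 0 and 2 are categorical, column 1 is numeric.
rows = [
    ["red", "1.5", "small"],
    ["blue", "2.0", "large"],
    ["red", "0.5", "small"],
]
encoded, arity, indexes = encode_rows(rows, categorical_cols=[0, 2])
print(encoded)
print(arity)
```

As far as I can tell, the arity dictionary has the shape of the
categoricalFeaturesInfo argument that RandomForest.trainClassifier takes
(feature index mapped to number of categories), and StringIndexer in
spark.ml seems to do a similar frequency-ordered mapping, but I am not
sure whether this is the recommended way to do it in Spark.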


Regards,

Bala
