Hi Guillaume,

That is always an option. However, I read about HashingTF, which does
exactly this quite efficiently and can scale too. Hence, I am looking
for a solution that uses this technique.
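
To make the comparison concrete, here is a minimal plain-Python sketch of the two ideas: a dictionary lookup versus the hashing trick. This is not Spark code and not HashingTF's actual implementation. The function names, the sample column, the use of MD5 and the bucket count of 1024 are all my own assumptions, chosen only for illustration.

```python
import hashlib


def build_index(values):
    """Dictionary approach: give each distinct string a dense integer id,
    assigned in first-seen order (hypothetical helper, not a Spark API)."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def hashed_index(value, num_buckets=1024):
    """Hashing approach: map a string straight to a bucket number.
    No dictionary is stored, but two different strings may collide.
    MD5 is used only to get a hash that is stable across runs."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Assumed sample data standing in for one categorical CSV column.
column = ["red", "green", "red", "blue", "green"]

index = build_index(column)
encoded = [index[v] for v in column]        # dense ids: 0, 1, 0, 2, 1
hashed = [hashed_index(v) for v in column]  # bucket ids in [0, 1024)
```

The dictionary version produces contiguous ids but needs a pass over the data to build the lookup first. The hashed version needs no such pass, at the cost of possible collisions and ids that are not contiguous.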


regards
Bala


On 5 November 2015 at 18:50, tog <guillaume.all...@gmail.com> wrote:

> Hi Bala
>
> Can't you build a simple dictionary and map those values to numbers?
>
> Cheers
> Guillaume
>
> On 5 November 2015 at 09:54, Balachandar R.A. <balachandar...@gmail.com>
> wrote:
>
>> Hi
>>
>>
>> I am new to Spark MLlib and machine learning. I have a CSV file that
>> consists of around 100 thousand rows and 20 columns. Of these 20 columns,
>> 10 contain string values. The values in these columns are not necessarily
>> unique. They are categorical, that is, each value is one of a small
>> set, say 10 possible values. To start with, I could run the examples,
>> in particular the random forest algorithm, on my local Spark (1.5.1)
>> platform. However, I have a challenge with my dataset because of these
>> strings, as the APIs take numerical values. Can anyone tell me how I can
>> map these categorical values (strings) to numbers and use them with the
>> random forest algorithms? Any example will be greatly appreciated.
>>
>>
>> regards
>>
>> Bala
>>
>
>
>
> --
> PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
>
