If your corpus is large (NLP), this is indeed the best solution. Otherwise
(only a few words, i.e. categories), I guess you will end up with the same result.
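Why the two approaches converge for a few categories can be checked directly. With many buckets and few distinct values, hashing usually gives each value its own index. The features then carry the same information as a dictionary encoding, only at different positions. With a huge vocabulary, hashing avoids storing a dictionary at the cost of occasional collisions. A minimal plain-Python sketch, assuming `zlib.crc32` as a stand-in hash (Spark uses a different hash function) and hypothetical helper names `bucket` and `collisions`:

```python
import zlib

def bucket(value, num_features):
    # Stand-in stable hash (crc32) modulo the number of buckets,
    # in the spirit of HashingTF; not Spark's actual hash function.
    return zlib.crc32(value.encode("utf-8")) % num_features

def collisions(values, num_features):
    """Count distinct values that land in a bucket already taken by another value."""
    seen = set()
    clashes = 0
    for v in sorted(set(values)):
        b = bucket(v, num_features)
        if b in seen:
            clashes += 1
        seen.add(b)
    return clashes

categories = ["red", "green", "blue", "small", "large"]
print(collisions(categories, 1 << 20))  # many buckets: collisions are unlikely
print(collisions(categories, 1))        # one bucket: every value after the first clashes
```

If the collision count is zero, the hashed encoding is a one-to-one relabelling of the dictionary encoding.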
On Friday, 6 November 2015, Balachandar R.A. wrote:
Hi Guillaume,
This is always an option. However, I read about HashingTF, which does exactly
this quite efficiently and can scale too. Hence, I am looking for a
solution using this technique.
Regards
Bala
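HashingTF relies on the hashing trick. Each term is mapped straight to a column index by hashing it modulo a fixed number of features, so no dictionary has to be built or shipped around the cluster. Below is a minimal plain-Python sketch of the idea, not Spark's implementation. It assumes `zlib.crc32` as the hash (Spark uses its own hash function) and a hypothetical helper name `hashing_tf`:

```python
import zlib
from collections import Counter

def hashing_tf(terms, num_features=1 << 20):
    """Hash each term to a bucket index and count occurrences.

    Returns a sparse vector as {index: count}.
    """
    counts = Counter()
    for term in terms:
        # crc32 is deterministic across runs, unlike Python's built-in str hash
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)

row = ["red", "small", "red"]
vec = hashing_tf(row, num_features=16)
print(vec)  # sparse {index: count}; "red" contributes a count of 2 to its bucket
```

For tabular categorical data, one option is to hash strings such as `"color=red"` rather than bare values. That way equal values in different columns do not share a bucket.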
On 5 November 2015 at 18:50, tog wrote:
Hi Bala
Can't you use a simple dictionary and map those values to numbers?
Cheers
Guillaume
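The dictionary approach amounts to building an explicit lookup table from each category to an integer. A minimal plain-Python sketch, with hypothetical helper names `build_index` and `one_hot`:

```python
def build_index(values):
    """Assign each distinct category an integer index in first-seen order."""
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(index)
    return index

def one_hot(value, index):
    """Encode one category as a dense 0/1 vector using the lookup table."""
    vec = [0] * len(index)
    vec[index[value]] = 1
    return vec

colors = ["red", "green", "red", "blue"]
idx = build_index(colors)
print(idx)                       # {'red': 0, 'green': 1, 'blue': 2}
print([idx[c] for c in colors])  # [0, 1, 0, 2]
print(one_hot("blue", idx))      # [0, 0, 1]
```

Unlike hashing, this guarantees no collisions. The cost is a pass over the data to build the table, and the table must be kept to encode new rows. In `spark.ml`, `StringIndexer` performs a similar mapping, though it orders labels by frequency rather than by first appearance.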
On 5 November 2015 at 09:54, Balachandar R.A. wrote:
> Hi
>
> I am new to Spark MLlib and machine learning. I have a CSV file that
> consists of around 100 thousand rows and 20 columns. Of these 20 co