Running glm in sparkR (data pre-processing step)

Abhishek Anand Mon, 30 May 2016 02:06:56 -0700

Hi,

I want to run the SparkR version of glm on data stored in a CSV file.


I see that the glm function in SparkR takes a Spark DataFrame as input.

Now, when I read a CSV file and create a Spark DataFrame from it, how should
I handle the factor (categorical) columns in my data?

Do I need to convert it to an R data frame, convert the columns with
as.factor, create a Spark DataFrame again, and run glm over that?
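
To make the question concrete, the round trip described above would look
roughly like the sketch below. It assumes the Spark 2.x SparkR API
(sparkR.session, the built-in "csv" source, as.DataFrame); the toy file and
the column names y, x, and grp are hypothetical stand-ins for the real data.

```r
library(SparkR)

# Assumption: Spark 2.x entry point; older releases use a different setup.
sparkR.session()

# Hypothetical toy CSV standing in for the real file.
path <- file.path(tempdir(), "toy.csv")
write.csv(
  data.frame(
    y   = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
    x   = c(1, 2, 3, 4, 5, 6),
    grp = c(1, 2, 3, 1, 2, 3)   # numeric codes that are really categories
  ),
  path,
  row.names = FALSE
)

# Read the CSV into a Spark DataFrame.
sdf <- read.df(path, source = "csv", header = "true", inferSchema = "true")

# The round trip in question: pull everything into local R,
# convert the column with as.factor, then push it back to Spark.
ldf <- collect(sdf)
ldf$grp <- as.factor(ldf$grp)
sdf2 <- as.DataFrame(ldf)

# Fit the model on the Spark DataFrame.
model <- glm(y ~ x + grp, data = sdf2, family = "gaussian")
summary(model)
```

The collect() call brings every row into the local R session, which is the
step that cannot scale to a big dataset.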

But running as.factor over a big dataset is not possible, since it requires
the whole dataset to fit in local R memory.

Please suggest what pre-processing should be done and the best way to
achieve it.


Thanks,
Abhi
