Hi
SparkR has some support for machine learning algorithms, such as glm.
To use existing R packages, you currently need to collect() the data to convert
it into a local R data.frame, assuming it fits into the memory of the driver
node. That would be required to work with an R package in any case.
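
A minimal sketch of the two routes, assuming Spark 2.0 or later (where the
session is started with sparkR.session()) and using R's built-in `faithful`
dataset purely as illustrative data; neither the dataset nor the model formula
comes from this thread:

```r
library(SparkR)

# Assumption: Spark 2.0+ is installed and reachable from this R session.
sparkR.session()

# Build a distributed SparkDataFrame from a small local data.frame.
df <- createDataFrame(faithful)

# Route 1: fit the model on the distributed data with SparkR's own glm().
spark_fit <- glm(eruptions ~ waiting, data = df, family = "gaussian")

# Route 2: collect() the data to the driver as a local data.frame,
# then pass it to any ordinary R function or package.
# This only works if the data fits in the driver's memory.
local_df <- collect(df)
local_fit <- lm(eruptions ~ waiting, data = local_df)

sparkR.session.stop()
```

Route 2 is the one that applies to existing R packages; lm() here simply
stands in for whatever package function you want to call.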
___
Hi, Lan,
Generally, it is hard to make existing R packages that work with R data frames
work transparently with SparkR DataFrames. Typically, the algorithms have to be
rewritten to use the SparkR DataFrame API.
Collect is for collecting the data from a SparkR DataFrame into a local
data.frame. Si