Hi, Lan,

In general, it is hard to make existing R packages that operate on R data 
frames work transparently with SparkR DataFrames. Typically the algorithms have 
to be rewritten against the SparkR DataFrame API.

collect is for bringing the data from a SparkR DataFrame into a local 
data.frame. Since a SparkR DataFrame is a distributed data set, you typically 
call SparkR DataFrame API methods to manipulate its data in a distributed 
fashion and, once the result is small enough to fit in the memory of the local 
machine, you collect it for local processing.
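
As a rough sketch of that workflow (not a definitive recipe): the example below 
assumes the SparkR 2.x entry point sparkR.session() and a local master; older 
1.x releases used a different initialization. The built-in faithful data set, 
the threshold of 70, and the lm() call are only illustrative stand-ins for your 
own data and R package.

```r
library(SparkR)

# Assumption: SparkR 2.x or later, running against a local master.
sparkR.session(master = "local[*]")

# Distribute R's built-in `faithful` data set as a SparkR DataFrame.
df <- as.DataFrame(faithful)

# Reduce the data with the SparkR DataFrame API while it is still distributed.
long_waits <- filter(df, df$waiting > 70)
small <- select(long_waits, "eruptions", "waiting")

# Once the result is small enough for local memory, collect it.
local_df <- collect(small)

# local_df is an ordinary R data.frame, so existing R functions apply to it.
fit <- lm(eruptions ~ waiting, data = local_df)
print(class(local_df))

sparkR.session.stop()
```

Anything that needs a plain data.frame, such as an existing R modelling 
package, would be applied to local_df in the same way as lm() above.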

From: Duy Lan Nguyen [mailto:ndla...@gmail.com]
Sent: Wednesday, December 23, 2015 5:50 AM
To: user@spark.apache.org
Subject: Do existing R packages work with SparkR data frames

Hello,

Is it possible for existing R Machine Learning packages (which work with R data 
frames) such as bnlearn, to work with SparkR data frames? Or do I need to 
convert SparkR data frames to R data frames? Is "collect" the function to do 
the conversion, or how else to do that?

Many Thanks,
Lan
