Hi Lan,

Generally, it is hard to use existing R packages that work with R data frames transparently with SparkR DataFrames. Typically, the algorithms have to be rewritten to use the SparkR DataFrame API.
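When a full rewrite is not practical, the usual workaround is to reduce the data with the SparkR DataFrame API first and then hand a small local copy to the existing package. Below is a minimal sketch of that pattern, not a definitive recipe. It assumes a local Spark 1.x installation (on Spark 2.x and later the entry points are sparkR.session() and createDataFrame(faithful) instead), uses R's built-in faithful data set purely as a stand-in for real data, and uses lm() as a placeholder for whatever local package function (for example, one from bnlearn) you actually want to call.

```r
library(SparkR)

# Assumption: Spark 1.x style initialization against a local master.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Stand-in data: turn the built-in 'faithful' data.frame into a SparkR DataFrame.
df <- createDataFrame(sqlContext, faithful)

# Distributed step: filter rows and keep only the needed columns,
# so that the result is small before it leaves the cluster.
long_waits <- filter(df, df$waiting > 70)
small <- select(long_waits, "eruptions", "waiting")

# collect() returns an ordinary local R data.frame.
local_df <- collect(small)

# Local step: any existing R package can now work on local_df.
# lm() is only a placeholder for the package function you need.
fit <- lm(eruptions ~ waiting, data = local_df)
print(class(local_df))

sparkR.stop()
```

The key design point is that everything before collect() runs on the cluster, while everything after it runs in the memory of the local R process, so the filtering and projection should do as much of the size reduction as possible.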
The collect function brings the data from a SparkR DataFrame into a local data.frame. Since a SparkR DataFrame is a distributed data set, you typically call methods of the SparkR DataFrame API to manipulate its data in a distributed fashion, and once the result is small enough to fit in the memory of the local machine, you can collect it for local processing.

From: Duy Lan Nguyen [mailto:ndla...@gmail.com]
Sent: Wednesday, December 23, 2015 5:50 AM
To: user@spark.apache.org
Subject: Do existing R packages work with SparkR data frames

Hello,

Is it possible for existing R Machine Learning packages (which work with R data frames), such as bnlearn, to work with SparkR data frames? Or do I need to convert SparkR data frames to R data frames? Is "collect" the function to do the conversion, or how else to do that?

Many Thanks,
Lan