You definitely don't want to implement k-means in R, since it would be very slow. Providing R wrappers for the MLlib implementation is the way to go. I believe MLlib wrappers are one of the major items planned for SparkR next.
On Tue, May 26, 2015 at 7:46 AM, Andrew Psaltis <psaltis.and...@gmail.com> wrote:
> Hi Alek,
> Thanks for the info. You are correct that using the three colons does
> work. Admittedly I am an R novice, but since the three colons are used to
> access hidden methods, it seems pretty dirty.
>
> Can someone shed light on the design direction being taken with SparkR?
> Should I really be accessing hidden methods, or will a better approach
> prevail? For instance, it feels like the k-means sample should really use
> MLlib and not just be a port of the k-means sample using hidden methods. Am I
> looking at this incorrectly?
>
> Thanks,
> Andrew
>
> On Tue, May 26, 2015 at 6:56 AM, Eskilson,Aleksander <
> alek.eskil...@cerner.com> wrote:
>
>> From the changes to the NAMESPACE file, that appears to be correct: all
>> methods of the RDD API have been made private, which in R means that you
>> may still access them by using the namespace prefix SparkR with three
>> colons, e.g. SparkR:::func(foo, bar).
>>
>> So a starting place for porting old SparkR scripts from before the
>> merge could be to identify the methods in the script that belong to the RDD
>> class and make sure they have the namespace prefix tacked on the front. I
>> hope that helps.
>>
>> Regards,
>> Alek Eskilson
>>
>> From: Andrew Psaltis <psaltis.and...@gmail.com>
>> Date: Monday, May 25, 2015 at 6:25 PM
>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>> Subject: SparkR and RDDs
>>
>> Hi,
>> I understand from SPARK-6799 [1] and the respective merge commit [2] that
>> the RDD class is private in Spark 1.4. If I wanted to modify the old
>> k-means and/or LR examples so that the computation happened in Spark, what is
>> the best direction to go? Sorry if I am missing something obvious, but
>> based on the NAMESPACE file [3] in the SparkR codebase I am having trouble
>> seeing the obvious direction to go.
>>
>> Thanks in advance,
>> Andrew
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-6799
>> [2] https://github.com/apache/spark/commit/4b91e18d9b7803dbfe1e1cf20b46163d8cb8716c
>> [3] https://github.com/apache/spark/blob/branch-1.4/R/pkg/NAMESPACE