You definitely don't want to implement k-means in R, since it would be very
slow. Just providing R wrappers for the MLlib implementation is the way to
go. I believe MLlib wrappers are one of the major items planned next for SparkR.



On Tue, May 26, 2015 at 7:46 AM, Andrew Psaltis <psaltis.and...@gmail.com>
wrote:

> Hi Alek,
> Thanks for the info. You are correct that using the three colons does
> work. Admittedly I am an R novice, but since the three colons are used to
> access hidden methods, it seems pretty dirty.
>
> Can someone shed light on the design direction being taken with SparkR?
> Should I really be accessing hidden methods, or will a better approach
> prevail? For instance, it feels like the k-means sample should really use
> MLlib and not just be a port of the k-means sample using hidden methods. Am I
> looking at this incorrectly?
>
> Thanks,
> Andrew
>
> On Tue, May 26, 2015 at 6:56 AM, Eskilson,Aleksander <
> alek.eskil...@cerner.com> wrote:
>
>>  From the changes to the NAMESPACE file, that appears to be correct: all
>> methods of the RDD API have been made private, which in R means that you
>> may still access them by using the namespace prefix SparkR with three
>> colons, e.g. SparkR:::func(foo, bar).
>>
>>  So a starting place for porting old SparkR scripts from before the
>> merge could be to identify those methods in the script belonging to the RDD
>> class and be sure they have the namespace identifier tacked on the front. I
>> hope that helps.
>>
>>  Regards,
>> Alek Eskilson
>>
>>   From: Andrew Psaltis <psaltis.and...@gmail.com>
>> Date: Monday, May 25, 2015 at 6:25 PM
>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>> Subject: SparkR and RDDs
>>
>>   Hi,
>> I understand from SPARK-6799 [1] and the respective merge commit [2] that
>> the RDD class is private in Spark 1.4. If I wanted to modify the old
>> k-means and/or LR examples so that the computation happened in Spark, what is
>> the best direction to go? Sorry if I am missing something obvious, but
>> based on the NAMESPACE file [3] in the SparkR codebase I am having trouble
>> seeing which direction to take.
>>
>>  Thanks in advance,
>> Andrew
>>
>>  [1] https://issues.apache.org/jira/browse/SPARK-6799
>> [2]
>> https://github.com/apache/spark/commit/4b91e18d9b7803dbfe1e1cf20b46163d8cb8716c
>> [3] https://github.com/apache/spark/blob/branch-1.4/R/pkg/NAMESPACE
>>
>>
>
>
