Thanks, Shivaram. For those who, like me, prefer to watch the video version of the talk, you can register for the Spark Summit 2015 live stream free of charge. I personally found the talk extremely helpful.
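
For anyone following the thread, here is a minimal base R sketch of the pattern Shivaram describes below: pre-process with the DataFrame API, collect the result to the driver, then do the row-level map and reduce locally. It does not need Spark to run. The `local_df` data.frame is a hypothetical stand-in for what `collect()` would return, and the column names and the per-row function are assumptions for illustration only.

```r
# Hypothetical stand-in for the local data.frame that collect(df) would
# return on the driver; the column names are made up for this sketch.
local_df <- data.frame(id = 1:4, value = c(2.0, 3.5, 1.5, 4.0))

# "Map": apply a row-level function to every row of the local data.frame.
row_scores <- lapply(seq_len(nrow(local_df)), function(i) {
  row <- local_df[i, ]
  row$id * row$value
})

# "Reduce": fold the per-row results into a single value.
total <- Reduce(`+`, row_scores)
print(total)

# For data that stays distributed, the thread's private-API route is
# SparkR:::lapply(rdd, func); it needs a running Spark 1.4 context and
# is not exercised in this sketch.
```

This only works when the collected data fits in the driver's memory, which is why it is a pre-processing-then-collect approach rather than a replacement for a distributed row operation.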

2015-06-25 15:20 GMT-07:00 Shivaram Venkataraman <[email protected]>:

> We don't support UDFs on DataFrames in SparkR in the 1.4 release. The
> existing functionality can be seen as a pre-processing step which you can
> do and then collect data back to the driver to do more complex processing.
> Along with the RDD API ticket, we are also working on UDF support. You can
> see the Spark Summit talk slides from last week for a bigger-picture view:
> http://www.slideshare.net/SparkSummit/07-venkataraman-sun
>
> Thanks
> Shivaram
>
> On Thu, Jun 25, 2015 at 3:08 PM, Wei Zhou <[email protected]> wrote:
>
>> Hi Shivaram/Alek,
>>
>> I understand that it is better to import data into a DataFrame than into
>> an RDD. If one wants to do a map-like transformation for each row in
>> SparkR, one could use SparkR:::lapply(), but is there a counterpart row
>> operation on DataFrame? The use case I am working on requires complicated
>> row-level pre-processing before going on to the actual modeling.
>>
>> Thanks.
>>
>> Best,
>> Wei
>>
>> 2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman <[email protected]>:
>>
>>> In addition to Aleksander's point, please let us know what use cases
>>> would use an RDD-like API in
>>> https://issues.apache.org/jira/browse/SPARK-7264 --
>>> We are hoping to have a version of this API in upcoming releases.
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander
>>> <[email protected]> wrote:
>>>
>>>> The simple answer is that SparkR does support map/reduce operations
>>>> over RDDs through the RDD API, but as of Spark 1.4.0 those functions
>>>> were made private in SparkR. They can still be accessed by prefixing
>>>> the function with the namespace, like SparkR:::lapply(rdd, func). It
>>>> was thought, though, that many of the functions in the RDD API were too
>>>> low level to expose, with much more of the focus going into the
>>>> DataFrame API. The original rationale for this decision can be found in
>>>> its JIRA [1]. The devs are still deciding which functions of the RDD
>>>> API, if any, should be made public for future releases. If you feel
>>>> some use cases are most easily handled in SparkR through RDD functions,
>>>> go ahead and let the dev email list know.
>>>>
>>>> Alek
>>>> [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>>
>>>> From: Wei Zhou <[email protected]>
>>>> Date: Wednesday, June 24, 2015 at 4:59 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: How to Map and Reduce in sparkR
>>>>
>>>> Does anyone know whether SparkR supports map and reduce operations as
>>>> RDD transformations? Thanks in advance.
>>>>
>>>> Best,
>>>> Wei
