Re: How to Map and Reduce in sparkR

Wei Zhou Thu, 25 Jun 2015 16:28:21 -0700

Thanks Shivaram. For those who, like me, prefer to watch the video version
of the talk, you can register for the Spark Summit 2015 live stream free of
cost. I personally found the talk extremely helpful.


2015-06-25 15:20 GMT-07:00 Shivaram Venkataraman <[email protected]>:

> We don't support UDFs on DataFrames in SparkR in the 1.4 release. The
> existing functionality can be seen as a pre-processing step, after which
> you can collect the data back to the driver for more complex processing.
> Along with the RDD API ticket, we are also working on UDF support. You can
> see the Spark Summit talk slides from last week for a bigger-picture view:
> http://www.slideshare.net/SparkSummit/07-venkataraman-sun
>
> Thanks
> Shivaram
>
> On Thu, Jun 25, 2015 at 3:08 PM, Wei Zhou <[email protected]> wrote:
>
>> Hi Shivaram/Alek,
>>
>> I understand that a better way to import data is into a DataFrame rather
>> than an RDD. If one wants to do a map-like transformation for each row in
>> SparkR, one could use SparkR:::lapply(), but is there a counterpart row
>> operation on DataFrame? The use case I am working on requires complicated
>> row-level pre-processing before going on to the actual modeling.
>>
>> Thanks.
>>
>> Best,
>> Wei
>>
>> 2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman <[email protected]>:
>>
>>> In addition to Aleksander's point, please let us know what use cases
>>> would need an RDD-like API in
>>> https://issues.apache.org/jira/browse/SPARK-7264 --
>>> We are hoping to have a version of this API in upcoming releases.
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander <[email protected]> wrote:
>>>
>>>> The simple answer is that SparkR does support map/reduce operations
>>>> over RDDs through the RDD API, but as of Spark v1.4.0 those functions
>>>> were made private in SparkR. They can still be accessed by prefixing the
>>>> function name with the namespace, like SparkR:::lapply(rdd, func). It was
>>>> thought, though, that many of the functions in the RDD API were too low
>>>> level to expose, with much more of the focus going into the DataFrame
>>>> API. The original rationale for this decision can be found in its JIRA
>>>> [1]. The devs are still deciding which functions of the RDD API, if any,
>>>> should be made public in future releases. If you feel some use cases are
>>>> most easily handled in SparkR through RDD functions, go ahead and let
>>>> the dev email list know.
>>>>
>>>>  Alek
>>>> [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>>
>>>>   From: Wei Zhou <[email protected]>
>>>> Date: Wednesday, June 24, 2015 at 4:59 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: How to Map and Reduce in sparkR
>>>>
>>>> Does anyone know whether SparkR supports map and reduce operations
>>>> like the RDD transformations? Thanks in advance.
>>>>
>>>>  Best,
>>>> Wei
>>>>
>>>
>>>
>>
>
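The two approaches described in this thread (private RDD functions reached
through the SparkR::: prefix, and DataFrame pre-processing followed by a
collect to the driver) might look roughly like the sketch below. It assumes
SparkR 1.4 with a local Spark master available; the app name, the use of R's
built-in faithful data set, the filter threshold, and the derived "ratio"
column are illustrative choices, not anything taken from the thread.

```r
# Sketch only: assumes SparkR 1.4 is installed and a local master can start.
library(SparkR)

sc <- sparkR.init(master = "local[2]", appName = "map-reduce-sketch")

# RDD route: these functions are private in 1.4, hence the SparkR::: prefix.
rdd     <- SparkR:::parallelize(sc, 1:10, 2L)
doubled <- SparkR:::lapply(rdd, function(x) x * 2)   # map step
total   <- SparkR:::reduce(doubled, "+")             # reduce step
print(total)

# DataFrame route: pre-process in Spark, then collect to the driver.
sqlContext  <- sparkRSQL.init(sc)
df          <- createDataFrame(sqlContext, faithful)  # built-in R data set
short_waits <- collect(filter(df, df$waiting < 50))   # now a local data.frame

# Row-level work then runs in plain R on the driver (illustrative column).
short_waits$ratio <- short_waits$eruptions / short_waits$waiting

sparkR.stop()
```

Since the RDD functions are private, their names and signatures may change
between releases, so anything built on SparkR::: should be treated as
provisional.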