You can combine map and filter in one operation using
collect(PartialFunction) [1]:
val cleanData = rawData.collect { case x if condition(x) => f(x) }
[1] **Not to be confused with the parameterless rdd.collect(), which triggers
the computation and delivers the results to the driver!**
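A minimal sketch of that equivalence. It uses plain Scala collections so it runs without a cluster; `condition`, `f` and `rawData` are hypothetical stand-ins for the names in the snippet above. As far as I know, Spark's RDD.collect(f: PartialFunction[T, U]) is itself defined as filter(f.isDefinedAt).map(f), so it is a lazy transformation returning a new RDD rather than an action.

```scala
// Sketch on plain Scala collections; RDD.collect(PartialFunction) has the
// same filter-then-map semantics but returns an RDD instead of a Seq.
object CollectSketch {
  // Hypothetical stand-ins for `condition` and `f` from the snippet above.
  def condition(x: Int): Boolean = x % 2 == 0
  def f(x: Int): Int = x * 10

  // Hypothetical input data.
  val rawData: Seq[Int] = Seq(1, 2, 3, 4, 5, 6)

  // Two explicit steps: filter, then map.
  val viaFilterMap: Seq[Int] = rawData.filter(condition).map(f)

  // One step: the guard plays the role of filter, the body the role of map.
  val viaCollect: Seq[Int] = rawData.collect { case x if condition(x) => f(x) }

  def main(args: Array[String]): Unit =
    println(viaCollect) // List(20, 40, 60)
}
```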
PS: please use the [email protected] list for this kind of API usage discussion;
the dev list is mainly for discussing Spark internals.
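For the original question (skipping bad records, as MapReduce can), here is a minimal sketch of the two options in the quoted reply. Assumptions: records are "name,age" strings, and `parse` is a hypothetical validating parser that returns None for anything malformed. Plain Scala collections are used so it runs standalone; RDDs expose flatMap, filter and map with the same shape.

```scala
import scala.util.Try

object SkipBadRecords {
  // Hypothetical input: one record per line, expected to be "name,age".
  val lines: Seq[String] = Seq("alice,30", "bob,notanumber", "garbage", "carol,41")

  // Hypothetical validating parser: None for any malformed record.
  def parse(line: String): Option[(String, Int)] = line.split(",") match {
    case Array(name, age) => Try(age.trim.toInt).toOption.map(a => (name, a))
    case _                => None
  }

  // Option 1: validate inside the map step; flatMap drops the Nones.
  val viaFlatMap: Seq[(String, Int)] = lines.flatMap(l => parse(l))

  // Option 2: filter out bad records first, then process the survivors.
  // Note that this runs the parser twice for every good record.
  val viaFilter: Seq[(String, Int)] =
    lines.filter(l => parse(l).isDefined).map(l => parse(l).get)

  def main(args: Array[String]): Unit =
    println(viaFlatMap) // List((alice,30), (carol,41))
}
```

Option 1 parses each record once, which is why validating inside the map step (or using collect as above) is usually the cheaper choice.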
On Fri, Nov 14, 2014 at 4:38 PM, Ganelin, Ilya <[email protected]> wrote:
> Hi Qiuzhuang, you have two options:
> 1) Within the map step define a validation function that will be executed
> on every record.
> 2) Use the filter function to create a filtered dataset prior to
> processing.
>
> On 11/14/14, 10:28 AM, "Qiuzhuang Lian" <[email protected]> wrote:
>
> >Hi,
> >
> >MapReduce has the feature of skipping bad records. Is there any equivalent
> >in Spark? Should I use the filter API to do this?
> >
> >Thanks,
> >Qiuzhuang
>