You can combine map and filter into a single operation using
collect(PartialFunction) [1]:

val cleanData = rawData.collect { case x if condition(x) => f(x) }

[1] **Not to be confused with the parameterless rdd.collect(), which triggers
computation and delivers the results to the driver!**
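A minimal runnable sketch of the pattern above, using a plain Scala Seq in place of an RDD so it needs no Spark setup. The names rawData, condition and f are hypothetical placeholders, and "parses as an Int" is only an assumed example of a validity check. On a real RDD, rdd.collect { case ... => ... } is a lazy transformation that returns a new RDD, unlike the parameterless rdd.collect() in [1].

```scala
import scala.util.Try

// Hypothetical input standing in for rawData (an RDD[String] in Spark).
val rawData: Seq[String] = Seq("1", "2", "oops", "40", "")

// Hypothetical validity check: keep records that parse as an Int.
def condition(x: String): Boolean = Try(x.toInt).isSuccess

// Hypothetical transformation for records that pass the check.
def f(x: String): Int = x.toInt * 2

// One pass: the guard acts as the filter, the case body as the map.
val cleanData: Seq[Int] = rawData.collect { case x if condition(x) => f(x) }

// The equivalent two-step version, for comparison.
val twoStep: Seq[Int] = rawData.filter(condition).map(f)

println(cleanData) // List(2, 4, 80)
```

Note that the `=>` between the guard and the result expression is required; without it the partial function does not compile.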

PS: please use [email protected] for this kind of API usage discussion;
the dev list is mainly for discussing Spark internals.

On Fri, Nov 14, 2014 at 4:38 PM, Ganelin, Ilya <[email protected]> wrote:

> Hi Qiuzhuang - you have two options:
> 1) Within the map step, define a validation function that is executed
> on every record.
> 2) Use the filter function to create a filtered dataset prior to
> processing.
>
> On 11/14/14, 10:28 AM, "Qiuzhuang Lian" <[email protected]> wrote:
>
> >Hi,
> >
> >MapReduce has a feature for skipping bad records. Is there an equivalent
> >in Spark? Should I use the filter API to do this?
> >
> >Thanks,
> >Qiuzhuang
>
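The two options Ilya describes (validating inside the map step vs. filtering first) can be sketched with plain Scala collections as below. This is an illustration under assumptions, not a Spark built-in equivalent of MapReduce's skipping feature: the "name,age" record format, Person, parse and looksValid are all hypothetical, and scala.util.Try is used to turn a throwing parser into a value that can be dropped. The same filter, map and flatMap calls are available on an RDD.

```scala
import scala.util.{Success, Try}

// Hypothetical input: one "name,age" record per line, some malformed.
val lines: Seq[String] = Seq("alice,30", "bob,notanumber", "carol", "dave,41")

case class Person(name: String, age: Int)

// Hypothetical parser that throws on malformed input, like many real parsers.
def parse(line: String): Person = line.split(",") match {
  case Array(name, age) => Person(name.trim, age.trim.toInt)
  case _                => throw new IllegalArgumentException(s"malformed: $line")
}

// Option 1: validate inside the map step. Failures become None and
// flatMap drops them.
val people: Seq[Person] = lines.flatMap(line => Try(parse(line)).toOption)

// Variant of option 1 that also counts how many records were skipped.
val attempts: Seq[Try[Person]] = lines.map(line => Try(parse(line)))
val skipped: Int = attempts.count(_.isFailure)
val good: Seq[Person] = attempts.collect { case Success(p) => p }

// Option 2: filter with an explicit validity check, then map.
def looksValid(line: String): Boolean = {
  val fields = line.split(",")
  fields.length == 2 && Try(fields(1).trim.toInt).isSuccess
}
val people2: Seq[Person] = lines.filter(looksValid).map(parse)
```

Option 1 parses each record once and keeps the error handling in one place; option 2 makes the check explicit but, as written here, splits each line twice (once in looksValid, once in parse).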
