You can try this (Scala version; you can convert it to Python):

val set = initial.groupBy( x => if (x == something) "key1" else "key2")

This does a single pass over the original data; each group can then be looked up by its key.
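A rough Python translation of the same single-pass idea (a sketch only; `initial` and `something` are the names from your snippet, and `split_one_pass` is a hypothetical helper shown here on a plain list so the logic is easy to see):

```python
from collections import defaultdict

def split_one_pass(initial, something):
    """Partition `initial` into two groups in one traversal."""
    groups = defaultdict(list)
    for x in initial:
        # Same keying as the Scala groupBy above.
        groups["key1" if x == something else "key2"].append(x)
    return groups["key1"], groups["key2"]

# In PySpark, the equivalent one-liner would be:
#   sets = initial.groupBy(lambda x: "key1" if x == something else "key2")
set1, set2 = split_one_pass([1, 2, 1, 3], 1)
```

Note that `groupBy` on an RDD involves a shuffle, so whether it beats two filters depends on your data; worth benchmarking both.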

On Fri, Nov 28, 2014 at 8:21 AM, mrm <ma...@skimlinks.com> wrote:

> Hi,
>
> My question is:
>
> I have multiple filter operations where I split my initial rdd into two
> different groups. The two groups cover the whole initial set. In code, it's
> something like:
>
> set1 = initial.filter(lambda x: x == something)
> set2 = initial.filter(lambda x: x != something)
>
> By doing this, I am doing two passes over the data. Is there any way to
> optimise this to do it in a single pass?
>
> Note: I was trying to look in the mailing list to see if this question has
> been asked already, but could not find it.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/optimize-multiple-filter-operations-tp20010.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>