You can try (Scala version; you can convert it to Python):

    val set = initial.groupBy(x => if (x == something) "key1" else "key2")

This would do one pass over the original data.

On Fri, Nov 28, 2014 at 8:21 AM, mrm <ma...@skimlinks.com> wrote:
> Hi,
>
> My question is:
>
> I have multiple filter operations where I split my initial RDD into two
> different groups. The two groups cover the whole initial set. In code, it's
> something like:
>
> set1 = initial.filter(lambda x: x == something)
> set2 = initial.filter(lambda x: x != something)
>
> By doing this, I am doing two passes over the data. Is there any way to
> optimise this to do it in a single pass?
>
> Note: I was trying to look in the mailing list to see if this question has
> been asked already, but could not find it.
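To illustrate the suggestion in the asker's language: below is a minimal plain-Python sketch of the single-pass split (runnable without Spark), with the rough PySpark equivalent noted in a comment. The function name `split_once` and the sample data are made up for illustration; `initial` and `something` refer to the names in the question.

```python
# Plain-Python sketch of the single-pass idea; no Spark required to run it.
# In PySpark the equivalent of the Scala groupBy would be roughly:
#   grouped = initial.groupBy(lambda x: "key1" if x == something else "key2")
# which tags every element in one traversal instead of filtering twice.

def split_once(items, something):
    """Partition items into (matches, non_matches) in a single pass."""
    matches, non_matches = [], []
    for x in items:
        # One comparison per element; both output groups are built together.
        (matches if x == something else non_matches).append(x)
    return matches, non_matches

set1, set2 = split_once([1, 2, 3, 2, 4], something=2)
print(set1)  # [2, 2]
print(set2)  # [1, 3, 4]
```

Note that on an RDD, `groupBy` keys every element and may involve a shuffle, so whether it beats two `filter` passes depends on the job; caching `initial` before the two filters is another common option.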