I have an RDD "x" of millions of Strings, each of which I want to pass
through a set of filters.  My filtering code looks like this:

x.filter(filter1)   // filters out ~40% of rows
 .filter(filter2)   // filters out ~20% of rows
 .filter(filter3)   // filters out ~2% of rows
 .filter(filter4)   // filters out ~1% of rows

There is no ordering requirement (filter2 does not depend on filter1, and
so on), but the filters differ drastically in the percentage of rows they
should eliminate.  What I'd like is short-circuit behavior, like a "||"
expression: if a row fails filter1, it gets dropped immediately, before
the other three filters run.
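
In other words, what I have in mind is collapsing the chain into a single
filter whose predicate short-circuits (filter1..filter4 stand in for my
real predicates):

   // && evaluates left to right and short-circuits, so a row rejected
   // by filter1 never reaches filter2, filter3, or filter4
   val kept = x.filter(s => filter1(s) && filter2(s) && filter3(s) && filter4(s))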

But when I play around with the ordering of the filters, the runtime doesn't
seem to change.  Is Spark somehow intelligently estimating how selective
each filter is and reordering them regardless of how I write them?  If not,
is there a way I can set the filter order?
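
For what it's worth, here's roughly how I've been comparing the two
orderings.  Since RDD transformations are lazy, I force evaluation with
count(); timeIt is just a stopwatch helper I wrote, and the predicates are
the same placeholders as above:

   def timeIt[T](label: String)(body: => T): T = {
     val start  = System.nanoTime()
     val result = body  // forces the Spark job when body ends in an action
     println(f"$label%s: ${(System.nanoTime() - start) / 1e9}%.1f s")
     result
   }

   timeIt("most selective filter first") {
     x.filter(filter1).filter(filter2).filter(filter3).filter(filter4).count()
   }
   timeIt("most selective filter last") {
     x.filter(filter4).filter(filter3).filter(filter2).filter(filter1).count()
   }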


