I have an RDD "x" containing millions of Strings, each of which I want to pass through a set of filters. My filtering code looks like this:
    x.filter(filter1)  // filters out ~40% of the data
     .filter(filter2)  // filters out ~20% of the data
     .filter(filter3)  // filters out ~2% of the data
     .filter(filter4)  // filters out ~1% of the data

There is no ordering requirement (filter #2 does not depend on filter #1, and so on), but the filters differ drastically in the percentage of rows they should eliminate. What I'd like is short-circuit behavior, similar to a "||" statement: if a row fails filter #1, it is automatically dropped before the other three filters run. But when I play around with the ordering of the filters, the runtime doesn't seem to change.

Is Spark somehow intelligently guessing how effective each filter will be and ordering them correctly regardless of how I order them? If not, is there a way I can set the filter order?
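For concreteness, here is a minimal runnable sketch of what I mean. The predicate bodies and the input path are placeholders standing in for my real filters and data; the combined version at the end, using short-circuit &&, is the behavior I'm after:

    import org.apache.spark.{SparkConf, SparkContext}

    object FilterOrderSketch {
      // Hypothetical predicates; the bodies are placeholders for the real filter logic.
      def passesFilter1(s: String): Boolean = s.nonEmpty          // should drop ~40%
      def passesFilter2(s: String): Boolean = s.length < 1000     // should drop ~20%
      def passesFilter3(s: String): Boolean = !s.startsWith("#")  // should drop ~2%
      def passesFilter4(s: String): Boolean = !s.endsWith("\\")   // should drop ~1%

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("filter-order"))
        val x = sc.textFile("hdfs:///path/to/input") // placeholder for the real RDD

        // Chained form: Spark pipelines these narrow transformations into one
        // stage, and each element is tested by the predicates in this order.
        val chained = x.filter(passesFilter1)
                       .filter(passesFilter2)
                       .filter(passesFilter3)
                       .filter(passesFilter4)

        // Behaviorally equivalent single filter using short-circuit &&: a row
        // that fails passesFilter1 is never tested against the later predicates.
        val combined = x.filter(s =>
          passesFilter1(s) && passesFilter2(s) &&
          passesFilter3(s) && passesFilter4(s))

        println(s"chained: ${chained.count()}, combined: ${combined.count()}")
        sc.stop()
      }
    }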