RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
ge- From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, July 29, 2014 11:44 AM
To: dev@spark.apache.org
Subject: Re: pre-filtered hadoop RDD use case

I am not sure if I agree that it lacks the mechanism to do pushdowns. Hadoop InputFormat itself provides some basic mechanism to push

Re: pre-filtered hadoop RDD use case

2014-07-29 Thread Reynold Xin
lementation seems to be in place, and more optimization is desired
> beyond just record-oriented execution pipelining.
>
>
>
> -Original Message-----
> From: Reynold Xin [mailto:r...@databricks.com]
> Sent: Tuesday, July 29, 2014 12:55 AM
> To: dev@spark.apache.org
> Subject: Re: pre-f

RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
[mailto:r...@databricks.com]
Sent: Tuesday, July 29, 2014 12:55 AM
To: dev@spark.apache.org
Subject: Re: pre-filtered hadoop RDD use case

Would something like this help?

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionPruningRDD.scala

On Th

Re: pre-filtered hadoop RDD use case

2014-07-29 Thread Reynold Xin
Would something like this help?

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionPruningRDD.scala

On Thu, Jul 24, 2014 at 8:40 AM, Eugene Cheipesh wrote:
> Hello,
>
> I have an interesting use case for a pre-filtered RDD. I have two solutions
> th