Hi Corey,

I would not recommend using CatalystScan for this. It's lower level, and not stable across releases.
You should be able to do what you want with PrunedFilteredScan
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L155>,
though. The filters
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/filters.scala>
that it pushes down are already normalized, so you can easily look for range
predicates on the start/end columns you care about:

val start = filters.collectFirst {
  case GreaterThan("start", startDate: String) => DateTime.parse(startDate).toDate
}.getOrElse(<min possible start date>)

val end = filters.collectFirst {
  case LessThan("end", endDate: String) => DateTime.parse(endDate).toDate
}.getOrElse(<max possible date>)

...

Filters are advisory, so you can simply ignore ones that aren't on start/end.
There's a fuller sketch of the relation below the quoted message.

Michael

On Thu, Feb 12, 2015 at 8:32 PM, Corey Nolet <cjno...@gmail.com> wrote:
> I have a temporal data set in which I'd like to be able to query using
> Spark SQL. The dataset is actually in Accumulo and I've already written a
> CatalystScan implementation and RelationProvider[1] to register with the
> SQLContext so that I can apply my SQL statements.
>
> With my current implementation, the start and stop time ranges are set on
> the RelationProvider (so ultimately they become a per-table setting). I'd
> much rather be able to register the table without the time ranges and just
> specify them through the SQL query string itself (perhaps an expression in
> the WHERE clause?)
>
>
> [1]
> https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/main/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalyst.scala
>
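For completeness, here is a rough sketch of how the pieces could fit together.
The class names, schema, and fallback dates are placeholders I made up for
illustration (this is not the actual EventStoreCatalyst code), and the Accumulo
scan itself is elided:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// Illustrative relation over "events" with start/end kept as ISO-8601 strings.
case class EventRelation(@transient val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("start", StringType),
    StructField("end", StringType)))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // Pull the range bounds out of whatever predicates were pushed down.
    // Anything we don't recognize can simply be ignored.
    val start = filters.collectFirst {
      case GreaterThan("start", v: String)        => v
      case GreaterThanOrEqual("start", v: String) => v
    }.getOrElse("0000-01-01")          // stand-in for <min possible start date>

    val end = filters.collectFirst {
      case LessThan("end", v: String)        => v
      case LessThanOrEqual("end", v: String) => v
    }.getOrElse("9999-12-31")          // stand-in for <max possible date>

    // Real implementation: turn (start, end) into an Accumulo range scan and
    // map each entry to a Row containing only requiredColumns.
    sqlContext.sparkContext.emptyRDD[Row]
  }
}

// Provider so the table can be registered without baking in a time range.
class EventSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    EventRelation(sqlContext)
}

Once that is registered as a table (e.g. CREATE TEMPORARY TABLE events USING
com.example.EventSource), a query like

SELECT id FROM events WHERE start > '2015-01-01' AND end < '2015-02-01'

hands those two predicates to buildScan, which gives you the "time range in
the WHERE clause" behavior instead of a per-table setting.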