I have a temporal data set that I'd like to query using Spark SQL. The data set is stored in Accumulo, and I've already written a CatalystScan implementation and a RelationProvider [1] to register with the SQLContext so that I can run SQL statements against it.
With my current implementation, the start and stop times of the range are set on the RelationProvider, so they ultimately become a per-table setting. I'd much rather register the table without the time range and specify it in the SQL query string itself (perhaps as an expression in the WHERE clause?).

[1] https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/main/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalyst.scala
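To make the WHERE-clause idea concrete, here is a minimal sketch of how comparison predicates handed to a scan could be folded into a start/stop range. It rests on several assumptions: the `Filter` case classes below are simplified stand-ins defined only for this example (they are not Spark's own filter or Catalyst expression classes), the column name `timestamp` is hypothetical, and `TimeRange` and `TimeRangeExtractor` are illustrative names that do not appear in the linked code.

```scala
// Simplified stand-ins for the comparison predicates a scan might receive
// from the WHERE clause. These are NOT Spark's classes; they only mimic
// the general shape so that the example is self-contained.
sealed trait Filter
case class GreaterThanOrEqual(attribute: String, value: Long) extends Filter
case class LessThanOrEqual(attribute: String, value: Long) extends Filter
case class EqualTo(attribute: String, value: Any) extends Filter

// Hypothetical time window that would bound the Accumulo scan.
// None means "no bound given in the query".
case class TimeRange(start: Option[Long], stop: Option[Long])

object TimeRangeExtractor {
  // Hypothetical name of the time column exposed by the relation.
  val TimeColumn = "timestamp"

  // Split the filters into a time range and the predicates that remain.
  def extract(filters: Seq[Filter]): (TimeRange, Seq[Filter]) = {
    val empty = (TimeRange(None, None), Vector.empty[Filter])
    filters.foldLeft(empty) {
      case ((range, rest), GreaterThanOrEqual(TimeColumn, v)) =>
        // Keep the tightest (largest) lower bound seen so far.
        val start = range.start.fold(v)(s => math.max(s, v))
        (range.copy(start = Some(start)), rest)
      case ((range, rest), LessThanOrEqual(TimeColumn, v)) =>
        // Keep the tightest (smallest) upper bound seen so far.
        val stop = range.stop.fold(v)(s => math.min(s, v))
        (range.copy(stop = Some(stop)), rest)
      case ((range, rest), other) =>
        // Anything that is not a bound on the time column is passed through.
        (range, rest :+ other)
    }
  }
}
```

Under these assumptions, a query such as `SELECT * FROM events WHERE timestamp >= 1000 AND timestamp <= 2000` (with `events` as a hypothetical table name) would correspond to calling `TimeRangeExtractor.extract` with one `GreaterThanOrEqual` and one `LessThanOrEqual` filter, and the resulting `TimeRange` could then be used to bound the Accumulo scan instead of a per-table setting.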