I have a temporal dataset that I'd like to query using Spark SQL. The data
lives in Accumulo, and I've already written a CatalystScan implementation
and a RelationProvider[1] to register with the SQLContext so that I can run
SQL statements against it.

With my current implementation, the start and stop times of the range are
set on the RelationProvider (so ultimately they become a per-table setting).
I'd much rather register the table without the time range and specify it
in the SQL query string itself (perhaps as an expression in the WHERE
clause?).
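
To make the idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the relation receives the WHERE predicates as org.apache.spark.sql.sources.Filter objects (as PrunedFilteredScan does; a CatalystScan would have to match the equivalent Catalyst expressions instead). The column name "time", the table name "events", and the TimeRangeFilters helper are all hypothetical.

```scala
import org.apache.spark.sql.sources.{Filter, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual}

// Hypothetical helper: derives a (start, stop) range in epoch millis from
// the filters Spark pushes down for a query such as
//   SELECT * FROM events WHERE time >= 1420070400000 AND time <= 1422748800000
object TimeRangeFilters {

  // Assumed name of the timestamp column exposed by the relation.
  val TimeColumn = "time"

  // Accept the value types a timestamp literal is likely to arrive as.
  private def toMillis(v: Any): Option[Long] = v match {
    case l: Long               => Some(l)
    case i: Int                => Some(i.toLong)
    case t: java.sql.Timestamp => Some(t.getTime)
    case _                     => None
  }

  // Returns the tightest bounds found; None means "unbounded" on that side.
  // Exclusive bounds (> and <) are treated as inclusive here, so the scan
  // may over-fetch rows sitting exactly on the boundary.
  def extract(filters: Seq[Filter]): (Option[Long], Option[Long]) = {
    val starts = filters.collect {
      case GreaterThanOrEqual(TimeColumn, v) => toMillis(v)
      case GreaterThan(TimeColumn, v)        => toMillis(v)
    }.flatten
    val stops = filters.collect {
      case LessThanOrEqual(TimeColumn, v) => toMillis(v)
      case LessThan(TimeColumn, v)        => toMillis(v)
    }.flatten
    (starts.reduceOption(_ max _), stops.reduceOption(_ min _))
  }
}
```

The scan would then call TimeRangeFilters.extract(filters) and use the result to configure the Accumulo query range, falling back to an unbounded (or default) range when a side is None.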


[1]
https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/main/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalyst.scala
