Hi folks, I'm not sure if this belongs on the dev or user list; I'm sending it
to dev as it seems a bit involved.

I have a UI in which we allow users to write ad-hoc queries against a (very
large, partitioned) table. I would like to analyze the queries prior to
execution for two purposes:

1. Reject under-constrained queries (i.e. there is a predicate on a particular
field that I want to make sure is always present; see the sketch below)
2. Augment the query with additional predicates (e.g. if the user asks for a
student_id, I also want to push a constraint on another field)
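
For the first point, the most concrete idea I have is to walk the analyzed plan
and check that at least one Filter references the field I care about. A rough
sketch, assuming Spark 1.x Catalyst, with "event_date" and the helper name as
placeholders of mine:

import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

// Hypothetical helper: does any Filter in the plan constrain the given column?
// (Uses queryExecution.analyzed rather than .logical so names are resolved.)
def isConstrainedOn(plan: LogicalPlan, requiredColumn: String): Boolean =
  plan.collect { case f: Filter => f.condition }.exists { cond =>
    cond.references.exists(_.name.equalsIgnoreCase(requiredColumn))
  }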

I could parse the SQL string myself before passing it to Spark, but Spark
obviously already does this anyway. Can someone give me a general direction on
how to do this (if it's possible)?

Something like:

val myDF = sqlContext.sql(userSqlQuery)
myDF.queryExecution.logical  // examine the filters provided by the user here,
                             // reject if under-constrained, and push new
                             // filters as needed (via withNewChildren?)
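
For the "push new filters" part, the closest I've come is wrapping the analyzed
plan in an extra Filter node. Another rough sketch ("school_year" and the
literal are placeholders, and this only works if that column is in the query's
output):

import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
import org.apache.spark.sql.catalyst.plans.logical.Filter

val analyzedPlan = myDF.queryExecution.analyzed
// Look up the resolved attribute for the column I want to constrain.
val constraintAttr = analyzedPlan.output
  .find(_.name.equalsIgnoreCase("school_year"))
  .getOrElse(sys.error("school_year is not in the query's output"))
// Wrap the whole plan in an additional Filter node.
val augmentedPlan = Filter(EqualTo(constraintAttr, Literal("2015-2016")), analyzedPlan)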

At this point, with some luck, I'd have a new LogicalPlan. What is the proper
way to create an execution plan on top of this new plan? I'm looking at
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L329
but that method is restricted to the package. I'd really prefer to hook in as
early as possible and still let Spark run the plan optimizations as usual.
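
The closest workaround I've found is to skip the plan surgery and express the
extra predicate through the public DataFrame API instead, which adds an
equivalent Filter on top of the user's plan and still goes through the
optimizer (again, the column name and value are placeholders of mine):

import org.apache.spark.sql.functions.col

val userDF = sqlContext.sql(userSqlQuery)
val augmentedDF = userDF.filter(col("school_year") === "2015-2016")
// The extra predicate shows up in the optimized plan and gets pushed down.
println(augmentedDF.queryExecution.optimizedPlan)

But that only covers the augmentation side, so I'd still like to know the right
way to go from a hand-modified LogicalPlan back to something executable.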

Any guidance or pointers much appreciated.
