IE: If my JDBC table has an index on it, will the optimizer consider that when pushing predicates down?
I noticed in a query like this: df = spark.hiveContext.read.jdbc( url=jdbc_url, table="schema.table", column="id", lowerBound=lower_bound_id, upperBound=upper_bound_id, numPartitions=numberPartitions ) df.registerTempTable("df") filtered_df = spark.hiveContext.sql(""" SELECT * FROM df WHERE type = 'type' AND action = 'action' AND audited_changes LIKE '---\ncompany_id:\n- %' """) filtered_audits.registerTempTable("filtered_df") The queries sent to the DB look like this: "Select fields from schema.table where type='type' and action='action' and id > lower_bound and id <= upper_bound" And then it does the like ( LIKE '---\ncompany_id:\n- %') in memory, which is great! However I'm wondering why it chooses that optimization. In this case there aren't any indexes on any of these except ID. So, does spark take into account JDBC indexes in it's query plan where it can? Thanks! Gary Lucas