Thanks! That's exactly what I was hoping for! Thanks for finding the Jira for me!
On Tue, Aug 4, 2020 at 11:46 AM Wenchen Fan <cloud0...@gmail.com> wrote: > I think this is not a problem in 3.0 anymore, see > https://issues.apache.org/jira/browse/SPARK-27638 > > On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer <russell.spit...@gmail.com> > wrote: > >> I've just run into this issue again with another user and I feel like >> most folks here have seen some flavor of this at some point. >> >> The user registers a Datasource with a column of type Date (or some non >> string) then performs a query that looks like. >> >> *SELECT * from Source WHERE date_col > '2020-08-03'* >> >> Seeing that the predicate literal here is a String, Spark needs to make a >> change so that the DataSource column will be of the same type (Date), >> so it places a "Cast" on the Datasource column so our plan ends up >> looking like. >> >> Cast(date_col as String) > '2020-08-03' >> >> Since the Datasource Strategies can't handle a push down of the "Cast" >> function we lose the predicate pushdown we could >> have had. This can change a Job from a single partition lookup into a >> full scan leading to a very confusing situation for >> the end user. I also wonder about the relative cost here since we could >> be avoiding doing X casts and instead just do a single >> one on the predicate, in addition we could be doing the cast at the >> Analysis phase and cut the run short before any work even >> starts rather than doing a perhaps meaningless comparison between a date >> and a non-date string. >> >> I think we should seriously consider whether in cases like this we should >> attempt to cast the literal rather than casting the >> source column. >> >> Please let me know if anyone has thoughts on this, or has some previous >> Jiras I could dig into if it's been discussed before, >> Russ >> >