Thanks! That's exactly what I was hoping for! Thanks for finding the Jira
for me!

On Tue, Aug 4, 2020 at 11:46 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> I think this is not a problem in 3.0 anymore, see
> https://issues.apache.org/jira/browse/SPARK-27638
>
> On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I've just run into this issue again with another user and I feel like
>> most folks here have seen some flavor of this at some point.
>>
>> The user registers a Datasource with a column of type Date (or some non
>> string) then performs a query that looks like.
>>
>> *SELECT * from Source WHERE date_col > '2020-08-03'*
>>
>> Seeing that the predicate literal here is a String, Spark needs to make a
>> change so that the DataSource column will be of the same type (Date),
>> so it places a "Cast" on the Datasource column so our plan ends up
>> looking like.
>>
>> Cast(date_col as String) > '2020-08-03'
>>
>> Since the Datasource Strategies can't handle a push down of the "Cast"
>> function we lose the predicate pushdown we could
>> have had. This can change a Job from a single partition lookup into a
>> full scan leading to a very confusing situation for
>> the end user. I also wonder about the relative cost here since we could
>> be avoiding doing X casts and instead just do a single
>> one on the predicate, in addition we could be doing the cast at the
>> Analysis phase and cut the run short before any work even
>> starts rather than doing a perhaps meaningless comparison between a date
>> and a non-date string.
>>
>> I think we should seriously consider whether in cases like this we should
>> attempt to cast the literal rather than casting the
>> source column.
>>
>> Please let me know if anyone has thoughts on this, or has some previous
>> Jiras I could dig into if it's been discussed before,
>> Russ
>>
>

Reply via email to