Mich, I am curious as well about how Spark handles casting for the different types of filters.
For example, the conversion happens implicitly for the 'EqualTo' filter:

scala> sqlContext.sql("SELECT * from events WHERE `registration` = '2015-05-28'").explain()

16/04/15 11:44:15 INFO ParseDriver: Parsing command: SELECT * from events WHERE `registration` = '2015-05-28'
16/04/15 11:44:15 INFO ParseDriver: Parse Completed
16/04/15 11:44:15 INFO SolrRelation: Constructed SolrQuery: q=*:*&rows=1000&collection=demo&fq=registration:"2015-05-28T07:00:00.000Z"

== Physical Plan ==
Filter (registration#17 = 1432796400000000)
+- Scan com.lucidworks.spark.SolrRelation@483f70bf[method#0,author#1,location#2,song#3,timestamp#4,auth#5,page#6,userAgent#7,lastName#8,firstName#9,id#10,itemInSession#11L,artist#12,status#13L,sessionId#14L,length#15,_version_#16L,registration#17,userId#18L,title#19,gender#20,level#21]
   PushedFilters: [EqualTo(registration,2015-05-28 00:00:00.0)]

whereas for the other comparison filters ('GreaterThan', 'GreaterThanOrEqual', 'LessThan', 'LessThanOrEqual') the values are cast to String:

scala> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28 00.00.00'").explain()

16/04/15 11:45:13 INFO ParseDriver: Parsing command: SELECT * from events WHERE `registration` >= '2015-05-28 00.00.00'
16/04/15 11:45:13 INFO ParseDriver: Parse Completed

== Physical Plan ==
Filter (cast(registration#17 as string) >= 2015-05-28 00.00.00)
+- Scan com.lucidworks.spark.SolrRelation@483f70bf[method#0,author#1,location#2,song#3,timestamp#4,auth#5,page#6,userAgent#7,lastName#8,firstName#9,id#10,itemInSession#11L,artist#12,status#13L,sessionId#14L,length#15,_version_#16L,registration#17,userId#18L,title#19,gender#20,level#21]

How does Hive handle date/timestamp queries? Can you share any docs regarding this? I found this in the Hive docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-PartitionBasedQueries
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'

In the above SQL query, is there any implicit casting to String, or does Hive try to convert the value to a Date/Timestamp and then do the comparison?

Thanks,

On Fri, Apr 15, 2016 at 1:53 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Takeshi,
>
> I did check it. I believe you are referring to this statement:
>
> "This is likely because we cast this expression weirdly to be compatible with Hive. Specifically, I think this turns into CAST(c_date AS STRING) >= "2016-01-01", and we don't push casts down into data sources. The reason for casting this way is that users currently expect the following to work: c_date >= "2016"."
>
> There are two issues here:
>
> 1. The CAST expression is not pushed down.
> 2. I still don't trust string comparison of dates. It may or may not work. I recall this being an issue in Hive.
>
> '2012-11-23' is not a DATE; it is a string. In general, from my experience, one should not try to compare a DATE with a string. The results will depend on several factors, some related to the tool, some related to the session.
>
> For example, the following converts a date held as a string in 'dd/MM/yyyy' format into a DATE type in Hive:
>
> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate,'dd/MM/yyyy'),'yyyy-MM-dd'))
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 15 April 2016 at 06:58, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
>> Hi, Mich
>>
>> Did you check the URL Josh referred to?
>> The cast for string comparisons is needed for accepting `c_date >= "2016"`.
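On Mich's doubt about comparing dates as strings: a standalone illustration (not from any of the systems in this thread) shows that lexicographic comparison agrees with chronological order only for zero-padded, most-significant-first formats such as yyyy-MM-dd, and breaks for formats like dd/MM/yyyy:

```python
from datetime import date

# Zero-padded ISO-8601 (yyyy-MM-dd) strings sort exactly like the dates
# they represent, so string comparison is safe for that one format.
d1, d2 = date(2015, 5, 28), date(2015, 6, 3)
assert (d1.isoformat() < d2.isoformat()) == (d1 < d2)

# A non-ISO format such as dd/MM/yyyy breaks lexicographic ordering:
# "28/05/2015" > "03/06/2015" as strings, although May 28 < June 3.
s1, s2 = d1.strftime("%d/%m/%Y"), d2.strftime("%d/%m/%Y")
print(s1 > s2)  # True: string order disagrees with date order
print(d1 > d2)  # False
```

This is why the Hive page_views example above works: '2008-03-01' style literals happen to be in the one string format whose ordering matches date ordering.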
>>
>> // maropu
>>
>> On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> String comparison itself is pushed down fine, but the problem is dealing with the Cast.
>>>
>>> It was pushed down before, but that was reverted (https://github.com/apache/spark/pull/8049).
>>>
>>> Several fixes were tried, e.g. https://github.com/apache/spark/pull/11005, but none of them went in.
>>>
>>> To cut it short, it is not being pushed down because it is unsafe to resolve a cast (e.g. long to integer) at the data source.
>>>
>>> As a workaround, the implementation of the Solr data source could be changed to one based on CatalystScan, which receives all the filters.
>>>
>>> CatalystScan is not designed to be binary compatible across releases; however, some think it is stable now, as mentioned here: https://github.com/apache/spark/pull/10750#issuecomment-175400704.
>>>
>>> Thanks!
>>>
>>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>>>
>>>> Hi Josh,
>>>>
>>>> Can you please clarify whether date comparisons as two strings work at all?
>>>>
>>>> I was under the impression that with string comparison only the first characters are compared?
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:
>>>>
>>>>> AFAIK this is not being pushed down because it involves an implicit cast and we currently don't push casts into data sources or scans; see https://github.com/databricks/spark-redshift/issues/155 for a possibly related discussion.
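To illustrate Hyukjin's point that resolving a cast such as long-to-integer at the data source is unsafe, here is a standalone sketch (this is not Spark's code; `to_int32` is a made-up helper simulating a narrowing cast):

```python
import ctypes

def to_int32(x):
    # Simulate a narrowing cast from a 64-bit long to a 32-bit int,
    # as a naive data source might do if the cast were pushed down to it.
    return ctypes.c_int32(x).value

big = 2**31 + 5  # fits in a long, but overflows a 32-bit int
threshold = 100

# A predicate like `CAST(col AS INT) > 100` evaluated after the narrowing
# cast gives the wrong answer: the value wraps around to a negative number.
print(big > threshold)            # True  (correct, on the long value)
print(to_int32(big) > threshold)  # False (wrapped to -2147483643)
```

A row that should match the filter silently fails it, which is one reason casts are kept out of the filters handed to data sources.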
>>>>>
>>>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Are you comparing strings here or timestamps?
>>>>>>
>>>>>> Filter ((cast(registration#37 as string) >= 2015-05-28) && (cast(registration#37 as string) <= 2015-05-29))
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 14 April 2016 at 18:04, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Timestamp range filter queries in SQL are not getting pushed down to the PrunedFilteredScan instances; the filtering is happening at the Spark layer.
>>>>>>>
>>>>>>> The physical plan for timestamp range queries does not show the pushed filters, whereas range queries on other types work fine and the physical plan shows the pushed filters.
>>>>>>>
>>>>>>> Please see below for code and examples.
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> 1. Range filter queries on Timestamp types
>>>>>>>
>>>>>>> code:
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28' AND `registration` <= '2015-05-29'")
>>>>>>>
>>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> Plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>>>>>
>>>>>>> 2. Range filter queries on Long types
>>>>>>>
>>>>>>> code:
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' AND `length` <= '1000'")
>>>>>>>
>>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> Plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>>>>>
>>>>>>> The SolrRelation class we use extends PrunedFilteredScan (https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37).
>>>>>>>
>>>>>>> Since Solr supports date ranges, I would like the timestamp filters to be pushed down to the Solr query.
>>>>>>>
>>>>>>> Are there limitations on the types of filters that are passed down with Timestamp types?
>>>>>>> Is there something I should do in my code to fix this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Kiran Chitturi
>>
>> --
>> ---
>> Takeshi Yamamuro

--
Kiran Chitturi
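For what pushing these range filters down to Solr could look like, here is a rough sketch (hypothetical code, not the actual spark-solr implementation; `filters_to_fq` and its (op, field, value) triple encoding are invented for illustration, with filter names mirroring org.apache.spark.sql.sources such as GreaterThanOrEqual):

```python
def filters_to_fq(filters):
    """Turn (op, field, value) triples into Solr fq range clauses.

    Solr range syntax: [] bounds are inclusive, {} bounds are exclusive,
    so e.g. GreaterThan maps to field:{value TO *].
    """
    templates = {
        "GreaterThan":        "{f}:{{{v} TO *]",
        "GreaterThanOrEqual": "{f}:[{v} TO *]",
        "LessThan":           "{f}:[* TO {v}}}",
        "LessThanOrEqual":    "{f}:[* TO {v}]",
        "EqualTo":            '{f}:"{v}"',
    }
    return [templates[op].format(f=field, v=value)
            for (op, field, value) in filters]

# The timestamp range query from the thread, expressed as pushed filters.
fq = filters_to_fq([
    ("GreaterThanOrEqual", "registration", "2015-05-28T00:00:00Z"),
    ("LessThanOrEqual",    "registration", "2015-05-29T00:00:00Z"),
])
for clause in fq:
    print(clause)
# registration:[2015-05-28T00:00:00Z TO *]
# registration:[* TO 2015-05-29T00:00:00Z]
```

The catch discussed in the thread is that a PrunedFilteredScan never receives the timestamp comparisons in this form, because Spark wraps the column in a Cast and does not push casts down; CatalystScan, which receives the raw Catalyst expressions, is the suggested workaround.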