[ 
https://issues.apache.org/jira/browse/HUDI-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3594:
-----------------------------
    Sprint: Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21  (was: 
Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14)

> Support standard Spark functions in Filter Exprs in Data Skipping
> -----------------------------------------------------------------
>
>                 Key: HUDI-3594
>                 URL: https://issues.apache.org/jira/browse/HUDI-3594
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> As part of this effort we're planning to support, at the very least, a suite
> of standard Spark functions when evaluating data filter expressions within
> the Data Skipping flow. For example, a user might issue the following query:
>  
> {code:java}
> SELECT ... WHERE date_format(ts, 'dd-MM-yyyy') > '01-01-2022'
> {code}
> We should be able to relate such a query to our Column Stats Index
> appropriately, and therefore do Data Skipping not only on the "raw" columns
> but also on simple derived expressions on top of them (such as standard
> function calls).
>  
> *It is important to note that only transformations that _preserve the
> ordering of the source column_ can be applied. Transformations that do not
> preserve the ordering render the Column Stats Index practically irrelevant,
> since no assumption can be made that the values derived by such
> transformations are ordered.*
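
The ordering requirement can be illustrated with a minimal sketch. This is not
Hudi's implementation: FileStats, may_contain_greater_than and prune are
hypothetical names, and the per-file min/max values are made up to stand in for
Column Stats Index entries. It assumes a filter of the form
"transform(col) > literal" and shows that pruning on transform(max) is sound for
an order-preserving transform but can wrongly skip a file otherwise.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max of one column (stand-in for a column stats entry)."""
    name: str
    min_value: date
    max_value: date


def may_contain_greater_than(stats: FileStats,
                             transform: Callable[[date], Any],
                             literal: Any) -> bool:
    """Decide whether a file may hold rows with transform(col) > literal.

    Only sound when `transform` is non-decreasing: in that case
    transform(max_value) is an upper bound for every transformed value
    in the file, so the file can be skipped when that bound fails the filter.
    """
    return transform(stats.max_value) > literal


def prune(files: List[FileStats],
          transform: Callable[[date], Any],
          literal: Any) -> List[str]:
    """Return the names of files that cannot be skipped."""
    return [f.name for f in files if may_contain_greater_than(f, transform, literal)]


# Assumed example data: three files with made-up date ranges.
files = [
    FileStats("f1", date(2021, 11, 1), date(2021, 12, 31)),
    FileStats("f2", date(2021, 12, 15), date(2022, 2, 10)),
    FileStats("f3", date(2022, 3, 1), date(2022, 3, 31)),
]


def iso(d: date) -> str:
    # Order-preserving for four-digit years: ISO strings sort like the dates.
    return d.strftime("%Y-%m-%d")


def day_of_month(d: date) -> int:
    # Not order-preserving: the value wraps around at every month boundary.
    return d.day


kept = prune(files, iso, "2022-01-01")  # f1 is safely skipped

# A file spanning a month boundary; it contains 2022-01-25, whose day is 25 > 20.
f4 = FileStats("f4", date(2022, 1, 25), date(2022, 2, 5))
naive_keeps_f4 = may_contain_greater_than(f4, day_of_month, 20)
actually_matches = day_of_month(f4.min_value) > 20
wrongly_skipped = (not naive_keeps_f4) and actually_matches
```

With the order-preserving iso transform, only f1 is skipped, and no matching
row can be lost. With day_of_month, the bound taken from the file's maximum is 5,
so f4 would be skipped even though it holds a matching row, which is why such
transformations cannot be used with min/max statistics.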



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
