[ https://issues.apache.org/jira/browse/IMPALA-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060877#comment-18060877 ]
ASF subversion and git services commented on IMPALA-14737:
----------------------------------------------------------
Commit 540a3784e0e3cad1a962c368ddbcb14b05de6832 in impala's branch
refs/heads/master from Arnab Karmakar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=540a3784e ]
IMPALA-14737 Part1: Push down LIKE predicates to Iceberg
This patch adds support for pushing down LIKE predicates to Iceberg using
startsWith() and equal() expressions. When a LIKE pattern starts with
non-wildcard characters, Impala extracts the literal prefix (or the whole
pattern, if it contains no wildcards) and pushes it down to Iceberg for
efficient file-level filtering.
Supported patterns:
- 'abc%' -> pushes down startsWith('abc')
- 'pre_fix%' -> pushes down startsWith('pre') (underscore is a single-character wildcard)
- 'a_b%' -> pushes down startsWith('a')
- 'exact' -> pushes down equal('exact') (no wildcards)
- 'asd\%' -> pushes down equal('asd%') (escaped wildcard treated as literal)
Unsupported patterns (not pushed down):
- '%suffix' - starts with wildcard
- '_prefix%' - starts with wildcard
- 'prefix%suffix' - has literal content after wildcard
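
The pattern rules above can be sketched in a few lines of Python. This is
only an illustration of one rule set that reproduces the listed examples,
not the code in the patch. The function name like_to_pushdown is
hypothetical, and the handling of cases the list does not cover (a dangling
backslash, content after '%' other than more '%') is an assumption chosen
to stay conservative. A startsWith() result is a superset filter, so the
sketch also assumes the original LIKE is still evaluated on surviving rows.

```python
from typing import Optional, Tuple

WILDCARDS = {"%", "_"}
ESCAPE = "\\"


def like_to_pushdown(pattern: str) -> Optional[Tuple[str, str]]:
    """Return ("equal", s), ("startsWith", s), or None if nothing is pushed down."""
    prefix = []
    i = 0
    # Collect literal characters up to the first unescaped wildcard.
    while i < len(pattern):
        ch = pattern[i]
        if ch == ESCAPE:
            if i + 1 >= len(pattern):
                return None  # dangling escape: do not push down (assumption)
            prefix.append(pattern[i + 1])  # escaped character is a literal
            i += 2
        elif ch in WILDCARDS:
            break
        else:
            prefix.append(ch)
            i += 1
    literal = "".join(prefix)
    if i == len(pattern):
        return ("equal", literal)  # no unescaped wildcard at all
    if not literal:
        return None  # pattern starts with a wildcard
    if pattern[i] == "%" and pattern[i + 1:].strip("%") != "":
        return None  # content after '%', e.g. 'prefix%suffix'
    return ("startsWith", literal)


for p in ["abc%", "pre_fix%", "exact", "asd\\%", "%suffix", "prefix%suffix"]:
    print(repr(p), "->", like_to_pushdown(p))
```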
Benefits:
- File-level filtering using Iceberg metadata
- Partition pruning when LIKE is on partition columns
- Works with UTF-8 strings
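
To illustrate why a prefix predicate helps file-level filtering, here is a
minimal Python sketch of the general min/max technique: a data file can be
skipped when the prefix falls outside the file's column bounds, truncated to
the prefix length. The function name file_may_match_prefix and the sample
bounds are hypothetical, and this is not Iceberg's evaluator. Python
compares strings by code point, which is a simplification of whatever
ordering the real metadata uses.

```python
def file_may_match_prefix(lower: str, upper: str, prefix: str) -> bool:
    """Return False only if no value in [lower, upper] can start with prefix."""
    n = len(prefix)
    # Any value v that starts with prefix satisfies v[:n] == prefix, and
    # truncation preserves lexicographic order, so the prefix must lie
    # between the truncated bounds.
    return lower[:n] <= prefix <= upper[:n]


# Hypothetical per-file (lower, upper) bounds for a string column.
files = {
    "f1": ("aardvark", "apple"),
    "f2": ("apple", "banana"),
    "f3": ("álom", "érték"),
}
for name, (lo, hi) in files.items():
    print(name, file_may_match_prefix(lo, hi, "abc"))
```

A True result only means the file cannot be ruled out; rows in it still have
to be checked against the predicate.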
Testing:
- Added iceberg-like-pushdown.test with comprehensive test coverage
- Tests include prefix patterns, underscore wildcards, exact matches,
  partition pruning comparison, UTF-8 strings, and cases where the pattern
  cannot be pushed down
Change-Id: I548834126540bcc8d22efc872c2571293b8b7ec4
Reviewed-on: http://gerrit.cloudera.org:8080/24001
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Pushdown LIKE predicates to Iceberg
> -----------------------------------
>
> Key: IMPALA-14737
> URL: https://issues.apache.org/jira/browse/IMPALA-14737
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Arnab Karmakar
> Priority: Major
> Labels: impala-iceberg, ramp-up
>
> Iceberg's Expressions class
> [https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/Expressions.java]
> supports more expression types than we currently use.
> The most important one is probably
> * startsWith()
> E.g., when we have the following predicate: {{string_col LIKE 'asdf%xyz'}}
> We should push down:
> * startsWith("string_col", "asdf")
> I.e., the non-wildcard prefix of the string.
> It should work for UTF-8 strings as well.