[
https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683558#comment-13683558
]
Eric Hanson commented on HIVE-4642:
-----------------------------------
It is great that you are learning about regular expression implementation
algorithms. If you can come up with an approach that allows you to compile the
regular expression once into a good internal format when you build the
vectorized FilterStringColRegExpStringScalar class instance, that will be good.
Then you can re-use the internal format (say, some kind of finite automaton) for each batch.
Be careful to make sure the common cases are fast. Don't make the project too
big. I am not sure how much time it will take to implement a fully general
regexp matcher. If you think you can do it in the next month or two, fine. If
it takes longer, maybe you should think of a different approach.
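As a rough illustration of the compile-once idea, here is a minimal sketch that uses java.util.regex as the "internal format" rather than a custom automaton. The class name RegexBatchFilter, the byte[][] column layout, and the selected-index array are assumptions made for the example and are not Hive's actual vectorization API. It also assumes RLIKE means "the pattern is found anywhere in the string".

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a vectorized string-column filter (not Hive's real class).
final class RegexBatchFilter {
    // Built once in the constructor and reused for every row of every batch.
    private final Matcher matcher;

    RegexBatchFilter(String regex) {
        this.matcher = Pattern.compile(regex).matcher("");
    }

    /**
     * Filters one batch in place.
     * values:   one UTF-8 encoded string per row (assumed layout)
     * selected: indexes of the rows still alive in this batch
     * size:     number of valid entries in selected
     * Returns the number of rows that matched; their indexes are
     * compacted to the front of selected.
     */
    int filter(byte[][] values, int[] selected, int size) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            int row = selected[i];
            String s = new String(values[row], StandardCharsets.UTF_8);
            // reset() re-targets the existing matcher instead of allocating a new one.
            if (matcher.reset(s).find()) {
                selected[newSize++] = row;
            }
        }
        return newSize;
    }
}
```

A Matcher is not thread-safe, so this sketch assumes one filter instance per thread. It still decodes every value into a String, which is exactly the per-row cost a byte-oriented automaton would try to avoid.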
If it looks like the project will become too big, consider focusing just on
common special cases (like matching phone numbers, URLs, email addresses,
various number formats, etc.), then use an existing RegExp matcher when the
pattern is not one of the limited class of expressions your new code can handle.
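The fallback structure could look roughly like the sketch below. To keep it short, the "limited class" here is just plain literal patterns and literal prefixes anchored with ^, not the phone number or URL cases mentioned above. The class name RegexDispatch and the choice of special cases are assumptions for illustration only. Anything else is handed to java.util.regex.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical dispatcher: picks a cheap matcher when the pattern allows it,
// otherwise falls back to the general-purpose regex engine.
final class RegexDispatch {
    // Characters with special meaning in a Java regex outside a character class.
    private static final String META = "\\^$.|?*+()[]{}";

    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Decides once, at construction time, which matcher to use for every row. */
    static Predicate<String> compile(String regex) {
        if (isLiteral(regex)) {
            // No metacharacters: "found anywhere" is a substring search.
            return s -> s.contains(regex);
        }
        if (regex.startsWith("^") && isLiteral(regex.substring(1))) {
            // Anchored literal: a prefix check is enough.
            String prefix = regex.substring(1);
            return s -> s.startsWith(prefix);
        }
        // General case: use the existing regex matcher.
        Pattern p = Pattern.compile(regex);
        return s -> p.matcher(s).find();
    }
}
```

For example, RegexDispatch.compile("^hive") never touches the regex engine, while RegexDispatch.compile("[0-9]{3}-[0-9]{4}") takes the fallback path. More special cases can be added in front of the fallback without changing callers.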
> Implement vectorized RLIKE and REGEXP filter expressions
> --------------------------------------------------------
>
> Key: HIVE-4642
> URL: https://issues.apache.org/jira/browse/HIVE-4642
> Project: Hive
> Issue Type: Sub-task
> Reporter: Eric Hanson
> Assignee: Teddy Choi
>
> See title. I will add more details next week. The goal is (a) make this work
> correctly and (b) optimize it as well as possible, at least for the common
> cases.