[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683558#comment-13683558 ]
Eric Hanson commented on HIVE-4642:
-----------------------------------

It is great that you are learning about regular expression implementation algorithms. If you can come up with an approach that compiles the regular expression once into a good internal format when you build the vectorized FilterStringColRegExpStringScalar class instance, that will be good. Then you can re-use the internal format (say, some kind of finite automaton) for each batch.

Be careful to make sure the common cases are fast. Don't make the project too big. I am not sure how much time it will take to implement a fully general regexp matcher. If you think you can do it in the next month or two, fine. If it takes longer, maybe you should think of a different approach.

If it looks like the project will become too big, consider focusing just on common special cases (like matching phone numbers, URLs, email addresses, various number formats, etc.), then use an existing RegExp matcher when the pattern is not one of the limited class of expressions your new code can handle.

> Implement vectorized RLIKE and REGEXP filter expressions
> --------------------------------------------------------
>
>                 Key: HIVE-4642
>                 URL: https://issues.apache.org/jira/browse/HIVE-4642
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> See title. I will add more details next week. The goal is (a) make this work
> correctly and (b) optimize it as well as possible, at least for the common
> cases.
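
A minimal sketch of the compile-once idea from the comment above, in Java. It is not Hive's FilterStringColRegExpStringScalar: the class name RegexBatchFilter, the filter() signature, and the selection-array layout are hypothetical, chosen only for illustration. It assumes RLIKE succeeds when any substring of the value matches, picks a matching strategy once in the constructor, handles two simple special cases (a plain literal and a "^literal" prefix) directly, and falls back to java.util.regex for every other pattern. Null values are not handled.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Hive vectorized filter class.
final class RegexBatchFilter {
    // Characters that have a special meaning in java.util.regex patterns.
    private static final String META = "\\^$.|?*+()[]{}";

    // Strategy chosen once at construction time and reused for every batch.
    private final Predicate<String> matcher;

    RegexBatchFilter(String pattern) {
        this.matcher = compile(pattern);
    }

    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    private static Predicate<String> compile(String pattern) {
        if (isLiteral(pattern)) {
            // No metacharacters: substring search is equivalent to find().
            return s -> s.contains(pattern);
        }
        if (pattern.startsWith("^") && isLiteral(pattern.substring(1))) {
            // "^literal": without MULTILINE, '^' anchors only at input start.
            String prefix = pattern.substring(1);
            return s -> s.startsWith(prefix);
        }
        // General case: compile once, then reuse the compiled Pattern.
        Pattern compiled = Pattern.compile(pattern);
        return s -> compiled.matcher(s).find();
    }

    // Keeps the indexes of matching rows in sel[0..n) and returns the new count.
    int filter(String[] col, int[] sel, int n) {
        int out = 0;
        for (int i = 0; i < n; i++) {
            int row = sel[i];
            if (matcher.test(col[row])) {
                sel[out++] = row;
            }
        }
        return out;
    }
}
```

The point of the design is that all pattern analysis happens in the constructor, so the per-batch loop only calls a pre-selected predicate. More special cases could be added in compile() without touching filter().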