[ 
https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674690#comment-13674690
 ] 

Eric Hanson commented on HIVE-4642:
-----------------------------------

I think this sounds good except that using multi-threaded parallelism is not a 
good idea here. We should rely on getting parallelism for large data sets by 
having multiple splits processed in parallel in different processes. Using 
file-grain multi-threaded parallelism within a process only for purposes of 
speeding up RLIKE/REGEXP does not seem appropriate. I'd recommend focusing on 
the fastest operation you can get within a single thread, at least for common 
patterns, or maybe even all possible patterns.
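
To illustrate what a single-threaded fast path for common patterns might look 
like, here is a minimal sketch. It is not the actual Hive implementation. The 
names RlikeSketch, Checker, isPlainLiteral, compile and filter are hypothetical, 
and it assumes RLIKE has "find" (partial match) semantics and that a pattern 
with no regex metacharacters can be handled with a plain substring search 
instead of the regex engine.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch: choose the cheapest checker once per pattern,
// then apply it to a batch of values in a single thread.
public class RlikeSketch {
    // Hypothetical interface: one match test per value.
    interface Checker { boolean check(String value); }

    // True if the pattern contains no regex metacharacters.
    static boolean isPlainLiteral(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            if ("\\.[]{}()*+?^$|".indexOf(pattern.charAt(i)) >= 0) return false;
        }
        return true;
    }

    // Pick a fast path for the common case, fall back to java.util.regex.
    static Checker compile(String pattern) {
        if (isPlainLiteral(pattern)) {
            // Common case: substring search, no regex engine involved.
            return value -> value.contains(pattern);
        }
        // General case: compile once, reuse for every row.
        Pattern compiled = Pattern.compile(pattern);
        return value -> compiled.matcher(value).find();
    }

    // Returns the indexes of the non-null values that match the pattern.
    static int[] filter(String[] column, String pattern) {
        Checker checker = compile(pattern);
        int[] selected = new int[column.length];
        int n = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] != null && checker.check(column[i])) selected[n++] = i;
        }
        return Arrays.copyOf(selected, n);
    }

    public static void main(String[] args) {
        String[] column = {"apache hive", "pig", "hive-4642", null};
        System.out.println(Arrays.toString(filter(column, "hive")));          // [0, 2]
        System.out.println(Arrays.toString(filter(column, "^hive-[0-9]+$"))); // [2]
    }
}
```

The point of the sketch is only that the choice of checker happens once per 
pattern, so the per-row loop stays tight without needing any threads. Other 
common shapes (prefix, suffix, exact match) could get their own checkers in 
the same way.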
                
> Implement vectorized RLIKE and REGEXP filter expressions
> --------------------------------------------------------
>
>                 Key: HIVE-4642
>                 URL: https://issues.apache.org/jira/browse/HIVE-4642
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> See title. I will add more details next week. The goal is (a) make this work 
> correctly and (b) optimize it as well as possible, at least for the common 
> cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira