[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157994#comment-16157994
 ] 

Ke Jia commented on HIVE-17139:
-------------------------------

I uploaded the latest patch to fix the failed tests; the remaining failures 
do not seem to be related to the patch.
I tested the patch on the product_reviews table of TPCx-BB using the following 
SQL statement:
{code:java}
select case
         when pr_review_rating=4 then upper(pr_review_content)
         when pr_review_rating=3 then upper(pr_review_content)
       end
from product_reviews;
{code}
The cluster has 8 nodes with 230G per node; the CPU is Intel(R) Xeon(R) CPU 
E5-2699.
With a 3TB data scale and Spark as the execution engine, the results are as 
follows:
|| ||without patch||with patch||improvement(s)||improvement(%)||
|HoS|28.25s|16.14s|12.11s|42.8%|
|VectorSelectOperator|12.58s|2.99s|9.59s|76.2%|
The results show that the Spark execution time drops from 28.25s to 16.14s, and 
the time spent in VectorSelectOperator drops from 12.58s to 2.99s.
The total record count and the counts of records with "pr_review_rating=4" and 
"pr_review_rating=3" are as follows:
|| ||count||
|total records|9934636|
|pr_review_rating=4 records|1897804|
|pr_review_rating=3 records|792278|
With this patch, only (1897804+792278) records go through the upper operation 
in the above SQL statement; without this patch, (9934636+9934636) records go 
through it, because both branches are evaluated for every row.
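
To make the record counts above concrete, here is a minimal Java sketch of the 
two evaluation strategies on a toy batch. It is only an illustration under 
stated assumptions: the class CaseWhenSketch, the methods evalAllRows and 
evalSelectedRows, and the upperCalls counter are hypothetical names made up for 
this example. It is not the code in the patch and does not use Hive's 
vectorization classes.

```java
import java.util.Locale;

/** Toy model of CASE WHEN evaluation over a batch; hypothetical, not Hive's API. */
final class CaseWhenSketch {
    // Counts how many rows upper() is applied to, so the two strategies can be compared.
    static int upperCalls = 0;

    static String upper(String s) {
        upperCalls++;
        return s.toUpperCase(Locale.ROOT);
    }

    /** Without the optimization: each THEN expression runs on every row of the batch. */
    static String[] evalAllRows(int[] rating, String[] content) {
        int n = rating.length;
        String[] then4 = new String[n];
        String[] then3 = new String[n];
        for (int i = 0; i < n; i++) then4[i] = upper(content[i]);
        for (int i = 0; i < n; i++) then3[i] = upper(content[i]);
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            // Pick the precomputed branch result; rows matching no branch stay null.
            out[i] = rating[i] == 4 ? then4[i] : rating[i] == 3 ? then3[i] : null;
        }
        return out;
    }

    /** With the optimization: each THEN expression runs only on the rows its condition selected. */
    static String[] evalSelectedRows(int[] rating, String[] content) {
        int n = rating.length;
        String[] out = new String[n];   // rows matching no branch stay null
        int[] selected = new int[n];    // indices of rows that satisfy the current condition
        for (int wanted : new int[] {4, 3}) {
            int size = 0;
            for (int i = 0; i < n; i++) {
                if (rating[i] == wanted) selected[size++] = i;
            }
            for (int j = 0; j < size; j++) {
                int row = selected[j];
                out[row] = upper(content[row]);
            }
        }
        return out;
    }
}
```

For a 5-row batch with ratings {4, 1, 3, 5, 4}, evalAllRows applies upper to 
5+5 = 10 rows, while evalSelectedRows applies it to 2+1 = 3 rows and returns the 
same output. This is the same pattern as (9934636+9934636) versus 
(1897804+792278) in the measurement above.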

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17139
>                 URL: https://issues.apache.org/jira/browse/HIVE-17139
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ke Jia
>            Assignee: Ke Jia
>         Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch, HIVE-17139.5.patch, 
> HIVE-17139.6.patch, HIVE-17139.7.patch, HIVE-17139.8.patch
>
>
> The execution of case when and if statements in Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, 
> so that the else expression is evaluated only for the selected rows instead 
> of all rows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
