[ 
https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169730#comment-14169730
 ] 

Jimmy Xiang commented on HIVE-7873:
-----------------------------------

I ran the simple perf test in TestHiveKVResultCache.
With lazy disabled, the output I got is:
 5505    4846    4801    4795    5046
Each value is the time in ms to scan 1 million rows. For the first value, all 
rows are emitted during the close phase.
For the second value, about 512 rows are emitted during each 
processNextRecord() call.
For the third value, about 5120 rows are emitted during each 
processNextRecord() call.
The fourth is similar to the second one, except that about 5% of the rows are 
emitted in a separate thread.
The fifth is similar to the third one, except that about 5% of the rows are 
emitted in a separate thread.

Since there is no lazy execution, all scenarios took about the same time.
With lazy enabled, I got:
4716    2242    5802    2289    5649
For scenarios 2 and 4, performance is much better, since by default the cache 
can hold 512 rows in memory before spilling to disk.
Scenario 1 performs about the same as without lazy execution.
However, scenarios 3 and 5 perform worse than without lazy execution. My 
understanding is that we don't get the benefit of the cache, since we need to 
dump most of the rows to disk anyway. Somehow, we incur some extra overhead 
instead.
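
To make the batch-size effect concrete, here is a minimal sketch of a FIFO row 
buffer that keeps a fixed number of rows in memory and spills the rest to a 
temp file. It is not the actual HiveKVResultCache code: the class 
SpillingRowBuffer, its methods, and the spilledRows() helper are hypothetical 
names made up for illustration, and the sketch assumes the consumer drains the 
buffer after every batch. It only counts how many rows touch disk; it does not 
model timing or the separate-thread scenarios.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Hive code: in-memory FIFO that spills overflow to a temp file. */
final class SpillingRowBuffer implements Closeable {
    private final int memoryCapacity;
    private final ArrayDeque<String> memory = new ArrayDeque<>();
    private final File spillFile;
    private final RandomAccessFile spill;
    private long readPos = 0;      // next byte to read from the spill file
    private long writePos = 0;     // next byte to write in the spill file
    private long rowsOnDisk = 0;   // rows spilled but not yet read back
    private long totalSpilled = 0; // rows ever written to disk

    SpillingRowBuffer(int memoryCapacity) throws IOException {
        this.memoryCapacity = memoryCapacity;
        this.spillFile = File.createTempFile("row-buffer", ".spill");
        this.spillFile.deleteOnExit();
        this.spill = new RandomAccessFile(spillFile, "rw");
    }

    void add(String row) throws IOException {
        // Use memory only while nothing is pending on disk, so FIFO order holds.
        if (rowsOnDisk == 0 && memory.size() < memoryCapacity) {
            memory.addLast(row);
            return;
        }
        spill.seek(writePos);
        spill.writeUTF(row);
        writePos = spill.getFilePointer();
        rowsOnDisk++;
        totalSpilled++;
    }

    boolean hasNext() {
        return !memory.isEmpty() || rowsOnDisk > 0;
    }

    String next() throws IOException {
        // Rows in memory are always older than rows on disk.
        if (!memory.isEmpty()) {
            return memory.pollFirst();
        }
        if (rowsOnDisk == 0) {
            throw new NoSuchElementException();
        }
        spill.seek(readPos);
        String row = spill.readUTF();
        readPos = spill.getFilePointer();
        if (--rowsOnDisk == 0) {   // disk fully drained: reuse the file from the start
            spill.setLength(0);
            readPos = 0;
            writePos = 0;
        }
        return row;
    }

    long totalSpilled() {
        return totalSpilled;
    }

    @Override
    public void close() throws IOException {
        spill.close();
        spillFile.delete();
    }

    /** Adds totalRows in batches, draining after each batch; returns how many rows hit disk. */
    static long spilledRows(int totalRows, int batchSize, int memoryCapacity) throws IOException {
        try (SpillingRowBuffer buf = new SpillingRowBuffer(memoryCapacity)) {
            int produced = 0;
            while (produced < totalRows) {
                int n = Math.min(batchSize, totalRows - produced);
                for (int i = 0; i < n; i++) {
                    buf.add("row-" + (produced + i));
                }
                produced += n;
                while (buf.hasNext()) {
                    buf.next();
                }
            }
            return buf.totalSpilled();
        }
    }
}
```

Under these assumptions, spilledRows(10240, 512, 512) reports 0 spilled rows, 
while spilledRows(10240, 5120, 512) reports 9216, i.e. 90% of the rows go 
through the disk. That matches the intuition for scenarios 2 and 3, though it 
says nothing about where the extra overhead in 3 and 5 comes from.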

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
