[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169730#comment-14169730 ]
Jimmy Xiang commented on HIVE-7873:
-----------------------------------

I ran the simple perf test in TestHiveKVResultCache. With lazy execution disabled, the output I got is:

5505 4846 4801 4795 5046

The first value is the time in ms to scan 1 million rows, with all rows emitted during the close phase. For the second value, about 512 rows are emitted during each processNextRecord() call. For the third value, about 5120 rows are emitted during each processNextRecord() call. The fourth is similar to the second, except that about 5% of the rows are emitted in a separate thread. The fifth is similar to the third, except that about 5% of the rows are emitted in a separate thread. Since there is no lazy execution, all scenarios took about the same time.

With lazy execution enabled, I got:

4716 2242 5802 2289 5649

Scenarios 2 and 4 perform much better, since by default the cache can hold 512 rows in memory before spilling to disk. Scenario 1 performs about the same as without lazy execution. However, scenarios 3 and 5 perform worse than without lazy execution. My understanding is that we don't get the benefit of the cache there, since we need to dump most of the rows to disk anyway. Somehow, we run into some overhead instead.

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)