Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

Rui Li Mon, 09 Feb 2015 17:41:40 -0800


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
> 
> Rui Li wrote:
>     A high-level question: do we still need two buffers? And does it make 
> sense to use something like a queue instead of an array as the buffer?
> 
> Jimmy Xiang wrote:
>     Queue should work too. Using two buffers makes it easier to switch 
> between read and write. Switching itself is cheap here. For RowContainer, it 
> is expensive to switch because of first()/clear(), etc.


Thanks for the explanation, Jimmy. I was just wondering if we can use a single 
queue as the buffer and avoid switching between two arrays and managing the 
cursors. That should make it less complicated, right?
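
To make the question concrete, here is a rough sketch of the single-queue 
idea. It is not taken from the patch: the class name SingleQueueBuffer, the 
Entry holder, and the inMemoryNumRows parameter are all made up for 
illustration, and spilling to disk is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not the HiveKVResultCache code: a single FIFO queue
// replaces the two arrays and their read/write cursors.
final class SingleQueueBuffer<K, V> {

  // Hypothetical key/value holder used only by this sketch.
  static final class Entry<K, V> {
    final K key;
    final V value;

    Entry(K key, V value) {
      this.key = key;
      this.value = value;
    }
  }

  private final Queue<Entry<K, V>> queue = new ArrayDeque<>();
  private final int inMemoryNumRows; // assumed row-count limit before a spill

  SingleQueueBuffer(int inMemoryNumRows) {
    this.inMemoryNumRows = inMemoryNumRows;
  }

  // True once the in-memory limit is reached; a real cache would spill here.
  boolean isFull() {
    return queue.size() >= inMemoryNumRows;
  }

  // Writers append at the tail.
  void add(K key, V value) {
    queue.add(new Entry<>(key, value));
  }

  boolean hasNext() {
    return !queue.isEmpty();
  }

  // Readers take from the head; throws NoSuchElementException when empty.
  Entry<K, V> next() {
    return queue.remove();
  }
}
```

With this shape, reads and writes can interleave without any explicit 
switch, since the queue keeps the head and tail positions itself.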


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 54
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line54>
> >
> >     If I understand correctly, this can be renamed to something like 
> > IN_MEMORY_NUM_ROWS?
> 
> Jimmy Xiang wrote:
>     Yes, you are right. Both are ok. Any strong reason for renaming it?

No strong reason. To me, "cache size" suggests a size in bytes rather than a 
row count.


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 236
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line236>
> >
> >     I suppose this is to avoid switching buffers frequently? But why the 
> > magic number 1?
> 
> Jimmy Xiang wrote:
>     Right. If it is 1, there is no need to switch buffers. For any other 
> number, we need to switch anyway. I assume there are many scenarios where 
> there is just one row.

I see, thanks.


- Rui


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
-----------------------------------------------------------


On Feb. 9, 2015, 7:41 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> -----------------------------------------------------------
> 
> (Updated Feb. 9, 2015, 7:41 p.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
>     https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The result KV cache no longer uses RowContainer, since it has logic we don't 
> need, which adds overhead. We don't do lazy computing right away; instead, 
> we wait until the cache is close to spilling.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 78ab680
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 8ead0cb
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 7a09b4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java e92e299
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 070ea4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d4ff37c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 286816b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 0df4598
> 
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> -------
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>
