> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
>
> Rui Li wrote:
>     Some high level question, do we still need two buffers? And does it make
>     sense to use something like a queue instead of an array as the buffer?
>
> Jimmy Xiang wrote:
>     Queue should work too. Using two buffers makes it easier to switch
>     between read and write. Switching itself is cheap here. For RowContainer, it
>     is expensive to switch because of first()/clear(), etc.

Thanks for the explanation, Jimmy. I was just wondering if we can use a single queue as the buffer and avoid switching between two arrays and managing the cursors. That should make it less complicated, right?


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, line 54
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line54>
> >
> >     If I understand correctly, this can be renamed to something like
> >     IN_MEMORY_NUM_ROWS?
>
> Jimmy Xiang wrote:
>     Yes, you are right. Both are ok. Any strong reason for renaming it?

No, I just feel "cache size" reads more like a size in bytes.


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, line 236
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line236>
> >
> >     I suppose this is to avoid switching buffers frequently? But why the magic
> >     number 1?
>
> Jimmy Xiang wrote:
>     Right. If it is 1, there is no need to switch buffers. For any other number,
>     we need to switch anyway. I assume there are many scenarios where there is
>     just one row.

I see, thanks.


- Rui


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
-----------------------------------------------------------


On Feb. 9, 2015, 7:41 p.m., Jimmy Xiang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> -----------------------------------------------------------
>
> (Updated Feb. 9, 2015, 7:41 p.m.)
>
>
> Review request for hive, Rui Li and Xuefu Zhang.
>
>
> Bugs: HIVE-9574
>     https://issues.apache.org/jira/browse/HIVE-9574
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Result KV cache doesn't use RowContainer any more, since it has logic we don't
> need, which adds overhead. We don't do lazy computing right away; instead,
> we wait until the cache is close to spilling.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 78ab680
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 8ead0cb
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 7a09b4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java e92e299
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 070ea4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d4ff37c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 286816b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 0df4598
>
> Diff: https://reviews.apache.org/r/30739/diff/
>
>
> Testing
> -------
>
> Unit test, test on cluster
>
>
> Thanks,
>
> Jimmy Xiang
>
>
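
For reference, the single-queue alternative raised in the first comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name QueueRowBuffer, its Row type, and its methods are hypothetical and are not part of the patch, the cap is counted in rows (in the spirit of the IN_MEMORY_NUM_ROWS suggestion), and spilling to disk is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is not the HiveKVResultCache code under review.
final class QueueRowBuffer<K, V> {
    // One key/value row held in memory.
    static final class Row<K, V> {
        final K key;
        final V value;

        Row(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int maxRows; // cap counted in rows, not bytes
    private final ArrayDeque<Row<K, V>> rows = new ArrayDeque<>();

    QueueRowBuffer(int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        this.maxRows = maxRows;
    }

    // Appends a row at the tail. Returns false when the buffer is full,
    // which is the point where a real cache would have to spill.
    boolean offer(K key, V value) {
        if (rows.size() >= maxRows) {
            return false;
        }
        rows.addLast(new Row<>(key, value));
        return true;
    }

    boolean hasNext() {
        return !rows.isEmpty();
    }

    // Removes and returns the oldest row; no read or write cursor is tracked.
    Row<K, V> next() {
        Row<K, V> row = rows.pollFirst();
        if (row == null) {
            throw new NoSuchElementException("buffer is empty");
        }
        return row;
    }
}
```

With a queue, reads and writes can interleave without a buffer switch, since the head and tail positions are managed by ArrayDeque itself. Whether that is actually simpler once spilling is added back is the open question in the thread above.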