Thanks for broadcasting! I just have a few questions to better understand the 
awesome work.

Could you give a little more detail on the score and error columns? Does error 
mean every time the query hits a null?
Shall I assume 5k/10k means the number of rows? What do we learn from comparing 
to IcebergSourceFlatParquetDataReadBenchmark.readIceberg? Or rather, what 
numbers are we comparing to?

-Li

From: Anjali Norwood <anorw...@netflix.com>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
Date: Saturday, August 10, 2019, 4:47 AM
To: Ryan Blue <rb...@netflix.com>, "dev@iceberg.apache.org" 
<dev@iceberg.apache.org>
Cc: Gautam <gautamkows...@gmail.com>, "ppa...@apache.org" <ppa...@apache.org>, 
Samarth Jain <sj...@netflix.com>, Daniel Weeks <dwe...@netflix.com>
Subject: Re: Encouraging performance results for Vectorized Iceberg code (Internet 
mail)

Good suggestion Ryan. Added dev@iceberg now.

Dev: Please see the early vectorized Iceberg performance results a couple of emails 
down. This is WIP.

thanks,
Anjali.

On Thu, Aug 8, 2019 at 10:39 AM Ryan Blue <rb...@netflix.com> wrote:
Hi everyone,

Is it possible to copy the Iceberg dev list when sending these emails? There 
are other people in the community that are interested, like Palantir. If there 
isn't anything sensitive then let's try to be more inclusive. Thanks!

rb

On Wed, Aug 7, 2019 at 10:34 PM Anjali Norwood <anorw...@netflix.com> wrote:
Hi Gautam, Padma,
We wanted to update you before Gautam takes off for vacation.

Samarth and I profiled the code and found the following:
Profiling the IcebergSourceFlatParquetDataReadBenchmark (10 files, 10M rows, a 
single long column) using VisualVM shows two places where CPU time can be 
optimized:
1) Iterator abstractions (triple iterators, page iterators, etc.) take up 
quite a bit of time. Not using these iterators, or making them 'batched' 
iterators and moving the reading of the data close to the file, should help 
ameliorate this problem.
2) The current code goes back and forth between definition-level reads and value 
reads through the levels of iterators, and quite a bit of CPU time is spent here. 
Reading a batch of primitive values at once after consulting the definition level 
should help improve performance.
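The definition-level batching in point 2 amounts to run-length materialization: instead of checking the definition level one value at a time, walk the levels in runs and copy each run of non-null values in a single call. The actual prototype lives in the Java reader; the Python sketch below (function name and signature are my own, purely illustrative) just shows the shape of the idea:

```python
def read_batch(def_levels, values, max_def_level):
    """Materialize a column chunk by alternating runs of values and nulls.

    def_levels:    per-row definition levels for the chunk
    values:        the non-null values, in order, as decoded from the page
    max_def_level: the definition level that means "value is present"
    """
    out = []
    value_idx = 0
    i = 0
    n = len(def_levels)
    while i < n:
        if def_levels[i] == max_def_level:
            # Run of non-null values: find where it ends, then copy the
            # whole run in one slice instead of one value per iteration.
            run_start = i
            while i < n and def_levels[i] == max_def_level:
                i += 1
            run_len = i - run_start
            out.extend(values[value_idx:value_idx + run_len])
            value_idx += run_len
        else:
            # Run of nulls: emit None until we hit values again.
            while i < n and def_levels[i] != max_def_level:
                out.append(None)
                i += 1
    return out

# Example: levels [1, 1, 0, 1] with max level 1 means
# value, value, null, value.
row = read_batch([1, 1, 0, 1], ["a", "b", "c"], 1)
# row == ["a", "b", None, "c"]
```

The win is that the per-value branch on the definition level is replaced by two tight inner loops, so the common case (long runs of non-null values) degenerates to a bulk copy.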

So, we prototyped the code to walk over the definition levels and read the 
corresponding values in batches (read values until we hit a null, then read 
nulls until we hit values, and so on), and made the iterators batched iterators. 
Here are the results:

Benchmark                                                               Mode  Cnt   Score   Error  Units
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceNonVectorized     ss    5  10.247 ± 0.202   s/op
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceVectorized        ss    5   3.747 ± 0.206   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIceberg                     ss    5  11.286 ± 0.457   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized100k       ss    5   6.088 ± 0.324   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized10k        ss    5   5.875 ± 0.378   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized1k         ss    5   6.029 ± 0.387   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized5k         ss    5   6.106 ± 0.497   s/op


Moreover, as I mentioned to Gautam on chat, we prototyped reading the string 
column as a byte array without decoding it into UTF-8 (the above changes were not 
made at the time) and saw significant performance improvements there (21.18 
s before vs. 13.031 s with the change). When used along with batched 
iterators, these numbers should get better.
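That string-column prototype is in the Java reader; as a loose illustration of the idea (deferring UTF-8 decoding until a value is actually consumed), a sketch might look like the following. The `LazyString` class and its fields are hypothetical names of my own, not anything in the Iceberg codebase:

```python
class LazyString:
    """Wraps the raw byte slice from the file; decodes to UTF-8 only on
    first use, so a scan that never touches the string form pays nothing."""

    __slots__ = ("_raw", "_decoded")

    def __init__(self, raw: bytes):
        self._raw = raw
        self._decoded = None  # filled in lazily

    def __str__(self):
        if self._decoded is None:
            self._decoded = self._raw.decode("utf-8")
        return self._decoded

# Reading the column materializes only cheap byte wrappers:
raw_column = [b"alpha", b"beta", b"gamma"]
lazy_column = [LazyString(b) for b in raw_column]

# No UTF-8 work has happened yet; decoding occurs per value, on demand,
# e.g. when a filter or projection finally needs the string:
first = str(lazy_column[0])
```

The design choice being illustrated: UTF-8 validation and char decoding are per-byte work proportional to total string volume, so skipping them for values the query never inspects is where the 21.18 s to 13.031 s improvement plausibly comes from.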

Note that we haven't tightened/profiled the new code yet (we will start on that 
next). We just wanted to share some early positive results.

regards,
Anjali.



--
Ryan Blue
Software Engineer
Netflix
