Re: Encouraging performance results for Vectorized Iceberg code(Internet mail)

2019-08-12 Thread Anjali Norwood
Hi Padma, Gautam, All, Our (Samarth's and my) WIP vectorized code is here: https://github.com/anjalinorwood/incubator-iceberg/pull/1. Dan, can you please merge it to the 'vectorized-read' branch when you get a chance? Thanks! regards, Anjali. On Mon, Aug 12, 2019 at 10:49 AM Ryan Blue wrote:

InclusiveMetricsEvaluator null counts

2019-08-12 Thread Anton Okolnychyi
Hey folks, A quick question: InclusiveMetricsEvaluator doesn't check null counts in gt/gtEq/lt/ltEq predicates, unlike ParquetMetricsRowGroupFilter, meaning files containing only null values will always match. Am I correct that it's on purpose and we expect query engines to always infer i
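
To make the question above concrete, here is a minimal sketch of a file-level statistics check for a `col > literal` predicate that does consult null counts. It is not Iceberg's implementation: `ColumnStats` and `might_match_gt` are hypothetical names invented for illustration, and the column is assumed to hold integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-file column statistics (not Iceberg's classes)."""
    value_count: int
    null_count: Optional[int]  # None when the null count is unknown
    lower: Optional[int]       # None when no lower bound was recorded
    upper: Optional[int]       # None when no upper bound was recorded


def might_match_gt(stats: ColumnStats, literal: int) -> bool:
    """Return False only when no row in the file can satisfy `col > literal`."""
    # A comparison against null is never true, so an all-null file cannot match.
    if stats.null_count is not None and stats.null_count == stats.value_count:
        return False
    # Without an upper bound the file cannot be ruled out.
    if stats.upper is None:
        return True
    # Some row may exceed the literal only if the upper bound does.
    return stats.upper > literal
```

Without the first check, a file whose column holds only nulls usually has no recorded bounds, so the evaluator falls through to "cannot rule out" and the file always matches.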

Re: Encouraging performance results for Vectorized Iceberg code(Internet mail)

2019-08-12 Thread Ryan Blue
Li, You're right that the 10k and similar numbers indicate the batch size. Scores can be interpreted using the "units" column at the end. In this case, the unit is seconds per operation, so lower is better. Error is the measurement error. This indicates confidence that the actual rate of execution is, for e
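
One way to read the score and error columns described above, as a rough sketch: treat the error as a symmetric margin around the score and compare the resulting intervals. That symmetric-margin reading is an assumption, and the helper names `score_interval` and `clearly_faster` are hypothetical, not part of any benchmark tool.

```python
def score_interval(score: float, error: float) -> tuple:
    """Range the true value is expected to fall in, assuming a symmetric margin."""
    return (score - error, score + error)


def clearly_faster(a: tuple, b: tuple) -> bool:
    """Compare two (score, error) results measured in seconds per operation.

    Lower is better for this unit, so `a` is clearly faster only when its
    whole interval lies below the whole interval of `b`.
    """
    _, a_high = score_interval(*a)
    b_low, _ = score_interval(*b)
    return a_high < b_low
```

When two intervals overlap, the measurements alone do not show that one configuration is faster than the other.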

Re: Two newbie question about Iceberg

2019-08-12 Thread Ryan Blue
Great, thanks for working on this, Saisai! On Thu, Aug 8, 2019 at 7:38 PM Saisai Shao wrote: > I'm still looking into this, to figure out a way to add the HIVE_LOCKS table > on the Spark side. Anyway, I will create an issue first to track this. > > Best regards, > Saisai > > Ryan Blue wrote on Fri, Aug 9, 2019 at