Re: Seeking Suggestions on Implementing NaN Counters for Metrics

2020-10-19 Thread Ryan Blue
Hi Yan, I think you’re correct about how everything works right now. Because the ORC and Parquet writers already keep statistics, Iceberg uses those instead of keeping its own. And that means that Avro doesn’t yet have stats implemented. There’s a great start to adding stats for Avro files in PR

Seeking Suggestions on Implementing NaN Counters for Metrics

2020-10-16 Thread Yan Yan
Hi Iceberg community, I'm from Amazon and very new to the space, so please bear with me for any naive questions. I'm currently looking into adding NaN counts for float and double columns (described in #348). I noticed that metrics like upper/lower boun
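
For readers unfamiliar with the idea, a NaN counter for a float or double column can be sketched roughly as follows. This is only an illustration in Python and not Iceberg's implementation: the names `FloatColumnMetrics` and `collect_metrics` are hypothetical, and keeping NaN values out of the lower/upper bounds is an assumption made for the sketch rather than something the thread states.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FloatColumnMetrics:
    """Hypothetical per-column metrics holder (not an Iceberg class)."""
    non_null_count: int = 0
    nan_count: int = 0
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


def collect_metrics(values: Iterable[Optional[float]]) -> FloatColumnMetrics:
    """Count NaNs and track bounds over the non-NaN values of one column."""
    metrics = FloatColumnMetrics()
    for value in values:
        if value is None:
            # Nulls are skipped here; a real writer would count them separately.
            continue
        metrics.non_null_count += 1
        if math.isnan(value):
            # Assumption: NaN is counted but excluded from the bounds,
            # because comparisons against NaN are always false.
            metrics.nan_count += 1
            continue
        if metrics.lower_bound is None or value < metrics.lower_bound:
            metrics.lower_bound = value
        if metrics.upper_bound is None or value > metrics.upper_bound:
            metrics.upper_bound = value
    return metrics


metrics = collect_metrics([1.0, float("nan"), None, -2.5, float("nan")])
print(metrics)
```

Tracking the NaN count separately from the bounds keeps the bounds comparable, since a NaN mixed into a min/max computation would otherwise make the result depend on evaluation order.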