Several flink pull requests need to get merged before the next release 0.10.0

2020-10-19 Thread OpenInx
Hi, as we know, the next release 0.10.0 is coming, and in my view there are several issues that should be merged as soon as possible: 1. https://github.com/apache/iceberg/pull/1477 It will change the Flink state design to maintain the completed data files in a manifest before the checkpoint finishes,

Re: Incremental reads for Upsert!

2020-10-19 Thread OpenInx
Yeah, we have discussed the incremental readers several times; here is the conclusion [1]. I also wrote a document to show the thoughts behind the discussion [2], which you might be interested in. In my opinion, the next release 0.10.0 will include the basic Flink sink connector and batch reader. an

Re: Incremental reads for Upsert!

2020-10-19 Thread Ryan Blue
Hi Ashish, We've discussed this use case quite a bit, but I don't think that there are currently any readers that expose the deletes as a stream. Right now, all of the readers produce records from the current table state. I think @OpenInx and @Jingsong Li have some plans to expose such a reader

Re: Seeking Suggestions on Implementing NaN Counters for Metrics

2020-10-19 Thread Ryan Blue
Hi Yan, I think you’re correct about how everything works right now. Because the ORC and Parquet writers already keep statistics, Iceberg uses those instead of keeping its own. And that means that Avro doesn’t yet have stats implemented. There’s a great start to adding stats for Avro files in PR