OpenInx, thanks a lot for kicking off the discussion. Looks like my previous reply didn't reach the mailing list.
> flink source based on the new FLIP-27 interface Yes, we shall target 0.11.0 release for the FLIP-27 flink source. I have updated the issue [1] with the following scopes. - Support both static/batch and continuous/streaming enumeration modes - Support only the simple assigner with no ordering/locality guarantee when handing out split assignment. But make the interface flexible to plug in different assigners (like the event time alignment assigner or locality aware assigner) - It will be @Experimenta status as nobody has run FLIP-27 sources in production today. Flink 1.12.0 release (ETA end of Nov) will have the first set of sources (Kafka and file) implemented with FLIP-27 source framework. We still need to gain more production experiences. [1] https://github.com/apache/iceberg/issues/1626 On Wed, Oct 28, 2020 at 12:15 AM OpenInx <open...@gmail.com> wrote: > Hi dev > > As we know, we will be happy to cut the iceberg 0.10.0 candidate release > this week. I think it may be the time to plan for the future iceberg > 0.11.0 now, so I created a Java 0.11.0 Release milestone here [1] > > I put the following issues into the newly created milestone: > > 1. Apache Flink Rewrite Actions in Apache Iceberg. > > It's possible that we encounter too many small files issues when running > the iceberg flink sink in real production because of the frequent > checkpoint. we have two approaches to handle the small files: > > a. As the current spark rewrite actions designed, flink will provide the > similar rewrite actions which will be running in a batch job. It's > suitable to trigger the whole table or whole partitions compactions > periodically, because this kind of rewrites will compact many large files > and may consume lots of bandwidth. Currently, I and JunZheng are working > on this issue, and we've extracted the base rewrite actions between spark > module and flink module. The next step would be implementing rewrite > actions in the flink module. > > b. Compact those small files in the flink streaming job when sinking into > iceberg tables. That means we will provide a new rewrite operator chaining > to the current IcebergFilesCommitter. Once an iceberg transaction has been > committed, the newly introduced rewrite operator will check whether it > needs a small compaction. Those actions only choose a few tiny size files > (may be several KB, or MB, I think we could provide a configurable > threshold) to rewrite, which can be achieved with a minimum cost and a > higher efficiency of compaction. Currently, simonsssu from Tencent has > provided a WIP PR here [2] > > > 2. Allow to write CDC or UPSERT records by flink streaming jobs. > > We've almost implemented the row-level delete feature in the iceberg > master branch, but still lack the ability to integrate with compute engines > (to be precise, we spark/flink could read the expected records if someone > has deleted the rows correctly but the write path is not available). I am > preparing the patch for sinking CDC into iceberg by flink streaming job > here [3], I think it will be ready in the next few weeks. > > 3. Apache flink streaming reader. > > We've prepared a POC version in our alibaba internal branch, but still not > contribute to apache iceberg now. I think it's worth accomplishing that in > the following days. > > > The above are the issues that I think it's worth to merge before iceberg > 0.11.0. But I' not quite sure what's the plan for the things: > > 1. I know @Anton Okolnychyi <aokolnyc...@apple.com> is working on > spark-sql extensions for iceberg, I guess there's a high probability to get > that ? [4] > > 2. @Steven Wu <steve...@netflix.com> from netflix is working on flink > source based on the new FLIP-27 interface, thoughts ? [5] > > 3. How about the Spark Row-Delete integration work ? > > > > [1]. https://github.com/apache/iceberg/milestone/12 > [2]. https://github.com/apache/iceberg/pull/1669/files > [3]. https://github.com/apache/iceberg/pull/1663 > [4]. https://github.com/apache/iceberg/milestone/11 > [5]. https://github.com/apache/iceberg/issues/1626 >