[VOTE] Release Apache Iceberg 0.13.0 RC1

2022-01-25 Thread Jack Ye
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 0.13.0 release. The commit ID is ca8bb7d0821f35bbcfa79a39841be8fb630ac3e5 * This corresponds to the tag: apache-iceberg-0.13.0-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-0.13.0-rc1 * https:/

Continuing the Secondary Index Discussion

2022-01-25 Thread Jack Ye
Hi everyone, Based on the conversation in the last community sync and the Iceberg Slack channel, it seems like multiple parties have interest in continuing the effort related to the secondary index in Iceberg, so I would like to restart the thread to continue the discussion. So far most people re

Re: [VOTE] Release Apache Iceberg 0.13.0 RC1

2022-01-25 Thread Kyle Bendickson
Thank you, Jack! Quick announcement when testing: *the runtime jars / artifacts for Spark & Flink have changed naming format* to include the corresponding Spark / Flink version. The Spark jars also have the Scala version appended at the end. *Spark:* You can test the 0.13.0-rc1, fetching it from

Re: Continuing the Secondary Index Discussion

2022-01-25 Thread Ryan Blue
Thanks for raising this for discussion, Jack! It would be great to start adding more indexes. > Scope of native index support The way I think about it, the biggest challenge here is how to know when you can use an index. For example, if you have a partition index that is up to date as of snapshot

Re: Continuing the Secondary Index Discussion

2022-01-25 Thread Miao Wang
Thanks, Jack, for resuming the discussion. Zaicheng from ByteDance created a Slack channel for index work. I suggested that he add Anton and you to the channel. I still remember some conclusions from previous discussions. 1). Index types support: We planned to support Skipping Index first. Iceber

Re: Continuing the Secondary Index Discussion

2022-01-25 Thread Zaicheng Wang
Thanks for having the thread. This is Zaicheng from ByteDance. Initially we are planning to add an index feature for our internal Trino and feel like Iceberg could be the best place for holding/building the index data. We are very interested in having and contributing to this feature. (Pretty new to t

Re: Continuing the Secondary Index Discussion

2022-01-25 Thread Jack Ye
Thanks for the fast responses! Based on the conversations above, it sounds like we have the following consensus: 1. Asynchronous index creation is preferred, although synchronous index creation is possible. 2. A mechanism for tracking file changes is needed. Unfortunately sequence number cannot be