Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-17 Thread Xiaoxuan Li
Hi Haizhou, Thanks for sharing the use cases! For use case 1, I think Peter mentioned that the wide table feature might help with backfilling and updating a single column. However, keeping both base data and rank data in the same table doesn’t seem low-maintenance, especially the rankings are reca

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-13 Thread Péter Váry
Hi Haizhou Zhao, Thanks for the detailed use cases! Quick question on Use Case 1: Do you have a primary key to identify the updated rows? How wide and complicated is this key? Is the ranking always recalculated for the full partition? If so, the discussion around the wide tables (https://lists.a

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-12 Thread Haizhou Zhao
Hey Xiaoxuan, I want to bring up a couple of use cases that might be related to your proposal, but you can tell me whether they are relevant from your perspective. Use Case 1: Ranking. A user has a table with N+1 columns: the first N columns are the base data (events she gathered), the last col

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-10 Thread Xiaoxuan Li
Thank you for the thoughtful feedback, Yan, and for bringing up these important questions. > How realistic is the scenario I've described, and what's the likelihood of encountering it in production environments? I don’t have direct visibility into that either, but I’ve seen some vendors claim the

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-06 Thread Yan Yan
Thanks Xiaoxuan for the detailed proposal and everyone for the great discussion! It seems to me that it would be more valuable to first clearly define the specific use case we're trying to address, as this would help us make more informed decisions about trade-offs between file vs partitione

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Steven Wu
Haizhou, 1. It is probably inaccurate to call Parquet a table format provider. Parquet is just a file format. Delete vectors (position deletes) are outside the scope of Parquet files. The nature of equality deletes just makes it impossible to read in constant time, O(1). 2. The inverted index idea i

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Haizhou Zhao
Hey folks, Thanks for discussing this interesting topic. I have a couple of relevant thoughts from reading through this thread: 1. Is this an Iceberg issue, or a Parquet (table format provider) issue? For example, if Parquet (or other table format provider) provides a mechanism where both query by po

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Xiaoxuan Li
Totally agree, supporting mutability on top of immutable storage at scale is a non-trivial problem. I think the number of index files is ok, we can preload them in parallel or cache them on disk. Not sure yet about caching deserialized data, that might need some more thought. Xiaoxuan On Wed, Jun

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-04 Thread Péter Váry
> Our primary strategy for accelerating lookups focuses on optimizing the index file itself, leveraging sorted keys, smaller row group sizes, and Bloom filters for Parquet files. We’re also exploring custom formats that support more fine-grained skipping. The techniques you mentioned are important

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-03 Thread Xiaoxuan Li
Hi Peter, > If the table is partitioned and sorted by the PK, we don't really need to have any index. We can find the data file containing the record based on the Content File statistics, and the RowGroup containing the record based on the Parquet metadata. Our primary strategy for accelerating l

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-03 Thread Péter Váry
Hi Xiaoxuan, > 2. File-Level Indexing > [..] > To make this efficient, the table should be partitioned and sorted by the PK. If the table is partitioned and sorted by the PK, we don't really need to have any index. We can find the data file containing the record based on the Content File statisti
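A minimal sketch of the point made in this message, assuming a toy model in which each data file carries a lower and upper bound for the primary key and the files are sorted and non-overlapping. The names `FileStats` and `find_file` are hypothetical and are not part of the Iceberg API; the sketch only illustrates why min/max statistics can stand in for an index when data is sorted by the PK.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class FileStats(NamedTuple):
    """Hypothetical stand-in for per-file column bounds on the primary key."""
    path: str
    lower: int  # smallest PK stored in the file
    upper: int  # largest PK stored in the file


def find_file(files: Sequence[FileStats], pk: int) -> Optional[str]:
    """Return the path of the file whose [lower, upper] range contains pk.

    Assumes `files` is sorted by `lower` and the ranges do not overlap,
    which is what sorting the table by the primary key would give us.
    """
    lowers = [f.lower for f in files]
    # Index of the last file whose lower bound is <= pk.
    i = bisect_right(lowers, pk) - 1
    if i >= 0 and files[i].lower <= pk <= files[i].upper:
        return files[i].path
    return None  # pk falls in a gap between files, or before the first file


files = [
    FileStats("data-a.parquet", 0, 99),
    FileStats("data-b.parquet", 100, 199),
    FileStats("data-c.parquet", 300, 399),
]
```

The same binary search could in principle be repeated over row-group statistics inside the selected file; that step is omitted here.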

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-06-02 Thread Xiaoxuan Li
Thanks, Peter, for bringing these ideas forward! You also raised a great point about clarifying the goal of indexing. I’ve been considering it with the intention of eventually enabling fast upserts through DVs. To support that, we need an index that maps primary keys to both the data file and t

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-30 Thread Péter Váry
Hi Xiaoxuan, I hope you had a good time on your time off! Thanks for your detailed response. I think it would help if we focused on the specific use cases we want to support and what we ultimately aim to achieve. By my understanding, there are a few distinct scenarios we’ve been circling around:

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-29 Thread Xiaoxuan Li
Hi Peter, thanks for sharing the context around the Flink streaming use case and side note for concurrent write. Apologies for the delay as I just got back from a vacation. Yeah, I agree, having the index at the partition level is a better approach if we plan to use caching. As a distributed cache

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-14 Thread ismail simsek
Hi All, Thank you for working on this. I wanted to share a reference to the BigQuery implementation (Option 3) as another potential approach, and for inspiration. In this setup, the engine is running periodic merge jo

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-13 Thread Péter Váry
Hi Xiaoxuan, Let me describe how the Flink streaming writer uses equality deletes, and how it could use indexes. When the Flink streaming writer receives a new insert, it appends the data to a data file. When it receives a delete, it appends the primary key to an equality delete file. When
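The two write-path rules described in this message can be sketched as a toy model. This is an illustration only: `ToyWriter` and its fields are hypothetical names, the "files" are in-memory lists, and nothing here reflects the actual Flink or Iceberg writer classes. It covers only the insert and delete handling stated above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyWriter:
    """Toy model of the described write path; not the Flink or Iceberg API."""
    data_file: List[Dict[str, Any]] = field(default_factory=list)  # appended rows
    equality_delete_file: List[Any] = field(default_factory=list)  # appended primary keys

    def on_insert(self, row: Dict[str, Any]) -> None:
        # An insert is appended to the data file; existing rows are not consulted.
        self.data_file.append(row)

    def on_delete(self, pk: Any) -> None:
        # A delete only records the primary key; the position of the
        # deleted row is never looked up at write time.
        self.equality_delete_file.append(pk)


writer = ToyWriter()
writer.on_insert({"id": 1, "v": "a"})
writer.on_insert({"id": 2, "v": "b"})
writer.on_delete(1)
```

Neither operation reads existing data, which is what keeps the write path cheap and pushes the cost of matching keys to rows onto the reader.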

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-13 Thread Xiaoxuan Li
Hi Peter, Thanks for the detailed illustration. I understand your concern. I believe the core question here is whether the index is used during job planning or at the scan task. This depends on how index files are referenced, at the file level or partition level. In my view, both approaches ulti

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Steven Wu
Agree with Peter that a 1:1 mapping of data files and inverted indexes is not as useful. With a columnar format like Parquet, this can also be achieved equivalently by reading the data file with projection on the identifier columns. On Mon, May 12, 2025 at 4:20 AM Péter Váry wrote: > Hi Xiaoxuan,

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-12 Thread Péter Váry
Hi Xiaoxuan, Do we plan to store the indexes in a separate file alongside the data files? If so, then I have the following thoughts: - I agree that the 1-on-1 mapping of data files and index files is easy to maintain OTOH it is less useful as an index. - The writer (which is looking for a column w

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-10 Thread Xiaoxuan Li
Thanks Anton for the context and summary of the options, great to hear that this direction aligns with earlier community discussions. And thanks Gyula and Peter for the clear analysis. I agree with both of you, the index needs to be designed and implemented efficiently in order to scale for large d

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-09 Thread Péter Váry
When going through the options mentioned by Anton, I feel that Options 1 and 4 are just pushing the responsibility of converting the equality deletes to positional deletes to the engine side. The only difference is whether the conversion happens on the write side or on the read side. This is a step

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-09 Thread Gyula Fóra
Hi Anton, Thank you for summarizing the options we see at this stage in a structured and concise way. Based on the use-cases I see in the industry, I feel that not all of the highlighted options are feasible (or desirable). Option 4 would basically remove any possibilities for native streaming

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-08 Thread Anton Okolnychyi
I am glad to see that folks are thinking about this problem. I am looking forward to a formal proposal/design doc to discuss details! Overall, this aligns with what we discussed in the community earlier w.r.t. the future of equality deletes and streaming upserts. If I were to summarize, we have th

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-08 Thread Xiaoxuan Li
Hi Zheng, Steven, Amogh and Gyula. Thank you all for the feedback! I agree with everyone, we need to narrow down the scope of this optimization. The primary issue I'm trying to address is the slow read performance caused by the growing number of equality delete files (streaming CDC scenarios). The

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-08 Thread Gyula Fóra
Thank you for the proposal! I agree with what has been said above: we need to narrow down the scope here and identify the primary target for the optimization. As Amogh has also pointed out, CDC (streaming) read performance (with equality deletes) would be one of the biggest beneficiaries of th

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-08 Thread Amogh Jahagirdar
Thank you for the proposal Xiaoxuan! I think I agree with Zheng and Steven's point that it'll probably be more helpful to start out with more specific "what" and "why" (known areas of improvement for Iceberg and driven by any use cases) before we get too deep into the "how". In my mind, the specif

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Steven Wu
Xiaoxuan, it is unclear to me what exactly we are trying to achieve here. It started with equality vs position deletes. But the proposal mentioned inverted indexes for every column. Note that equality deletes have the concept of equality fields (similar to a primary key). If we are only talking about row-le

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Zheng Hu
Hi Xiaoxuan, Thanks for the proposal. Equality deletes were designed initially for fast upserts, so that an upstream CDC stream can be streamed into Iceberg directly, with relatively good freshness. I agree that if we talk about the best performance, then it is partially implemented,

[DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Xiaoxuan Li
Hi team, We've been exploring ways to optimize and balance read and write performance in merge-on-read scenarios for a while. Below are our early ideas, and we’d appreciate community feedback to help validate them against the Iceberg spec, especially any edge cases we might have missed. We’re als