Re: Temporal Iceberg Service

2022-09-01 Thread Ryan Blue
Taher, Positional deletes require that you know the file and position in that file of the record you want to delete. So you're right that if you want to use a positional delete, you need to either keep an index of where records are (which is what we do when upserting records) or scan to find the r
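
As a rough illustration of the "keep an index of where records are" option, here is a minimal in-memory sketch in plain Python. It does not use the Iceberg API; `DeltaWriterSketch`, its fields, and the file name are hypothetical names made up for this example. It assumes the writer remembers the position of every key it has written to its currently open data file, emits a positional delete when the key is found in that index, and falls back to an equality delete otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaWriterSketch:
    """Hypothetical writer that tracks row positions in one open data file."""
    data_file: str
    rows: list = field(default_factory=list)              # rows appended so far
    index: dict = field(default_factory=dict)             # key -> position of the live row
    position_deletes: list = field(default_factory=list)  # (file, position) pairs
    equality_deletes: list = field(default_factory=list)  # keys deleted by value

    def insert(self, key, row):
        # The next position in the file is the current row count.
        self.index[key] = len(self.rows)
        self.rows.append(row)

    def delete(self, key):
        pos = self.index.pop(key, None)
        if pos is not None:
            # The row was written by this writer, so file and position are known.
            self.position_deletes.append((self.data_file, pos))
        else:
            # The row lives in some older file; delete it by key instead.
            self.equality_deletes.append(key)


writer = DeltaWriterSketch("data-0.parquet")
writer.insert(1, {"id": 1, "name": "a"})
writer.delete(1)                      # known position -> positional delete
writer.insert(1, {"id": 1, "name": "a"})
writer.delete(2)                      # unknown key -> equality delete
print(writer.position_deletes, writer.equality_deletes)
```

In this sketch an insert, delete, and re-insert of the same key within one open writer leaves only the second copy live, because the first copy is removed by position rather than by value.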

Re: Temporal Iceberg Service

2022-09-01 Thread Taher Koitawala
My question is not about planning a scan. My question is around the CDC log implementation, so if a writer is open and I get an insert and delete for a record. If I do an EQ delete, that record is gone. However, if I do insert, delete and insert the exact same record while the writer is currently open I

Re: Temporal Iceberg Service

2022-09-01 Thread Zoltán Borók-Nagy
Hi Taher, I think most of your questions are answered in the Scan Planning section of the Iceberg spec page: https://iceberg.apache.org/spec/#scan-planning To give you some specific answers as well: Equality Deletes: data and delete files have sequence numbers from which readers can infer the rel
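
The sequence-number rule from the Scan Planning section can be sketched as two small predicates. This is a simplified illustration, not Iceberg library code: `FileEntry` and both function names are hypothetical, and the sketch ignores the partition and column-statistics checks that the spec also describes. It only encodes the ordering rule that an equality delete applies to strictly older data files, while a position delete also applies to data files with the same sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a data or delete file in a manifest."""
    path: str
    sequence_number: int


def equality_delete_applies(data_file: FileEntry, delete_file: FileEntry) -> bool:
    # Equality deletes only affect data written before the delete.
    return data_file.sequence_number < delete_file.sequence_number


def position_delete_applies(data_file: FileEntry, delete_file: FileEntry) -> bool:
    # Position deletes may also target data files from the same commit.
    return data_file.sequence_number <= delete_file.sequence_number


data = FileEntry("data-1.parquet", 5)
same_commit_delete = FileEntry("delete-1.parquet", 5)
later_delete = FileEntry("delete-2.parquet", 6)

print(equality_delete_applies(data, same_commit_delete))  # False
print(position_delete_applies(data, same_commit_delete))  # True
print(equality_delete_applies(data, later_delete))        # True
```

One consequence of this ordering is that a row inserted and deleted in the same commit cannot be removed by an equality delete alone, which is why a writer needs to know the row's position in that case.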

Re: Temporal Iceberg Service

2022-08-31 Thread Taher Koitawala
Thank you, Ryan and the Iceberg community; the suggestions really helped development progress a lot. On the same use case, I hit another block around doing CDC updates and deletes. I see two options for managing deletes, for now, EqualityDeletes and PositionalDeletes: 1. EqualityDeletes need m

Re: Temporal Iceberg Service

2022-08-24 Thread Taher Koitawala
Thank you for your response, Ryan. We will evaluate your suggestion to stick with a query engine, and I will also try the code you shared with me. On Thu, 25 Aug, 2022, 2:25 am Ryan Blue, wrote: > Hi Taher, > > It looks like you’re writing something in Java to work with the data > directly. Th

Re: Temporal Iceberg Service

2022-08-24 Thread Ryan Blue
Hi Taher, It looks like you’re writing something in Java to work with the data directly. That’s well supported, but you may want to consider using a compute engine to make this process a bit easier. Most of the issues that you’re hitting would probably be solved automatically because those engines

Re: Temporal Iceberg Service

2022-08-24 Thread Taher Koitawala
Hi All, Please can someone guide me regarding the above email? Regards, Taher Koitawala On Tue, Aug 23, 2022 at 5:46 PM Taher Koitawala wrote: > Hi All, > I am creating an iceberg writer over temporal service that > converts CDC parquet files to Iceberg format. That means that

Temporal Iceberg Service

2022-08-23 Thread Taher Koitawala
Hi All, I am creating an iceberg writer over temporal service that converts CDC parquet files to Iceberg format. That means that the file will have a record and corresponding timestamp flags like `inserted_at`, `deleted_at` and `updated_at`, each of which will have a value defining the acti
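
To make the timestamp-flag layout concrete, here is a small sketch of how a writer might turn `inserted_at`, `deleted_at` and `updated_at` into a single action per record. The message does not spell out the exact semantics, so this is an assumption: an unset flag is represented as `None`, and the flag with the latest timestamp decides the action. The function name `cdc_action` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def cdc_action(record: dict) -> Optional[str]:
    """Return 'insert', 'update' or 'delete' for a CDC record, or None.

    Assumption: the most recent non-null timestamp flag wins.
    """
    flags = {
        "insert": record.get("inserted_at"),
        "update": record.get("updated_at"),
        "delete": record.get("deleted_at"),
    }
    present = {action: ts for action, ts in flags.items() if ts is not None}
    if not present:
        return None
    # Pick the action whose timestamp is the latest.
    return max(present, key=present.get)


row = {
    "id": 7,
    "inserted_at": datetime(2022, 8, 1, 10, 0),
    "updated_at": datetime(2022, 8, 2, 9, 30),
    "deleted_at": None,
}
print(cdc_action(row))  # update
```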