Taher,
Positional deletes require that you know the file and position in that file
of the record you want to delete. So you're right that if you want to use a
positional delete, you need to either keep an index of where records are
(which is what we do when upserting records) or scan to find the r
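
To make the index idea concrete, here is a minimal sketch in plain Python. It is not the Iceberg API: the `OpenWriter` class and all of its fields are hypothetical names made up for illustration. It assumes a single open data file and a key that identifies a record. A delete for a key written by this writer becomes a positional delete, and a delete for any other key falls back to an equality delete.

```python
from dataclasses import dataclass, field


@dataclass
class OpenWriter:
    """Toy model of one open data file (hypothetical, not an Iceberg class)."""
    path: str
    rows: list = field(default_factory=list)              # rows appended to the data file
    live: dict = field(default_factory=dict)              # key -> position of the live row
    position_deletes: list = field(default_factory=list)  # (path, position) pairs
    equality_deletes: list = field(default_factory=list)  # keys never written by this writer

    def insert(self, key, row):
        # Remember where the row landed so a later delete can target it by position.
        self.live[key] = len(self.rows)
        self.rows.append(row)

    def delete(self, key):
        pos = self.live.pop(key, None)
        if pos is not None:
            # The row is in the file we are still writing: delete it by position.
            self.position_deletes.append((self.path, pos))
        else:
            # The row must be in an older file: fall back to an equality delete.
            self.equality_deletes.append(key)

    def visible_rows(self):
        dead = {pos for _, pos in self.position_deletes}
        return [row for i, row in enumerate(self.rows) if i not in dead]


writer = OpenWriter("data-0.parquet")
writer.insert(1, "a")
writer.delete(1)
writer.insert(1, "a")   # same record again while the writer is still open
print(writer.visible_rows())
```

With this bookkeeping the insert, delete, insert sequence leaves exactly one live copy of the record, because the positional delete only removes the first copy at position 0.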
My question is not about planning a scan. It is about the CDC log
implementation: suppose a writer is open and I get an insert and a delete for
a record. If I do an EQ delete, that record is gone.
However, if I do an insert, a delete, and an insert of the exact same record
while the writer is still open I
Hi Taher,
I think most of your questions are answered in the Scan Planning section at
the Iceberg spec page: https://iceberg.apache.org/spec/#scan-planning
To give you some specific answers as well:
Equality Deletes: data and delete files have sequence numbers from which
readers can infer the rel
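
As a rough illustration of the sequence-number rule from the Scan Planning section, the sketch below checks whether a delete file applies to a data file. The function name and the string labels are hypothetical, and the sketch deliberately ignores the partition and file-path conditions that the spec also requires.

```python
def delete_applies(delete_kind: str, delete_seq: int, data_seq: int) -> bool:
    """Return True if a delete file applies to a data file, by sequence number only."""
    if delete_kind == "equality":
        # Equality deletes apply only to data files that are strictly older.
        return data_seq < delete_seq
    if delete_kind == "position":
        # Position deletes also apply to data files with the same sequence number.
        return data_seq <= delete_seq
    raise ValueError(f"unknown delete kind: {delete_kind}")


# A data file and a delete file written with the same sequence number:
print(delete_applies("equality", 5, 5))
print(delete_applies("position", 5, 5))
```

Under this rule, only a position delete can remove a row whose data file carries the same sequence number as the delete.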
Thank you, Ryan and the Iceberg community. The suggestions really helped
move a lot of the development forward. On the same use case, I have hit
another blocker with CDC updates and deletes.
For now I see two options for managing deletes, EqualityDeletes and
PositionalDeletes:
1. EqualityDeletes need m
Thank you for your response, Ryan. We will evaluate your suggestion to
stick with a query engine, and I will also try the code you shared with me.
On Thu, 25 Aug 2022, 2:25 am Ryan Blue wrote:
> Hi Taher,
>
> It looks like you’re writing something in Java to work with the data
> directly. Th
Hi Taher,
It looks like you’re writing something in Java to work with the data
directly. That’s well supported, but you may want to consider using a
compute engine to make this process a bit easier. Most of the issues that
you’re hitting would probably be solved automatically because those engines
Hi All,
Please can someone guide me regarding the above email?
Regards,
Taher Koitawala
On Tue, Aug 23, 2022 at 5:46 PM Taher Koitawala wrote:
> Hi All,
> I am creating an Iceberg writer over a temporal service that
> converts CDC Parquet files to Iceberg format. That means that
Hi All,
I am creating an Iceberg writer over a temporal service that converts
CDC Parquet files to Iceberg format. That means that the file will have a
record and corresponding timestamp flags like `inserted_at`, `deleted_at`
and `updated_at`, each of which will have a value defining the acti