Hi All,
         Can someone please guide me on the email below?

Regards,
Taher Koitawala

On Tue, Aug 23, 2022 at 5:46 PM Taher Koitawala <taher...@gmail.com> wrote:

> Hi All,
>         I am building an Iceberg writer on top of a Temporal service that
> converts CDC Parquet files to Iceberg format. That means each file has a
> record plus corresponding timestamp flags like `inserted_at`, `deleted_at`
> and `updated_at`, each of which carries a value defining the action.
>
> Initially, when there is no table in the Iceberg catalog, the plan is to
> take the Parquet footer schema and map it directly to the Iceberg schema
> using *org.apache.iceberg.parquet.ParquetSchemaUtil.convert(MessageType
> parquetSchema)*. However, the issue I am facing is that I also have to
> convert Parquet datatypes to Iceberg datatypes, specifically the
> timestamp types, when inserting into the table.
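>
> For context, the bootstrap looks roughly like this. This is only a
> simplified sketch; the HadoopCatalog, warehouse path, file path and table
> name are placeholders for whatever the service is actually configured with:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.iceberg.PartitionSpec;
> import org.apache.iceberg.Schema;
> import org.apache.iceberg.Table;
> import org.apache.iceberg.catalog.TableIdentifier;
> import org.apache.iceberg.hadoop.HadoopCatalog;
> import org.apache.iceberg.parquet.ParquetSchemaUtil;
> import org.apache.parquet.hadoop.ParquetFileReader;
> import org.apache.parquet.hadoop.util.HadoopInputFile;
> import org.apache.parquet.schema.MessageType;
>
> Configuration conf = new Configuration();
> try (ParquetFileReader reader = ParquetFileReader.open(
>         HadoopInputFile.fromPath(new Path("s3://bucket/cdc/file.parquet"), conf))) {
>   // take the schema straight from the Parquet footer
>   MessageType parquetSchema = reader.getFooter().getFileMetaData().getSchema();
>   Schema icebergSchema = ParquetSchemaUtil.convert(parquetSchema);
>
>   // create the table on first sight, otherwise load it
>   HadoopCatalog catalog = new HadoopCatalog(conf, "s3://bucket/warehouse");
>   TableIdentifier id = TableIdentifier.of("cdc", "my_table");
>   Table table = catalog.tableExists(id)
>       ? catalog.loadTable(id)
>       : catalog.createTable(id, icebergSchema, PartitionSpec.unpartitioned());
> }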
>
> When reading with the Parquet reader into a simple group, I see the
> timestamp as a long, but when inserting into Iceberg it expects a
> *java.time.OffsetDateTime*; the specific error I get is `Long cannot be
> cast to OffsetDateTime`.
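>
> Right now I work around the cast error by converting the long by hand
> before building the GenericRecord, roughly like below. This assumes the
> column is annotated as TIMESTAMP(MICROS, adjusted to UTC); a real writer
> would have to inspect the logical type annotation instead of hard-coding
> that:
>
> import java.time.Instant;
> import java.time.OffsetDateTime;
> import java.time.ZoneOffset;
>
> import org.apache.iceberg.data.GenericRecord;
> import org.apache.parquet.example.data.Group;
>
> // `group` is the simple group read from the file, `icebergSchema` is the
> // schema converted above
> long micros = group.getLong("updated_at", 0);
> OffsetDateTime ts = Instant.ofEpochSecond(
>         Math.floorDiv(micros, 1_000_000L),
>         Math.floorMod(micros, 1_000_000L) * 1_000L)
>     .atOffset(ZoneOffset.UTC);
>
> GenericRecord record = GenericRecord.create(icebergSchema);
> record.setField("updated_at", ts);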
>
> I have 2 questions on this use case:
> 1. Is there an easy way to write Parquet records to Iceberg directly,
> without me having to do the type conversion, since the goal is to make it
> all happen within Temporal?
> 2. Any suggestions on handling updates? Currently I have to commit the
> inserts, then commit the deletes, and then create a new writer again
> before proceeding (a rough sketch of the current flow is below).
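>
> To make question 2 concrete, the current flow per batch is roughly the
> following. `table`, `insertsFile` and `equalityDeletesFile` are assumed
> from context: the loaded Table, the DataFile of new/updated rows, and the
> DeleteFile keyed on the record id (writer construction omitted):
>
> // commit the inserted/updated rows first
> table.newAppend()
>     .appendFile(insertsFile)
>     .commit();
>
> // then commit the deletes for the old row versions
> table.newRowDelta()
>     .addDeletes(equalityDeletesFile)
>     .commit();
>
> // finally tear down and re-create the writer before the next batch,
> // which is the part I would like to avoid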
>
> Regards,
> Taher Koitawala
>
