Hi All,
     Please can someone guide me regarding the above email?

Regards,
Taher Koitawala
On Tue, Aug 23, 2022 at 5:46 PM Taher Koitawala <taher...@gmail.com> wrote:

> Hi All,
>      I am building an Iceberg writer on top of a Temporal service that
> converts CDC Parquet files to the Iceberg format. Each file has a record
> along with timestamp flags like `inserted_at`, `deleted_at` and
> `updated_at`, each of which carries a value defining the action.
>
> Initially, when the table does not yet exist in the Iceberg catalog, the
> plan is to take the Parquet footer schema and map it directly to an
> Iceberg schema using
> *org.apache.iceberg.parquet.ParquetSchemaUtil.convert(MessageType
> parquetSchema)*. The issue I am facing is that I also have to convert
> Parquet data types to Iceberg data types, specifically the timestamp
> types, when inserting into the table.
>
> When reading the Parquet file with the simple-group reader, the timestamp
> comes back as a long, but when writing to Iceberg the value is expected
> to be a *java.time.OffsetDateTime*. The specific error I get is
> `Long cannot be cast to OffsetDateTime`.
>
> I have two questions about this use case:
> 1. Is there an easy way to convert Parquet records to Iceberg records
> directly, without doing the type conversion myself, since the goal is to
> make it all happen within Temporal?
> 2. I need suggestions on how to handle updates. Right now I have to
> commit the inserts, then commit the deletes, and then create a new
> writer to proceed.
>
> Regards,
> Taher Koitawala
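To make the issue concrete, this is roughly the conversion I am doing by hand today (a minimal sketch; it assumes the Parquet column stores epoch microseconds in UTC, the target Iceberg column is timestamptz, and the class and field names are only illustrative):

import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

import org.apache.iceberg.Schema;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.parquet.schema.MessageType;

public class CdcRecordMapper {

  // Derive the Iceberg schema from the Parquet footer schema.
  public static Schema toIcebergSchema(MessageType parquetSchema) {
    return ParquetSchemaUtil.convert(parquetSchema);
  }

  // Turn the epoch-micros long that the simple-group Parquet reader returns
  // into the OffsetDateTime that Iceberg's generic writer expects for timestamptz.
  public static OffsetDateTime fromMicros(long micros) {
    return OffsetDateTime.ofInstant(
        Instant.ofEpochSecond(
            Math.floorDiv(micros, 1_000_000L),
            Math.floorMod(micros, 1_000_000L) * 1_000L),
        ZoneOffset.UTC);
  }

  // Set the converted value on a generic record ("inserted_at" is one of the CDC flags).
  public static GenericRecord withInsertedAt(Schema schema, long insertedAtMicros) {
    GenericRecord record = GenericRecord.create(schema);
    record.setField("inserted_at", fromMicros(insertedAtMicros));
    return record;
  }
}

This works, but I would prefer not to hand-write this per-field conversion if Iceberg already has a direct Parquet-record-to-Iceberg-record path.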
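For the second question, what I am currently considering is writing the new row versions and the corresponding deletes as separate files and committing them together in a single snapshot with newRowDelta(), roughly like the sketch below (it assumes the data file and delete file have already been written, and that the table is on format version 2 so it supports row-level deletes). I am not sure whether this is the recommended approach for CDC upserts:

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.Table;

public class CdcCommit {

  // Commit the inserted rows and the deletes for the old row versions
  // in one atomic snapshot instead of two separate commits.
  public static void commitUpdate(Table table, DataFile dataFile, DeleteFile deleteFile) {
    table.newRowDelta()
        .addRows(dataFile)
        .addDeletes(deleteFile)
        .commit();
  }
}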