Hi all,

Currently, we support upserting into a Flink-created table with Flink SQL,
where primary keys are required as the equality fields. The Java API does
not require them.

However, if the table is created by Spark, which has no primary keys, we
cannot upsert with Flink SQL. I have therefore proposed
https://github.com/apache/iceberg/pull/8195 to support specifying
equality columns with Flink SQL write options.

@pvary <https://github.com/pvary> suggested it would be better to support
primary keys in Spark, Trino, etc. Since these engines don't have primary
keys in their table definitions, a workaround is to put primary key columns
in table properties. Maybe there are other options I've missed.

Writing from Flink SQL into Spark-created tables for analysis is a typical
pipeline in our data lake. I'd like to hear your thoughts on how best to
support this case.

Happy New Year!
Manu
