Re: Support Flink SQL Upsert a Spark table

xianjin Wed, 10 Jan 2024 19:08:02 -0800

You can create an Iceberg table with a required field, for example:

create table test_table (id bigint not null, data string) using iceberg



However, you cannot change an optional field to required after creation.
See this issue for more details:
https://github.com/apache/iceberg/issues/3617
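
Putting the steps from this thread together, a minimal sketch of the whole
flow might look like the following. It assumes the Iceberg Spark SQL
extensions are enabled, that the table uses format version 2 (needed for
upserts), and that `source_table` is a hypothetical Flink source with
matching columns. The `upsert-enabled` write option is the existing Flink
write option, not the one proposed in the PR.

```sql
-- Spark SQL: declare the key column as required at creation time,
-- since an optional column cannot be made required afterwards.
CREATE TABLE test_table (id BIGINT NOT NULL, data STRING) USING iceberg
TBLPROPERTIES ('format-version' = '2');

-- Spark SQL: mark the required column as an identifier field
-- (this is what Flink reads as the primary key).
ALTER TABLE test_table SET IDENTIFIER FIELDS id;

-- Flink SQL: upsert into the Spark-created table.
-- source_table is a hypothetical name used only for illustration.
INSERT INTO test_table /*+ OPTIONS('upsert-enabled'='true') */
SELECT id, data FROM source_table;
```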

On Thu, Jan 11, 2024 at 10:08 AM Manu Zhang <[email protected]> wrote:

> It looks like there's no way to explicitly add a required column in DDL.
> Any suggestions?
>
> Much appreciated
> Manu
>
> On Tue, Jan 9, 2024 at 3:37 PM Manu Zhang <[email protected]> wrote:
>
>> Thanks Peter and Ryan for the info.
>>
>> As identifier fields need to be "required", how can I alter an optional
>> column to be required in Spark SQL?
>>
>> Thanks,
>> Manu
>>
>> On Fri, Jan 5, 2024 at 12:50 AM Ryan Blue <[email protected]> wrote:
>>
>>> You can set the primary key fields in Spark using `ALTER TABLE`:
>>>
>>> `ALTER TABLE t SET IDENTIFIER FIELDS id`
>>>
>>> Spark doesn't support any primary key syntax, so you have to do this as
>>> a separate step.
>>>
>>> On Thu, Jan 4, 2024 at 8:46 AM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Hi Manu,
>>>>
>>>> The Iceberg Schema defines an `identifierFieldIds` method [1], and Flink
>>>> uses that as the primary key.
>>>> Are you saying there is no way to set it in Spark and Trino?
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> [1]
>>>> https://github.com/apache/iceberg/blob/9a00f7477dedac4501fb2de9e1e6d7aa83dc20b7/api/src/main/java/org/apache/iceberg/Schema.java#L280
>>>>
>>>> On Thu, Jan 4, 2024 at 4:45 PM Manu Zhang <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Currently, we support upserting into a Flink-created table with Flink
>>>>> SQL, where primary keys are required as equality fields. They are not
>>>>> required in the Java API.
>>>>>
>>>>> However, if the table is created by Spark, where there's no primary
>>>>> key, we cannot upsert with Flink SQL. Hence, I proposed
>>>>> https://github.com/apache/iceberg/pull/8195 to support specifying
>>>>> equality columns with Flink SQL write options.
>>>>>
>>>>> @pvary <https://github.com/pvary> suggested it would be better to
>>>>> support primary keys in Spark, Trino, etc. Since these engines don't have
>>>>> primary keys in their table definitions, a workaround is to put primary
>>>>> key columns in table properties. Maybe there are other options I've missed.
>>>>>
>>>>> Flink SQL sinking to Spark tables for analysis is a typical pipeline
>>>>> in our data lake. I'd like to hear your thoughts on how best to support
>>>>> this case.
>>>>>
>>>>> Happy New Year!
>>>>> Manu
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
