Thanks a lot for the detailed explanation; it makes complete sense.

Gyula

On Wed, May 6, 2020 at 3:53 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi,
>
> INSERT OVERWRITE comes from batch SQL in Hive.
> It means overwriting the whole table/partition instead of overwriting
> per key. So "INSERT OVERWRITE kudu_table" should first delete the whole
> table in Kudu, and then insert the new data into it.
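>
> For example, a minimal sketch of the difference (assuming the Flink 1.10
> blink planner in batch mode; "kudu_table" and "source_table" are made-up
> names for tables that are already registered):
>
>   import org.apache.flink.table.api.EnvironmentSettings;
>   import org.apache.flink.table.api.TableEnvironment;
>
>   public class OverwriteSemanticsSketch {
>       public static void main(String[] args) throws Exception {
>           TableEnvironment tEnv = TableEnvironment.create(
>               EnvironmentSettings.newInstance()
>                   .useBlinkPlanner().inBatchMode().build());
>
>           // Plain INSERT: appends the query result; existing data stays.
>           tEnv.sqlUpdate("INSERT INTO kudu_table SELECT * FROM source_table");
>
>           // INSERT OVERWRITE: first wipes the whole table (or the matching
>           // partition), then writes the query result.
>           tEnv.sqlUpdate("INSERT OVERWRITE kudu_table SELECT * FROM source_table");
>
>           tEnv.execute("overwrite-semantics-sketch");
>       }
>   }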
>
> The same semantics should apply to streaming jobs, but I don't know
> whether there are any requirements for that yet.
>
> UpsertStreamTableSink is a way to deal with changelogs in streaming jobs:
> the upsert and delete messages are produced from a keyed, retracting
> stream. This is not related to the "INSERT OVERWRITE" grammar.
> And FYI, the upsert stream sink will be unified into "DynamicTableSink",
> with primary keys declared in the DDL. [1]
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces#FLIP-95:NewTableSourceandTableSinkinterfaces-SinkInterfaces
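>
> To make the changelog contract concrete, here is a minimal sketch against
> the Flink 1.10 interfaces (the class name is made up, the Kudu calls in
> the comments are placeholders, and the remaining TableSink methods are
> omitted by keeping the class abstract):
>
>   import org.apache.flink.api.java.tuple.Tuple2;
>   import org.apache.flink.streaming.api.datastream.DataStream;
>   import org.apache.flink.streaming.api.datastream.DataStreamSink;
>   import org.apache.flink.streaming.api.functions.sink.SinkFunction;
>   import org.apache.flink.table.sinks.UpsertStreamTableSink;
>   import org.apache.flink.types.Row;
>
>   public abstract class SketchUpsertSink implements UpsertStreamTableSink<Row> {
>
>       @Override
>       public void setKeyFields(String[] keys) {
>           // Unique key of the query result, derived by the planner.
>       }
>
>       @Override
>       public void setIsAppendOnly(Boolean isAppendOnly) {
>           // True if the query never produces updates or deletes.
>       }
>
>       @Override
>       public DataStreamSink<?> consumeDataStream(
>               DataStream<Tuple2<Boolean, Row>> stream) {
>           // Each element carries a change flag; nothing here says anything
>           // about "INSERT OVERWRITE".
>           return stream.addSink(new UpsertOrDeleteWriter());
>       }
>
>       /** Placeholder writer: maps the change flag to a per-key operation. */
>       private static class UpsertOrDeleteWriter
>               implements SinkFunction<Tuple2<Boolean, Row>> {
>           @Override
>           public void invoke(Tuple2<Boolean, Row> value, Context ctx) {
>               if (value.f0) {
>                   // upsert value.f1 under its key, e.g. table.newUpsert()
>               } else {
>                   // delete the row for that key, e.g. table.newDelete()
>               }
>           }
>       }
>   }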
>
>
> Best,
> Jingsong Lee
>
> On Wed, May 6, 2020 at 8:48 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> > Hi all!
> >
> > While working on a Table Sink implementation for Kudu (a key-value
> > store), we got a bit confused about the expected semantics of
> > UpsertStreamTableSink vs OverwritableTableSink, and of statements like
> > INSERT vs INSERT OVERWRITE.
> >
> > I am wondering which external operation each incoming record should
> > correspond to:
> >  - Insert (fail on duplicate row)
> >  - Delete
> >  - Upsert
> >
> > Looking purely at UpsertStreamTableSink, we either upsert or delete, so
> > every INSERT already behaves like an INSERT OVERWRITE per key. Still, we
> > cannot use INSERT OVERWRITE if we don't implement OverwritableTableSink.
> >
> > So I wonder what implementing OverwritableTableSink is expected to do for
> > an UpsertStreamTableSink.
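> >
> > For context, the interface in question is tiny. A sketch (class name
> > made up) that frames what we are unsure about:
> >
> >   import org.apache.flink.table.sinks.OverwritableTableSink;
> >
> >   public class SketchOverwritableSink implements OverwritableTableSink {
> >
> >       private boolean overwrite;
> >
> >       // As far as I understand, the planner calls this with true when
> >       // the statement is INSERT OVERWRITE.
> >       @Override
> >       public void setOverwrite(boolean overwrite) {
> >           this.overwrite = overwrite;
> >       }
> >
> >       // Open question: if overwrite is true, should we truncate the
> >       // whole Kudu table before writing, even though the incoming
> >       // stream already consists of keyed upserts/deletes?
> >   }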
> >
> > Can someone please help me clarify this so we can get it right?
> >
> > Thanks
> > Gyula
> >
>
>
> --
> Best, Jingsong Lee
>
