Hi John,

If you are using the Table API & SQL, the framework handles RowKind for you and it is transparent, so you usually don't need to deal with RowKind yourself in Table API & SQL.
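For example, with an updating query like the minimal sketch below, the
planner attaches the correct RowKind to every row it emits and an upsert
sink applies it; you never set RowKind yourself. (The table names
'resources' and 'resources_latest' are made up for illustration; both
would need to be registered first, with the sink declaring a primary key.)

    # hypothetical tables: 'resources' (source), 'resources_latest' (upsert sink)
    table_env.execute_sql("""
        INSERT INTO resources_latest
        SELECT resource_id, MAX(crawl_time) AS last_seen
        FROM resources
        GROUP BY resource_id
    """)

Each time a resource_id re-appears, the aggregation retracts the old row
and emits the new one under the hood, and the sink keeps only the latest
version.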
Regards,
Dian

On Thu, Jun 9, 2022 at 6:56 AM John Tipper <john_tip...@hotmail.com> wrote:

> Hi Xuyang,
>
> Thank you very much, I'll experiment tomorrow. Do you happen to know
> whether there is a Python example of udtf() with a RowKind being set (or
> whether it's supported)?
>
> Many thanks,
>
> John
>
> Sent from my iPhone
>
> On 8 Jun 2022, at 16:41, Xuyang <xyzhong...@163.com> wrote:
>
> Hi, John.
> What about using a UDTF [1]?
> In your UDTF, save all the resources from a crawl as a set or map, s1.
> When t=2 arrives, the new crawl produces a second set of resources, s2.
> The deletions you are after are then s1 - s2.
> So loop over s1 - s2 inside the UDTF's 'eval' function and emit each
> deleted row with the RowKind 'DELETE' [2]. 'DELETE' tells the downstream
> that this value has been removed.
>
> I hope this helps.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#table-functions
> [2]
> https://github.com/apache/flink/blob/44f73c496ed1514ea453615b77bee0486b8998db/flink-core/src/main/java/org/apache/flink/types/RowKind.java#L52
>
> --
> Best!
> Xuyang
>
> At 2022-06-08 20:06:17, "John Tipper" <john_tip...@hotmail.com> wrote:
>
> Hi all,
>
> I have some reference data that is periodically emitted by a crawler
> mechanism into an upstream Kinesis data stream, where those rows are used
> to populate a sink table (I am using Flink 1.13 PyFlink SQL within AWS
> Kinesis Data Analytics). What is the best pattern for handling deletion
> of upstream data, so that the downstream table remains in sync with
> upstream?
>
> For example, at t=1, rows R1, R2 and R3 are processed from the stream,
> resulting in a DB with 3 rows. At some point between t=1 and t=2, the
> resource corresponding to R2 was deleted, so at t=2, when the next crawl
> was carried out, only rows R1 and R3 were emitted into the upstream
> stream. How should I process the stream of events so that when I have
> finished processing the events from t=2 my downstream table also has just
> rows R1 and R3?
>
> Many thanks,
>
> John
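PS: if you do want to experiment with the UDTF idea Xuyang describes
above, a rough, untested sketch might look like the following. Whether
the planner actually honors a RowKind set on a UDTF's output row is
exactly the open question in this thread, so please verify it on your
Flink version; the function and column names are illustrative, and it
assumes each crawl arrives as a single row carrying an array of resource
ids.

    from pyflink.common import Row, RowKind
    from pyflink.table import DataTypes
    from pyflink.table.udf import TableFunction, udtf

    class SnapshotDiff(TableFunction):
        """Remembers the previous crawl and emits a DELETE row for every
        resource id that has since disappeared."""

        def open(self, function_context):
            self.previous = set()  # s1: resource ids seen in the last crawl

        def eval(self, resource_ids):
            current = set(resource_ids)  # s2: resource ids in this crawl
            # s1 - s2: resources that existed before but are now gone
            for resource_id in self.previous - current:
                row = Row(resource_id)
                row.set_row_kind(RowKind.DELETE)  # mark the row as a deletion
                yield row
            for resource_id in current:
                yield Row(resource_id)  # still-present resources as INSERTs
            self.previous = current

    snapshot_diff = udtf(SnapshotDiff(), result_types=[DataTypes.STRING()])

Note that 'self.previous' is plain Python state: it lives per parallel
instance and is not checkpointed, so this only behaves as intended with
parallelism 1, and the set is lost on restart.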