Hi John,

If you are using the Table API & SQL, the framework handles RowKind for you and it is transparent, so you usually don't need to deal with RowKind yourself in Table API & SQL.
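For example, with an updating query like the minimal sketch below, the
planner attaches the correct RowKind to every row it emits and an upsert
sink applies it; you never set RowKind yourself. (The table names
'resources' and 'resources_latest' are made up for illustration; both
would need to be registered first, with the sink declaring a primary key.)

    # hypothetical tables: 'resources' (source), 'resources_latest' (upsert sink)
    table_env.execute_sql("""
        INSERT INTO resources_latest
        SELECT resource_id, MAX(crawl_time) AS last_seen
        FROM resources
        GROUP BY resource_id
    """)

Each time a resource_id re-appears, the aggregation retracts the old row
and emits the new one under the hood, and the sink keeps only the latest
version.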
Regards,
Dian

On Thu, Jun 9, 2022 at 6:56 AM John Tipper <john_tip...@hotmail.com> wrote:

> Hi Xuyang,
>
> Thank you very much, I'll experiment tomorrow. Do you happen to know
> whether there is a Python example of udtf() with a RowKind being set (or
> whether it's supported)?
>
> Many thanks,
>
> John
>
> Sent from my iPhone
>
> On 8 Jun 2022, at 16:41, Xuyang <xyzhong...@163.com> wrote:
>
> Hi, John.
> What about using a UDTF [1]?
> In your UDTF, save all the resources from a crawl as a set or map, s1.
> When t=2 arrives, the new crawl produces a second set of resources, s2.
> The deletions you are after are then s1 - s2.
> So loop over s1 - s2 inside the UDTF's 'eval' function and emit each
> deleted row with the RowKind 'DELETE' [2]. 'DELETE' tells the downstream
> that this value has been removed.
>
> I hope this helps.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#table-functions
> [2]
> https://github.com/apache/flink/blob/44f73c496ed1514ea453615b77bee0486b8998db/flink-core/src/main/java/org/apache/flink/types/RowKind.java#L52
>
> --
> Best!
> Xuyang
>
> At 2022-06-08 20:06:17, "John Tipper" <john_tip...@hotmail.com> wrote:
>
> Hi all,
>
> I have some reference data that is periodically emitted by a crawler
> mechanism into an upstream Kinesis data stream, where those rows are used
> to populate a sink table (I am using Flink 1.13 PyFlink SQL within AWS
> Kinesis Data Analytics). What is the best pattern for handling deletion
> of upstream data, so that the downstream table remains in sync with
> upstream?
>
> For example, at t=1, rows R1, R2 and R3 are processed from the stream,
> resulting in a DB with 3 rows. At some point between t=1 and t=2, the
> resource corresponding to R2 was deleted, so at t=2, when the next crawl
> was carried out, only rows R1 and R3 were emitted into the upstream
> stream. How should I process the stream of events so that when I have
> finished processing the events from t=2 my downstream table also has just
> rows R1 and R3?
>
> Many thanks,
>
> John
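PS: if you do want to experiment with the UDTF idea Xuyang describes
above, a rough, untested sketch might look like the following. Whether
the planner actually honors a RowKind set on a UDTF's output row is
exactly the open question in this thread, so please verify it on your
Flink version; the function and column names are illustrative, and it
assumes each crawl arrives as a single row carrying an array of resource
ids.

    from pyflink.common import Row, RowKind
    from pyflink.table import DataTypes
    from pyflink.table.udf import TableFunction, udtf

    class SnapshotDiff(TableFunction):
        """Remembers the previous crawl and emits a DELETE row for every
        resource id that has since disappeared."""

        def open(self, function_context):
            self.previous = set()  # s1: resource ids seen in the last crawl

        def eval(self, resource_ids):
            current = set(resource_ids)  # s2: resource ids in this crawl
            # s1 - s2: resources that existed before but are now gone
            for resource_id in self.previous - current:
                row = Row(resource_id)
                row.set_row_kind(RowKind.DELETE)  # mark the row as a deletion
                yield row
            for resource_id in current:
                yield Row(resource_id)  # still-present resources as INSERTs
            self.previous = current

    snapshot_diff = udtf(SnapshotDiff(), result_types=[DataTypes.STRING()])

Note that 'self.previous' is plain Python state: it lives per parallel
instance and is not checkpointed, so this only behaves as intended with
parallelism 1, and the set is lost on restart.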