Yup! This definitely helps and makes sense. The 'after' payload comes with all the data from the row, right? So essentially, for inserts and updates I can insert/replace data by primary key, when 'after' is null I can just delete by primary key, and then I can build out the rest of my joins as normal.
Are there any performance implications of doing it this way that differ from the out-of-the-box 1.11 solution?

On Fri, Aug 21, 2020 at 2:28 AM Marta Paes Moreira <ma...@ververica.com> wrote:

> Hi, Rex.
>
> Part of what enabled CDC support in Flink 1.11 was the refactoring of the
> table source interfaces (FLIP-95 [1]) and the new ScanTableSource [2],
> which allows emitting bounded/unbounded streams with insert, update and
> delete rows.
>
> In theory, you could consume data generated with Debezium as regular
> JSON-encoded events before Flink 1.11; there just wasn't a convenient way
> to really treat it as a "changelog". As a workaround, what you can do in
> Flink 1.10 is process these messages as JSON, extract the "after" field
> from the payload, and then apply de-duplication [3] to keep only the last
> row.
>
> The DDL for your source table would look something like:
>
> CREATE TABLE tablename (
>   ...
>   after ROW(`field1` DATATYPE, `field2` DATATYPE, ...)
> ) WITH (
>   'connector' = 'kafka',
>   'format' = 'json',
>   ...
> );
>
> Hope this helps!
>
> Marta
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/ScanTableSource.html
> [3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html#deduplication
>
> On Fri, Aug 21, 2020 at 10:28 AM Chesnay Schepler <ches...@apache.org> wrote:
>
>> @Jark Would it be possible to use the 1.11 Debezium support in 1.10?
>>
>> On 20/08/2020 19:59, Rex Fenley wrote:
>>
>> Hi,
>>
>> I'm trying to set up Flink with the Debezium CDC connector on AWS EMR;
>> however, EMR only supports Flink 1.10.0, whereas the Debezium connector
>> arrived in Flink 1.11.0, from looking at the documentation.
>>
>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/debezium.html
>>
>> I'm wondering what alternative solutions are available for connecting
>> Debezium to Flink. Is there an open-source Debezium connector that works
>> with Flink 1.10.0? Could I potentially pull out the code for the 1.11.0
>> Debezium connector and compile it in my project against the Flink 1.10.0
>> API?
>>
>> For context, I plan on doing some fairly complicated, long-lived stateful
>> joins / materialization using the Table API over data ingested from
>> Postgres and possibly MySQL.
>>
>> Appreciate any help, thanks!
>>
>> --
>> Rex Fenley | Software Engineer - Mobile and Backend
>>
>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US
>> <https://www.facebook.com/remindhq>
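For anyone following this thread later, here is a rough sketch of the full Flink 1.10 workaround Marta describes (raw Debezium JSON source, extract `after`, then deduplicate by key). Table and field names are made up for illustration, the Debezium envelope fields shown assume the default unwrapped-off JSON format, and the exact `WITH` option keys differ between Flink versions (1.10's legacy connectors use `'connector.type'`-style keys rather than the `'connector'` key shown in the example above), so treat this as a pattern, not a drop-in DDL:

```sql
-- Hypothetical source table reading raw Debezium JSON from Kafka.
-- 'op' and 'after' are top-level fields of the Debezium change-event
-- envelope; the columns inside 'after' are placeholders for your schema.
CREATE TABLE users_cdc (
  op STRING,                           -- 'c' = create, 'u' = update, 'd' = delete
  after ROW<id BIGINT, name STRING>,   -- row state after the change (NULL for deletes)
  ts_ms BIGINT,                        -- event timestamp from Debezium
  proctime AS PROCTIME()               -- processing-time attribute for ordering
) WITH (
  'connector.type' = 'kafka',
  'format.type' = 'json',
  ...
);

-- Deduplication pattern from the Flink 1.10 docs [3]: keep only the
-- latest row per primary key. Deletes carry a NULL 'after', so they are
-- filtered out here and would need to be handled separately.
SELECT id, name
FROM (
  SELECT after.id AS id,
         after.name AS name,
         ROW_NUMBER() OVER (PARTITION BY after.id
                            ORDER BY proctime DESC) AS rownum
  FROM users_cdc
  WHERE op <> 'd'
)
WHERE rownum = 1;
```

The deduplicated result can then feed the downstream joins/materialization as if it were an upsert stream keyed by `id`.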