Re: Debezium Flink EMR

2020-09-01 Thread Rex Fenley
This worked, thanks! Looking forward to the future releases :) On Mon, Aug 31, 2020 at 5:06 PM Marta Paes Moreira wrote: > Hey, Rex! > > This is likely due to the tombstone records that Debezium produces for > DELETE operations (i.e. a record with the same key as the deleted row and a > value of

Re: Debezium Flink EMR

2020-08-31 Thread Marta Paes Moreira
Hey, Rex! This is likely due to the tombstone records that Debezium produces for DELETE operations (i.e. a record with the same key as the deleted row and a value of null). These are markers for Kafka to indicate that log compaction can remove all records for the given key, and the initial impleme

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Hi, getting so close but ran into another issue: Flink successfully reads changes from Debezium/Kafka and writes them to Elasticsearch, but there's a problem with deletions. When I DELETE a row from MySQL, the deletion successfully makes it all the way to Elasticsearch, which is great, but then the

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Ah, my bad, thanks for pointing that out, Arvid! On Mon, Aug 31, 2020 at 12:00 PM Arvid Heise wrote: > Hi Rex, > > you still forgot > > 'debezium-json.schema-include' = true > > Please reread my mail. > > > On Mon, Aug 31, 2020 at 7:55 PM Rex Fenley wrote: > >> Thanks for the input, though I've

Re: Debezium Flink EMR

2020-08-31 Thread Arvid Heise
Hi Rex, you still forgot 'debezium-json.schema-include' = true. Please reread my mail. On Mon, Aug 31, 2020 at 7:55 PM Rex Fenley wrote: > Thanks for the input, though I've certainly included a schema as is > reflected earlier in this thread. Including here again > ... > tableEnv.executeSql("

Re: Debezium Flink EMR

2020-08-31 Thread Rex Fenley
Thanks for the input, though I've certainly included a schema, as reflected earlier in this thread. Including it here again ... tableEnv.executeSql(""" CREATE TABLE topic_addresses ( -- schema is totally the same as the MySQL "addresses" table id INT, customer_id INT, street STRING, city STRING, sta

Re: Debezium Flink EMR

2020-08-31 Thread Arvid Heise
Hi Rex, the connector expects a value without a schema, but the message contains a schema. You can tell Flink that the schema is included, as described in the documentation [1]. CREATE TABLE topic_products ( -- schema is totally the same as the MySQL "products" table id BIGINT, name STRING,
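For reference, the documented pattern Arvid points to looks roughly like the sketch below. This is a hedged illustration only: the topic name, broker address, and group id are assumptions (loosely based on the Debezium MySQL tutorial mentioned elsewhere in the thread), and `tableEnv` is assumed to be the same StreamTableEnvironment used in the thread's own snippets.

```scala
// Minimal sketch of a Debezium-backed table with the schema envelope included.
// Assumptions: topic, broker address, and group id are illustrative.
tableEnv.executeSql("""
  CREATE TABLE topic_products (
    -- schema matches the MySQL "products" table
    id BIGINT,
    name STRING,
    description STRING,
    weight DECIMAL(10, 2)
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'dbserver1.inventory.products',
    'properties.bootstrap.servers' = 'kafka:9092',
    'properties.group.id' = 'flink-consumer',
    'format' = 'debezium-json',
    -- the Kafka messages embed the Debezium schema, so tell the format:
    'debezium-json.schema-include' = 'true'
  )
""")
```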

Re: Debezium Flink EMR

2020-08-28 Thread Rex Fenley
Awesome, so that took me a step further. When running, I'm receiving an error, however. FYI, my docker-compose file is based on the Debezium MySQL tutorial, which can be found here: https://debezium.io/documentation/reference/1.2/tutorial.html Part of the stack trace: flink-jobmanager_1 | Caused

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
Hi, This is a known issue in 1.11.0, and has been fixed in 1.11.1. Best, Jark On Fri, 28 Aug 2020 at 06:52, Rex Fenley wrote: > Hi again! > > I'm testing out locally in Docker on Flink 1.11 first to get my bearings > before downgrading to 1.10 and figuring out how to replace the Debezium > con

Re: Debezium Flink EMR

2020-08-27 Thread Rex Fenley
Hi again! I'm testing out locally in Docker on Flink 1.11 first to get my bearings before downgrading to 1.10 and figuring out how to replace the Debezium connector. However, I'm getting the following error ``` Provided trait [BEFORE_AND_AFTER] can't satisfy required trait [ONLY_UPDATE_AFTER]. This

Re: Debezium Flink EMR

2020-08-27 Thread Rex Fenley
Thanks! On Thu, Aug 27, 2020 at 5:33 AM Jark Wu wrote: > Hi, > > Regarding the performance difference, the proposed way will have one more > stateful operator (deduplication) than the native 1.11 CDC support. > The overhead of the deduplication operator is similar to that of a simple > group by agg

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
Hi, Regarding the performance difference, the proposed way will have one more stateful operator (deduplication) than the native 1.11 CDC support. The overhead of the deduplication operator is similar to that of a simple group by aggregate (max on each non-key column). Best, Jark On Tue, 25 Aug 2020
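To make the comparison concrete, one way such a per-key deduplication step could look is Flink SQL's ROW_NUMBER-based deduplication, which keeps only the latest row for each key. This is a sketch under assumptions, not the exact query proposed in the thread: the table and column names follow the thread's `topic_addresses` example, and `proc_time` is an assumed processing-time attribute declared on that table.

```scala
// Hedged sketch of a deduplication step: keep the most recent row per primary
// key. Assumes topic_addresses declares a processing-time attribute
// `proc_time` (e.g. `proc_time AS PROCTIME()` in its DDL).
val latestAddresses = tableEnv.sqlQuery("""
  SELECT id, customer_id, street, city
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY proc_time DESC) AS row_num
    FROM topic_addresses
  )
  WHERE row_num = 1
""")
```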

Re: Debezium Flink EMR

2020-08-24 Thread Rex Fenley
Thank you so much for the help! On Mon, Aug 24, 2020 at 4:08 AM Marta Paes Moreira wrote: > Yes — you'll get the full row in the payload; and you can also access the > change operation, which might be useful in your case. > > About performance, I'm summoning Kurt and @Jark Wu to > the thread, w

Re: Debezium Flink EMR

2020-08-24 Thread Marta Paes Moreira
Yes — you'll get the full row in the payload; and you can also access the change operation, which might be useful in your case. About performance, I'm summoning Kurt and @Jark Wu to the thread, who will be able to give you a more complete answer and likely also some optimization tips for your spe

Re: Debezium Flink EMR

2020-08-21 Thread Rex Fenley
Yup! This definitely helps and makes sense. The 'after' payload comes with all the data from the row, right? So essentially, for inserts and updates I can insert/replace data by pk, for null values I just delete by pk, and then I can build out the rest of my joins like normal. Are there any performance impl

Re: Debezium Flink EMR

2020-08-21 Thread Marta Paes Moreira
Hi, Rex. Part of what enabled CDC support in Flink 1.11 was the refactoring of the table source interfaces (FLIP-95 [1]) and the new ScanTableSource [2], which allows emitting bounded/unbounded streams with insert, update and delete rows. In theory, you could consume data generated with Debezium
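As a small illustration of what the new interface allows (a sketch against the Flink 1.11 API, not code from this thread): a ScanTableSource declares which kinds of changes it produces through a ChangelogMode, which is what lets a source emit updates and deletes rather than inserts only.

```scala
import org.apache.flink.table.connector.ChangelogMode
import org.apache.flink.types.RowKind

// Sketch: the changelog mode a CDC-style ScanTableSource would advertise from
// getChangelogMode(), covering inserts, updates (before/after) and deletes.
val cdcMode: ChangelogMode = ChangelogMode.newBuilder()
  .addContainedKind(RowKind.INSERT)
  .addContainedKind(RowKind.UPDATE_BEFORE)
  .addContainedKind(RowKind.UPDATE_AFTER)
  .addContainedKind(RowKind.DELETE)
  .build()
```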

Re: Debezium Flink EMR

2020-08-21 Thread Chesnay Schepler
@Jark Would it be possible to use the 1.11 Debezium support in 1.10? On 20/08/2020 19:59, Rex Fenley wrote: Hi, I'm trying to set up Flink with the Debezium CDC connector on AWS EMR; however, EMR only supports Flink 1.10.0, whereas the Debezium connector arrived in Flink 1.11.0, from looking at the d

Debezium Flink EMR

2020-08-20 Thread Rex Fenley
Hi, I'm trying to set up Flink with the Debezium CDC connector on AWS EMR; however, EMR only supports Flink 1.10.0, whereas the Debezium connector arrived in Flink 1.11.0, from looking at the documentation. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html https://ci.apache.org/projects/