Thanks!

On Thu, Aug 27, 2020 at 5:33 AM Jark Wu <imj...@gmail.com> wrote:
> Hi,
>
> Regarding the performance difference, the proposed way will have one more
> stateful operator (deduplication) than the native 1.11 CDC support.
> The overhead of the deduplication operator is similar to that of a simple
> group-by aggregate (max on each non-key column).
>
> Best,
> Jark
>
> On Tue, 25 Aug 2020 at 02:21, Rex Fenley <r...@remind101.com> wrote:
>
>> Thank you so much for the help!
>>
>> On Mon, Aug 24, 2020 at 4:08 AM Marta Paes Moreira <ma...@ververica.com> wrote:
>>
>>> Yes — you'll get the full row in the payload, and you can also access
>>> the change operation, which might be useful in your case.
>>>
>>> About performance, I'm summoning Kurt and @Jark Wu <j...@apache.org> to
>>> the thread, who will be able to give you a more complete answer and
>>> likely also some optimization tips for your specific use case.
>>>
>>> Marta
>>>
>>> On Fri, Aug 21, 2020 at 8:55 PM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Yup! This definitely helps and makes sense.
>>>>
>>>> The 'after' payload comes with all the data from the row, right? So
>>>> essentially, for inserts and updates I can insert/replace data by
>>>> primary key, for null values I just delete by primary key, and then I
>>>> can build out the rest of my joins like normal.
>>>>
>>>> Are there any performance implications of doing it this way that are
>>>> different from the out-of-the-box 1.11 solution?
>>>>
>>>> On Fri, Aug 21, 2020 at 2:28 AM Marta Paes Moreira <ma...@ververica.com> wrote:
>>>>
>>>>> Hi, Rex.
>>>>>
>>>>> Part of what enabled CDC support in Flink 1.11 was the refactoring of
>>>>> the table source interfaces (FLIP-95 [1]) and the new ScanTableSource
>>>>> [2], which allows emitting bounded/unbounded streams with insert,
>>>>> update, and delete rows.
>>>>>
>>>>> In theory, you could consume data generated with Debezium as regular
>>>>> JSON-encoded events before Flink 1.11 — there just wasn't a convenient
>>>>> way to really treat it as a "changelog".
>>>>> As a workaround, what you can do in Flink 1.10 is process these
>>>>> messages as JSON, extract the "after" field from the payload, and then
>>>>> apply deduplication [3] to keep only the last row per key.
>>>>>
>>>>> The DDL for your source table would look something like:
>>>>>
>>>>> CREATE TABLE tablename (
>>>>>   ...
>>>>>   after ROW(`field1` DATATYPE, `field2` DATATYPE, ...)
>>>>> ) WITH (
>>>>>   'connector' = 'kafka',
>>>>>   'format' = 'json',
>>>>>   ...
>>>>> );
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>> Marta
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/ScanTableSource.html
>>>>> [3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html#deduplication
>>>>>
>>>>> On Fri, Aug 21, 2020 at 10:28 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>
>>>>>> @Jark Would it be possible to use the 1.11 Debezium support in 1.10?
>>>>>>
>>>>>> On 20/08/2020 19:59, Rex Fenley wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to set up Flink with the Debezium CDC connector on AWS EMR;
>>>>>> however, EMR only supports Flink 1.10.0, whereas the Debezium
>>>>>> connector arrived in Flink 1.11.0, from looking at the documentation.
>>>>>>
>>>>>> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/debezium.html
>>>>>>
>>>>>> I'm wondering what alternative solutions are available for connecting
>>>>>> Debezium to Flink. Is there an open-source Debezium connector that
>>>>>> works with Flink 1.10.0? Could I potentially pull out the code for the
>>>>>> 1.11.0 Debezium connector and compile it in my project against the
>>>>>> Flink 1.10.0 API?
>>>>>>
>>>>>> For context, I plan on doing some fairly complicated, long-lived
>>>>>> stateful joins/materialization using the Table API over data ingested
>>>>>> from Postgres and possibly MySQL.
>>>>>>
>>>>>> Appreciate any help, thanks!
>>>>>>
>>>>>> --
>>>>>> Rex Fenley | Software Engineer - Mobile and Backend
>>>>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
>>>>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US
>>>>>> <https://www.facebook.com/remindhq>
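
For reference, the Debezium JSON envelope that the `after` field in Marta's DDL maps to looks roughly like the sketch below (field names and values are illustrative, not from the thread). The `op` field is 'c' (create), 'u' (update), 'd' (delete), or 'r' (snapshot read); on deletes, `after` is null and `before` carries the deleted row, which is what makes the delete-by-primary-key handling Rex describes possible.

```json
{
  "before": { "id": 42, "field1": "old value" },
  "after": { "id": 42, "field1": "new value" },
  "source": { "connector": "postgresql", "table": "tablename" },
  "op": "u",
  "ts_ms": 1597953540000
}
```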
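
The deduplication step Marta points to in [3] could look roughly like the following on the table from her DDL. This is only a sketch: the key column `id`, the columns `field1`/`field2`, and the processing-time attribute `proctime` are all hypothetical and would need to match your schema, with `proctime` declared in the source DDL (e.g. as `proctime AS PROCTIME()`).

```sql
-- Flatten the Debezium 'after' payload first (hypothetical field names).
CREATE VIEW flattened AS
SELECT
  after.`id` AS id,
  after.`field1` AS field1,
  after.`field2` AS field2,
  proctime
FROM tablename;

-- Keep only the most recent row per primary key, per the Flink 1.10
-- deduplication pattern: ROW_NUMBER() in a subquery, filtered to 1.
SELECT id, field1, field2
FROM (
  SELECT
    id, field1, field2,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY proctime DESC  -- latest change wins
    ) AS row_num
  FROM flattened
)
WHERE row_num = 1;
```

As Jark notes above, this keeps state per key, so its overhead is comparable to a group-by aggregate over the same key.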