Re: Flink Stream job to parquet sink

2020-06-29 Thread aj
Thanks, Arvid. I will try to implement this using the broadcast approach. On Fri, Jun 26, 2020 at 1:10 AM Arvid Heise wrote: > Hi Anuj, yes, broadcast sounds really good. Now you just need to hide the structural invariance (the variable number of sinks) by delegating to inner sinks. > public cl…
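A minimal sketch of the delegating-sink idea from Arvid's reply, filling in the truncated `public cl…`. The class name, the routing-key extractor, and the inner-sink factory are all assumptions, not from the thread, and real inner sinks (e.g. file sinks) would additionally need their open/close and checkpoint hooks forwarded:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class DelegatingSink<T> extends RichSinkFunction<T> {

    // Serializable function type so the lambdas survive Flink's closure serialization.
    public interface Fn<A, B> extends Serializable { B apply(A a); }

    private final Fn<T, String> routeOf;               // picks the inner-sink key per record
    private final Fn<String, SinkFunction<T>> factory; // builds inner sinks on demand
    private transient Map<String, SinkFunction<T>> inner;

    public DelegatingSink(Fn<T, String> routeOf, Fn<String, SinkFunction<T>> factory) {
        this.routeOf = routeOf;
        this.factory = factory;
    }

    @Override
    public void open(Configuration parameters) {
        inner = new HashMap<>();
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        // Creating inner sinks lazily hides the variable number of sinks
        // behind one structurally stable operator, as Arvid suggests.
        SinkFunction<T> sink = inner.computeIfAbsent(routeOf.apply(value), factory::apply);
        sink.invoke(value, context);
    }
}
```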

Re: Flink Kafka connector in Python

2020-06-29 Thread Manas Kale
Hi Xingbo, Thank you for the information, it certainly helps! Regards, Manas On Mon, Jun 29, 2020 at 6:18 PM Xingbo Huang wrote: > Hi Manas, Since Flink 1.9, the entire architecture of PyFlink has been redesigned. So the method described in the link won't work. But you can use more conv…

Re: Re: Flip-105 can the debezium/canal SQL sink to database directly?

2020-06-29 Thread Jingsong Li
Hi, Welcome to try 1.11. There is no direct doc describing this, but I think these docs can help you [1][2]. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html Best, Jing…

Re: Re: Flip-105 can the debezium/canal SQL sink to database directly?

2020-06-29 Thread wangl...@geekplus.com.cn
Thanks Jingsong. Is there any document or example for this? I will build the flink-1.11 package and have a try. Thanks, Lei wangl...@geekplus.com.cn From: Jingsong Li Date: 2020-06-30 10:08 To: wangl...@geekplus.com.cn CC: user Subject: Re: Flip-105 can the debezium/canal SQL sink to datab…

Re: Flip-105 can the debezium/canal SQL sink to database directly?

2020-06-29 Thread Leonard Xu
Hi Lei, Jingsong is right, you need to define a primary key for your sink table. BTW, Flink uses `PRIMARY KEY NOT ENFORCED` to define a primary key because Flink doesn't own the data and only supports the `NOT ENFORCED` mode; it's a little different from a primary key in a database, which is `ENFORCED` by default…
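To make Leonard's point concrete, here is a sketch of a JDBC sink table declared with Flink 1.11's `PRIMARY KEY ... NOT ENFORCED` syntax. The columns mirror the `my_table` from this thread, while the table name, URL, and credentials are made up for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class NotEnforcedPkSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Flink validates the PRIMARY KEY declaration but never enforces it,
        // because it does not own the data; hence NOT ENFORCED is mandatory.
        tableEnv.executeSql(
            "CREATE TABLE mysql_sink_table (" +
            "  id BIGINT," +
            "  first_name STRING," +
            "  last_name STRING," +
            "  email STRING," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
            "  'table-name' = 'users'" +
            ")");
    }
}
```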

Re: Flip-105 can the debezium/canal SQL sink to database directly?

2020-06-29 Thread Jingsong Li
Hi Lei, INSERT INTO jdbc_table SELECT * FROM changelog_table; For the new Flink 1.11 connectors, you need to define a primary key for jdbc_table (and your MySQL table also needs the corresponding primary key) because changelog_table contains "update" and "delete" records. And then the JDBC sink…
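Jingsong's one-liner in context: with the primary key declared (see the DDL sketch above), the JDBC sink runs in upsert mode, so the changelog's update and delete records are applied against the target row keyed on that primary key. Continuing from the sketch above:

```java
// Upsert behavior: insert/update records are upserted on the primary key,
// delete records remove the corresponding row in the target database.
tableEnv.executeSql("INSERT INTO jdbc_table SELECT * FROM changelog_table");
```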

Flip-105 can the debezium/canal SQL sink to database directly?

2020-06-29 Thread wangl...@geekplus.com.cn
CREATE TABLE my_table ( id BIGINT, first_name STRING, last_name STRING, email STRING ) WITH ( 'connector'='kafka', 'topic'='user_topic', 'properties.bootstrap.servers'='localhost:9092', 'scan.startup.mode'='earliest-offset', 'format'='debezium-json' ); INSERT INTO mysql_sink_table SE…

Avro from avrohugger still invalid

2020-06-29 Thread Georg Heiler
Older versions of Flink were incompatible with the Scala-specific record classes generated by AvroHugger: https://issues.apache.org/jira/browse/FLINK-12501 Flink 1.10 apparently fixes this. I am currently using 1.10.1, but still experience this problem: https://stackoverflow.com/question

Reading and updating rule-sets from a file

2020-06-29 Thread Lorenzo Nicora
Hi, My streaming job uses a set of rules to process records from a stream. The rule set is defined in simple flat files, one rule per line. The rule set can change from time to time: a user will upload a new file that must replace the old rule set completely. My problem is with reading and updating…
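Lorenzo's preview cuts off, but the setup he describes (a slowly changing rule set where each upload replaces the previous one wholesale) maps naturally onto Flink's broadcast state pattern. The sketch below is an assumption, not from the thread; `Record` and `RuleSet` are hypothetical types, and each uploaded file is assumed to arrive on `uploads` as one complete `RuleSet`:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RuleSetSketch {

    public static class Record { public String value; }
    public static class RuleSet { public String[] rules; }

    static final MapStateDescriptor<String, RuleSet> RULES =
            new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(RuleSet.class));

    static DataStream<String> wire(DataStream<Record> records, DataStream<RuleSet> uploads) {
        return records
            .connect(uploads.broadcast(RULES))
            .process(new BroadcastProcessFunction<Record, RuleSet, String>() {
                @Override
                public void processElement(Record r, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                    RuleSet current = ctx.getBroadcastState(RULES).get("current");
                    if (current != null) {
                        // evaluate r against current.rules and emit results here
                    }
                }

                @Override
                public void processBroadcastElement(RuleSet rs, Context ctx, Collector<String> out) throws Exception {
                    // a newly uploaded file replaces the old rule set completely
                    ctx.getBroadcastState(RULES).put("current", rs);
                }
            });
    }
}
```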

Announcing ApacheCon @Home 2020

2020-06-29 Thread Rich Bowen
Hi, Apache enthusiast! (You’re receiving this because you’re subscribed to one or more dev or user mailing lists for an Apache Software Foundation project.) The ApacheCon Planners and the Apache Software Foundation are pleased to announce that ApacheCon @Home will be held online, September 29…

Re: Flink Kafka connector in Python

2020-06-29 Thread Xingbo Huang
Hi Manas, Since Flink 1.9, the entire architecture of PyFlink has been redesigned, so the method described in the link won't work. But you can use the more convenient DDL [1] or descriptor [2] to read Kafka data. Besides, you can refer to the common questions about PyFlink [3]. [1] https://ci.apache.org/
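The DDL route Xingbo mentions is language-neutral: the same `CREATE TABLE` string is passed to the table environment whether from Java or from PyFlink. A sketch with made-up topic and server names, shown here in Java with the 1.11-style Kafka connector options; in PyFlink the identical string goes to the Python table environment instead:

```java
// Hypothetical Kafka source declared via DDL (tableEnv as in the
// TableEnvironment setup sketched earlier in this digest).
tableEnv.executeSql(
    "CREATE TABLE kafka_source (" +
    "  user_id BIGINT," +
    "  message STRING" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'my_topic'," +
    "  'properties.bootstrap.servers' = 'localhost:9092'," +
    "  'format' = 'json'" +
    ")");
```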

Flink Kafka connector in Python

2020-06-29 Thread Manas Kale
Hi, I want to consume from and write to Kafka using Flink's Python API. The only way I found to do this was through this question on SO, where the user essentially copies FlinkKafka…

[no subject]

2020-06-29 Thread Georg Heiler
Hi, I am trying to use the Confluent schema registry from an interactive Flink Scala shell. My problem is that initializing the deserialization schema via ConfluentRegistryAvroDeserializationSchema fails: ```scala val serializer = ConfluentRegistryAvroDeserializationSchema.forSpecific[Tweet](classOf[Tweet],…
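For reference, the intended shape of that call, shown in Java with made-up bootstrap-server and registry URLs. Georg's actual failure with avrohugger-generated classes is a separate issue; this only illustrates the API:

```java
import org.apache.avro.specific.SpecificRecord;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class ConfluentAvroSourceSketch {
    // Works for any Avro-generated SpecificRecord class, e.g. Georg's Tweet.
    static <T extends SpecificRecord> FlinkKafkaConsumer<T> consumerFor(
            Class<T> recordClass, String topic, String registryUrl) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // made up
        props.setProperty("group.id", "avro-reader");             // made up

        // forSpecific takes the generated SpecificRecord class plus the
        // schema registry URL; writer schemas are fetched from the registry.
        return new FlinkKafkaConsumer<>(
                topic,
                ConfluentRegistryAvroDeserializationSchema.forSpecific(recordClass, registryUrl),
                props);
    }
}
```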

Re: Timeout when using RocksDB to handle large state in a stream app

2020-06-29 Thread Ori Popowski
Hi there, I'm currently experiencing the exact same issue. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Heartbeat-of-TaskManager-timed-out-td36228.html I've found out that GC is causing the problem, but I still haven't managed to solve it. On Mon, Jun 29, 2020 at 12:3…

Timeout when using RocksDB to handle large state in a stream app

2020-06-29 Thread Felipe Gutierrez
Hi community, I am trying to run a stream application with large state on a standalone Flink cluster [3]. I configured the RocksDB state backend and increased the memory of the JobManager and TaskManager. However, I am still getting the timeout message "java.util.concurrent.TimeoutException: Heartbeat…
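A sketch of the setup Felipe describes, with a made-up checkpoint path. The heartbeat itself is tuned via `heartbeat.timeout` in flink-conf.yaml (50000 ms by default), though raising it is only a workaround when long GC pauses, as Ori reports above, are the real cause:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LargeStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps state off-heap; incremental checkpoints (second arg)
        // reduce checkpointing pressure for large state.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

        // ... build the actual pipeline here ...

        env.execute("large-state-job");
    }
}
```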

Re: Optimal Flink configuration for Standalone cluster.

2020-06-29 Thread Dimitris Vogiatzidakis
> It could really be specific to your workload. Some workloads may need more heap memory while others may need more off-heap. The main 'process' of my project creates a cross product of datasets and then applies a function to all of them to extract some features. > Alternatively, you can try…

Re: Convert sql table with field of type MULITSET to datastream with field of type java.util.Map[T, java.lang.Integer]

2020-06-29 Thread Timo Walther
Hi Yi, not all conversions are supported in the `toRetractStream` method yet. Unfortunately, the rework of the type system is still in progress; I hope we can improve the user experience there quite soon. Have you tried using `Row` instead? `toRetractStream[Row]` should work for all data types…
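Timo's suggestion in code form, shown in Java (in the Scala API it is the `toRetractStream[Row](table)` call he quotes; the import path below is the 1.11 "bridge" package). Per Timo, `Row` should work for all data types, MULTISET included:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractAsRowSketch {
    static DataStream<Tuple2<Boolean, Row>> toRows(StreamTableEnvironment tEnv, Table table) {
        // f0 is the retraction flag: true = add/insert, false = retract/delete
        return tEnv.toRetractStream(table, Row.class);
    }
}
```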

Re: Error reporting for Flink jobs

2020-06-29 Thread Timo Walther
Hi Satyam, I'm not aware of an API that solves all your problems at once. A common pattern for failures in user code is to catch the errors there and define a side output on the operator to pipe them to dedicated sinks. However, such functionality does not exist in SQL yet. For the si…
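A sketch of the side-output pattern Timo describes, at the DataStream level (the SQL layer has no equivalent yet); the parsing body is a stand-in for real user code:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class ErrorSideOutputSketch {

    // Anonymous subclass so the generic type survives erasure.
    static final OutputTag<String> ERRORS = new OutputTag<String>("errors") {};

    static SingleOutputStreamOperator<Long> parse(DataStream<String> input) {
        return input.process(new ProcessFunction<String, Long>() {
            @Override
            public void processElement(String value, Context ctx, Collector<Long> out) {
                try {
                    out.collect(Long.parseLong(value)); // the "user code" that may fail
                } catch (Exception e) {
                    // divert the bad record instead of failing the whole job
                    ctx.output(ERRORS, value + ": " + e.getMessage());
                }
            }
        });
    }

    // Downstream: parse(input).getSideOutput(ERRORS) feeds the dedicated error sink.
}
```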