I am using flink sql to do some complicated calculations. I have
encountered some very difficult problems in this process, so I would like to
ask everyone for your help. My goal is to build a data stream with a very
accurate result, which is also in line with the Streaming System. The core
idea of ​​this book is also what I have to do. I use kafka to receive the
mysql binlog as the data source, then join into multiple tables, and then
perform complex sql calculations on these multiple tables. I found that
flink does not provide upsert. Implementation, so I added a last_value(xxx),
last_value(xxx)..group by(id) operation for each kafka data source to ensure
consistency of the final result, which works, I understand this Will cache a
dynamic table, resulting in a large state (about 3 G), but seems to
introduce some other very strange problems, summarized as follows:

1. In the case of sql is very complicated, it is clear that checkpoint is
turned on, but the web interface finds that there is no checkpoint at all,
and none of them
2. During the running of the program, it frequently hangs. The error has
always been the following error:

(1) the assigned slot id_xxxxx was removed
(2) the heartbeat with taskmanager was timeout

I have used slotSharingGroup to split tasks into different slots whenever
possible, but I still often report these two errors, causing the program to
hang.

I have no clue about these mistakes. If anyone can help, I really appreciate
it.

Added: I receive data from 4 kafka topics, the maximum amount of data is
more than 20 million.
My startup command is as follows
 
Flink1.6/bin/flink run -m yarn-cluster -ytm 23240 -yn 3 -ys 2 -ynm xxxx -yqu
xxxx -c xxxxxxx xxx.jar ./test.conf



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply via email to