Re: onTimer() of a KeyedProcessFunction stops getting triggered after a while

2022-03-18 Thread Binil Benjamin
Hi, Parallelism is currently set to 9 and it appears to be occurring for all subtasks. We did put logs to see the various timestamps. The following logs are from the last 5 days. - logs from processElement() - logged immediately after timer registration: "message": "FunctionName=WfProcessFun

Re: onTimer() of a KeyedProcessFunction stops getting triggered after a while

2022-03-18 Thread Yun Gao
Hi Binil, I think the code itself also looks good to me. May I have a double confirmation on the details of the issue: 1. What is the parallelism of this operator, and does the issues occurs for all the subtasks? 2. Have we already added some logs in the processElement and onTimer to print the

Re: Controlling group partitioning with DataStream

2022-03-18 Thread Ken Krugler
Hi Guowei, Thanks for following up on this, sorry I missed your email earlier. Unfortunately I don’t think auto-rebalancing will help my situation, because I have a small number of unique key values (low cardinality). And processing these groups (training one deep-learning model per group) req

Re: Flink SQL 1.12 How to implement query Hbase table on secondary index

2022-03-18 Thread Jing Ge
HI WuKong, Afaiu, technically, you are not using Hbase secondary index(coprocessor). What you are trying to do is to store the synced dim table in elasticsearch and query from there to get the rowkeys and then use the rowkeys to get dim table rows from Hbase. In this way, a (full) table scan in Hb

Re: Kafka source with multiple partitions loses data during savepoint recovery

2022-03-18 Thread Sharon Xie
Thanks Qingsheng, I'm using Flink SQL. I'll need to dig a bit deeper to see which Flink consumer class is actually used. I didn't find the log you linked. However I found the following log from the task manager. Just a note, in this test, I sent 10 records to the source topic which has 10 partiti

Re: Clarifying ProcessFunction.onTimer and watermark behavior

2022-03-18 Thread Yun Gao
Hi Dan, The default implementation in Flink [1] would first process all the timers before emit the watermark out, Thus the watermark should be after the records emitted in processing timers. Best, Yun Gao [1] https://github.com/apache/flink/blob/ab08b52030a9612571896c579d85e000134ad0f1/flink-

Re: Issue with Flink UI for Flink 1.14.0

2022-03-18 Thread Chesnay Schepler
That issues tracked under https://issues.apache.org/jira/browse/FLINK-25904. We don't yet know the cause. On 18/03/2022 13:37, Peter Westermann wrote: Just started testing Flink 1.14.4 since that fixes FLINK-25732 … and I am now running into another UI issue. On the jobmanager that is not cur

Re: Issue with Flink UI for Flink 1.14.0

2022-03-18 Thread Peter Westermann
Just started testing Flink 1.14.4 since that fixes FLINK-25732 … and I am now running into another UI issue. On the jobmanager that is not currently the active jobmanager, the checkpoints tab is not working. It just displays “No Data“, when I use the corresponding API /jobs/{id}/checkpoints, I g

Re: scala shell not part of 1.14.4 download

2022-03-18 Thread Chesnay Schepler
The Scala Shell only works with Scala 2.11. You will need to use the Scala 2.11 Flink distribution. On 18/03/2022 12:42, Georg Heiler wrote: Hi, https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/repls/scala_shell/ mentions: |bin/start-scala-shell.sh local | |a scrip

Re: scala shell not part of 1.14.4 download

2022-03-18 Thread Jeff Zhang
Hi Georg, You can try Zeppelin 0.10.1 which supports scala 2.12 for Flink interpreter. Internally, Flink interpreter of Zeppelin use scala shell, you can write scala code and run it in an interactive way. https://zeppelin.apache.org/download.html https://zeppelin.apache.org/docs/0.10.1/interprete

Re: scala shell not part of 1.14.4 download

2022-03-18 Thread Martijn Visser
Hi Georg, As far as I know, there has never been a Scala Shell for Scala 2.12 because it was not supported, only for Scala 2.11. The Scala Shell also also been completely dropped with Flink 1.15. Best regards, Martijn On Fri, 18 Mar 2022 at 12:43, Georg Heiler wrote: > Hi, > > > https://night

scala shell not part of 1.14.4 download

2022-03-18 Thread Georg Heiler
Hi, https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/repls/scala_shell/ mentions: bin/start-scala-shell.sh local a script to start a scala REPL shell. But the download for Flink https://www.apache.org/dyn/closer.lua/flink/flink-1.14.4/flink-1.14.4-bin-scala_2.12.tgz D

Re: how to set kafka sink ssl properties

2022-03-18 Thread Qingsheng Ren
Hi, Your usage looks good to me, but could you provide the exception (if any) or the unexpected behavior you met after starting the job? It’s difficult to debug with only these configurations. Best regards, Qingsheng > On Mar 18, 2022, at 01:04, HG wrote: > > Hi Matthias, > > It should

Re: Kafka source with multiple partitions loses data during savepoint recovery

2022-03-18 Thread Qingsheng Ren
Hi Sharon, Could you check the log after starting the job with savepoint? If you have INFO log enabled you will get an entry “Consumer subtask {} will start reading {} partitions with offsets in restored state: {}” [1] in the log, which shows the starting offset of partitions. This might be he

Re: Adjusted frame length exceeds 2147483647

2022-03-18 Thread Ori Popowski
I am not aware that we use Nessus. We are running on Google Cloud Dataproc vanilla. On the other hand, Flink restarted from a checkpoint and reread the Kafka offset and didn't encounter this error again, so if it was something in the input I would expect it to happen again when it processes the Kaf

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
I totally agree with Schwalbe that per-partition watermarking allows # source tasks < # kafka partitions. Otherwise, Dan, you should suspect other possibilities like what Schwalbe said. Best, Dongwon On Fri, Mar 18, 2022 at 5:01 PM Schwalbe Matthias < matthias.schwa...@viseca.ch> wrote: > Hi S

Re: onTimer() of a KeyedProcessFunction stops getting triggered after a while

2022-03-18 Thread Binil Benjamin
Hi, Unfortunately, I cannot share the entire code, but the class roughly looks like this: public class WfProcessFunction extends KeyedProcessFunction, Map, Map> { @Override public void processElement(Map inputRecord, Context context, Collector> collector) throws Exception {

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
Oops mistyped your name, Dan From: Schwalbe Matthias Sent: Freitag, 18. März 2022 09:02 To: 'Dan Hill' ; Dongwon Kim Cc: user Subject: RE: Weird Flink Kafka source watermark behavior Hi San, Dongwon, I share the opinion that when per-partition watermarking is enabled, you should observe corre

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
Hi San, Dongwon, I share the opinion that when per-partition watermarking is enabled, you should observe correct behavior … would be interesting to see why it does not work for you. I’d like to clear one tiny misconception here when you write: >> - The same issue happens even if I use an idle

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dan Hill
I'll try forcing # source tasks = # partitions tomorrow. Thank you, Dongwon, for all of your help! On Fri, Mar 18, 2022 at 12:20 AM Dongwon Kim wrote: > I believe your job with per-partition watermarking should be working okay > even in a backfill scenario. > > BTW, is the problem still observe

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
I believe your job with per-partition watermarking should be working okay even in a backfill scenario. BTW, is the problem still observed even with # sour tasks = # partitions? For committers: Is there a way to confirm that per-partition watermarking is used in TM log? On Fri, Mar 18, 2022 at 4:

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dan Hill
I hit this using event processing and no idleness detection. The same issue happens if I enable idleness. My code matches the code example for per-partition watermarking

Re: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Dongwon Kim
Hi Dan, I'm quite confused as you already use per-partition watermarking. What I meant in the reply is - If you don't use per-partition watermarking, # tasks < # partitions can cause the problem for backfill jobs. - If you don't use per-partition watermarking, # tasks = # partitions is going to b