Re: Size of state for any known production use case

2020-02-13 Thread RKandoji
xtsummit.com/wp-content/uploads/2019/11/Stephan_Ewen_Stream_Processing_Beyond_Streaming.pdf > (Slide > 3) > [2] https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA/videos > [3] https://www.youtube.com/watch?v=2C44mUPlx5o > > On Wed, Feb 12, 2020 at 10:42 PM RKandoji wrote: > >>

Size of state for any known production use case

2020-02-12 Thread RKandoji
Hi Team, I've done a POC using Flink and am planning to give a presentation about my learnings and share the benefits of using Flink. I understand that companies are using Flink to handle terabytes of state, but it would be great if you could point me to any reference of a company using Flink produ

Re: Issue with committing Kafka offsets

2020-01-31 Thread RKandoji
I had to disable auto commit for it to work. I understand auto commit is just for monitoring purposes, so I assume it should be safe to run it like that. properties.put("enable.auto.commit", "false"); On Fri, Jan 31, 2020 at 1:09 PM RKandoji wrote: > Hi, > > T
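
For readers following along, here is a minimal sketch of how the setting quoted above could sit in a consumer configuration. Only the `enable.auto.commit` line comes from the message; the class name, helper method, broker address, and group id are hypothetical placeholders, and handing the resulting `Properties` object to Flink's Kafka consumer is an assumption about the poster's setup, not something the snippet confirms.

```java
import java.util.Properties;

// Hypothetical helper class; not part of the original message.
class ConsumerConfigSketch {

    // Builds Kafka consumer properties with auto commit disabled.
    // bootstrapServers and groupId are illustrative values supplied by the caller.
    static Properties buildConsumerProperties(String bootstrapServers, String groupId) {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", bootstrapServers);
        properties.setProperty("group.id", groupId);
        // The one line taken from the message: turn off Kafka's periodic auto commit.
        properties.put("enable.auto.commit", "false");
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = buildConsumerProperties("localhost:9092", "poc-group");
        // Print the effective setting so it can be checked before the job is wired up.
        System.out.println("enable.auto.commit=" + properties.getProperty("enable.auto.commit"));
    }
}
```

In this sketch the properties object would presumably be passed to the Kafka source when the job is built; whether disabling auto commit is the right fix depends on how checkpointing is configured in the job.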

Re: Issue with committing Kafka offsets

2020-01-31 Thread RKandoji
kKafkaConsumer. > > Hope this helps, > Gordon > > On Sat, Feb 1, 2020 at 12:54 AM RKandoji wrote: > >> Can someone please help me here. >> >> Thanks >> RK >> >> >> On Thu, Jan 30, 2020 at 7:51 PM RKandoji wrote: >> >>> Hi

Re: Issue with committing Kafka offsets

2020-01-31 Thread RKandoji
Can someone please help me here. Thanks RK On Thu, Jan 30, 2020 at 7:51 PM RKandoji wrote: > Hi Team, > > I'm running into a strange issue, pasted below: > > Committing offsets to Kafka takes longer than the checkpoint interval. > Skipping commit of previous offsets

Issue with committing Kafka offsets

2020-01-30 Thread RKandoji
Hi Team, I'm running into a strange issue, pasted below: Committing offsets to Kafka takes longer than the checkpoint interval. Skipping commit of previous offsets because newer complete checkpoint offsets are available. This does not compromise Flink's checkpoint integrity. I read data from more

Re: BlinkPlanner limitation related clarification

2020-01-27 Thread RKandoji
Hi Jingsong, Thanks for the clarification! The limitation description was a bit confusing to me, but it became clear after seeing the example you posted above. Regards, RK. On Mon, Jan 27, 2020 at 6:25 AM Jingsong Li wrote: > Hi RKandoji, > > You understand this bug wrong, your code

Re: BlinkPlanner limitation related clarification

2020-01-26 Thread RKandoji
bsTableEnv.sqlQuery(...) and so on.. Could you please let me know if there is anything specific I need to look at? I would like to understand what was wrong before changing the code. Thanks, RK On Thu, Jan 23, 2020 at 11:48 PM Jingsong Li wrote: > Hi RKandoji, > > IMO, yes, you can no

BlinkPlanner limitation related clarification

2020-01-23 Thread RKandoji
ore details about the implications. Thanks, RKandoji

Job Manager heap metrics

2020-01-16 Thread RKandoji
Hi, Could someone please tell me the best way to check the amount of heap consumed by the Job Manager? Currently I have allocated a large heap of 20GB for both the Job Manager and the Task Manager. I'm able to see Task Manager heap usage on the UI but not the Job Manager's. I would like to decide how much heap to allocat

Re: How to verify if checkpoints are asynchronous or sync

2020-01-08 Thread RKandoji
b.com/jvm-profiling-tools/async-profiler > Best, > Congxian > > > On Wed, Jan 8, 2020 at 11:37 AM William C wrote: > >> Hello >> >> on 2020/1/8 11:31, RKandoji wrote: >> > I'm running my job on an EC2 instance with 32 cores and according to the >> >

Re: How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread RKandoji
'm using 32 task slots. Performance seems better at 26 task slots than with more than 26 task slots. So I was trying to understand if additional CPU cores are being utilized by checkpointing or any other async (or background) operations; in the process I was trying to verify if the checkpointing is async. Tha

Re: How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread RKandoji
Thanks for the reply. I will check and enable debug logs specifically for the class that contains this log. But in general the logs are already too large and I'm trying to suppress some of them, so I'm wondering if there is any other way? Thanks, RKandoji On Tue, Jan 7, 2020 at 7:50 PM William C

Re: Duplicate tasks for the same query

2020-01-07 Thread RKandoji
y > upstream failure and process the same data again. In that case, each key > will only have > at most one record and you won't face any join key skewing issue. > > Best, > Kurt > > > On Mon, Jan 6, 2020 at 6:55 AM RKandoji wrote: > >> Hi Kurt, >>

How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread RKandoji
w. Thanks, RKandoji

Re: Duplicate tasks for the same query

2020-01-05 Thread RKandoji
een updated. Thanks, RKandoji On Fri, Jan 3, 2020 at 9:57 PM Kurt Young wrote: > Hi RKandoji, > > It looks like you have a data skew issue with your input data. Some or > maybe only one "userId" appears more frequently than others. For join > operator to work correctly, Flink

Re: Duplicate tasks for the same query

2020-01-03 Thread RKandoji
w to fix it would be very helpful. Thanks, RKandoji On Fri, Jan 3, 2020 at 1:06 PM RKandoji wrote: > Thanks! > > On Thu, Jan 2, 2020 at 9:45 PM Jingsong Li wrote: > >> Yes, >> >> 1.9.2 or the upcoming 1.10 >> >> Best, >> Jingsong Lee >> >>

Re: Duplicate tasks for the same query

2020-01-03 Thread RKandoji
Thanks! On Thu, Jan 2, 2020 at 9:45 PM Jingsong Li wrote: > Yes, > > 1.9.2 or the upcoming 1.10 > > Best, > Jingsong Lee > > On Fri, Jan 3, 2020 at 12:43 AM RKandoji wrote: > >> Ok thanks, does that mean version 1.9.2 is what I need to use? >> >>

Re: Duplicate tasks for the same query

2020-01-02 Thread RKandoji
ent has also been set up in some places. > > Best, > Jingsong Lee > > On Wed, Jan 1, 2020 at 3:24 AM RKandoji wrote: > >> Thanks Jingsong and Kurt for more details. >> >> Yes, I'm planning to try out DeDuplication when I'm done upgrading to >> v

Re: Duplicate tasks for the same query

2019-12-31 Thread RKandoji
nk-docs-release-1.9/dev/table/sql.html#top-n > > > On Tue, Dec 31, 2019 at 9:24 AM Jingsong Li > wrote: > >> Hi RKandoji, >> >> In theory, you don't need to do anything. >> First, the optimizer will optimize by doing duplicate nodes. >> Second

Re: Duplicate tasks for the same query

2019-12-30 Thread RKandoji
Li wrote: > Hi RKandoji, > > FYI: Blink-planner subplan reusing: [1] 1.9 available. > >Join Join > / \ / \ > Filter1 Filter2 Filter1 Filter2 > ||=> \ / >

Fwd: Duplicate tasks for the same query

2019-12-29 Thread RKandoji
Hi Team, I'm doing a POC with Flink to understand if it's a good fit for my use case. As part of the process, I need to filter duplicate items, and I created the query below to get only the latest records based on timestamp. For instance, I have a "Users" table which may contain multiple messages for the