A strange exception while consumption using a multi topic Kafka Connector

2018-05-11 Thread Vishal Santoshi
java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966) at java.util.LinkedList$ListItr.next(LinkedList.java:888) at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetche

RE: Lost JobManager

2018-05-11 Thread Newport, Billy
It’s standard 1.3.2 on Java 7. We don’t use custom flink builds, just pull down whats in maven. From: Stephan Ewen [mailto:se...@apache.org] Sent: Friday, May 11, 2018 2:27 PM To: user@flink.apache.org Cc: Chan, Regina [Tech]; Newport, Billy [Tech]; Fabian Hueske Subject: Re: Lost JobManager Hi

Re: Lost JobManager

2018-05-11 Thread Stephan Ewen
Hi! This somehow looks like the YARN shutdown call overtakes the call that marks the Job as successful. That should not be the case. Looking at the logs, you are running a Java 7 installation, so I assume this must be some Flink 1.3.x based build. Can you let us know which version this is based o

Better way to clean up state when connect

2018-05-11 Thread Chengzhi Zhao
Hi there, I have a use case to check for active ID, there are two streams and I connect them: one has actual data (Stream A) and the other one is for lookup purpose (Stream B), I am getting Stream B as a file which includes all active ID, so inactive ID would not be show up on this list. I tried t

Re: Iterative Stream won't loop

2018-05-11 Thread Rong Rong
Yeah that too. Agree with Paris here. especially you are not only doing a windowed aggregation but a join step. Using graph processing engine [1] might be the best idea here. Another possible way is to create a rich agg function over a combination of the tuple, this way you are complete in contr

Re: Delay in Flink timers

2018-05-11 Thread Stephan Ewen
Checkpoints are largely asynchronous, but the checkpointing of timers has some synchronous component (which we are currently working on getting rid of). So when you have a lot of timers, streams stall for a short time while the timers are checkpointed. If all goes as planned, Flink 1.6 will not hav

Re: Late data before window end is even close

2018-05-11 Thread Piotr Nowojski
Generally speaking best practise is always to simplify your program as much as possible to narrow down the scope of the search. Replace data source with statically generated events, remove unnecessary components Etc. Either such process help you figure out what’s wrong on your own and if not, if

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
Thanks for that code snippet, I should try it out to simulate my DAG.. If any suggestions how to debug futher what's causing late data on a production stream job, please let me know. On Fri, May 11, 2018 at 2:18 PM, Piotr Nowojski wrote: > Hey, > > Actually I think Fabian initial message was inc

Re: Flink Forward SF 2018 Videos

2018-05-11 Thread Piotr Nowojski
Hi, Previous videos were always uploaded there, so I guess the new one should appear there shortly. Laura might now something more about it. Thanks, Piotrek > On 10 May 2018, at 23:44, Rafi Aroch wrote: > > Hi, > > Are there any plans to upload the videos to the Flink Forward YouTube channel

Re: Checkpoint is not triggering as per configuration

2018-05-11 Thread Piotr Nowojski
Hi, It’s not considered as a bug, only a missing not yet implemented feature (check my previous responses for the Jira ticket). Generally speaking using file input stream for DataStream programs is not very popular, thus this was so far low on our priority list. Piotrek > On 10 May 2018, at 0

Re: How to broadcast messages to all task manager instances in cluster?

2018-05-11 Thread Piotr Nowojski
Hi, I don’t quite understand your problem. If you broadcast message as an input to your operator that depends on this configuration, each instance of your operator will receive this configuration. It shouldn't matter whether Flink scheduled your operator on one, some or all of the TaskManagers.

Re: Late data before window end is even close

2018-05-11 Thread Piotr Nowojski
Hey, Actually I think Fabian initial message was incorrect. As far as I can see in the code of WindowOperator (last lines of org.apache.flink.streaming.runtime.operators.windowing.WindowOperator#processElement ), the element is sent to late side output if it is late AND it wasn’t assigned to a

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
Thanks. What I still don't get is why my message got filtered in the first place. Even if the allowed lateness filtering would be done "on the window", data should not be dropped as late if it's not in fact late by more than the allowedLateness setting. Assuming that these conditions hold: - messa

Re: Iterative Stream won't loop

2018-05-11 Thread Paris Carbone
Hey! I would recommend against using iterations with windows for that problem at the moment. Alongside loop scoping and backpressure that will be addressed by FLIP-15 [1] I think you also need the notion of stream supersteps, which is experimental work in progress for now, from my side at least

Re: Late data before window end is even close

2018-05-11 Thread Fabian Hueske
Hi Juho, Thanks for bringing up this topic! I share your intuition. IMO, records should only be filtered out and send to a side output if any of the windows they would be assigned to is closed already. I had a look into the code and found that records are filtered out as late based on the followi

How to broadcast messages to all task manager instances in cluster?

2018-05-11 Thread Di Tang
Hi guys: I have a Flink job which contains multiple pipelines. Each pipeline depends on some configuration. I want to make the configuration dynamic and effective after change so I created a data source which periodically poll the database storing the configuration. However, how can I broadcast th

Late data before window end is even close

2018-05-11 Thread Juho Autio
I don't understand why I'm getting some data discarded as late on my Flink stream job a long time before the window even closes. I can not be 100% sure, but to me it seems like the kafka consumer is basically causing the data to be dropped as "late", not the window. I didn't expect this to ever ha

Re: Connect keyed stream with broadcast

2018-05-11 Thread Fabian Hueske
Hi Navneeth, Yes, connecting broadcast and keyed streams will be supported in Flink 1.5 (also search for broadcast state). The docs haven't been merged yet but there's a pull request [1]. Best, Fabian [1] https://github.com/apache/flink/pull/5922 2018-05-10 22:27 GMT+02:00 Navneeth Krishnan :

Re: Application logs missing from jobmanager log

2018-05-11 Thread Fabian Hueske
Great, thank you! 2018-05-11 10:31 GMT+02:00 Juho Autio : > Thanks Fabian, here's the ticket: > https://issues.apache.org/jira/browse/FLINK-9335 > > On Wed, May 2, 2018 at 12:53 PM, Fabian Hueske wrote: > >> Hi Juho, >> >> I assume that these logs are generated from a different process, i.e., >>

Re: Flink State monitoring

2018-05-11 Thread Juho Autio
Bump this. I can create a ticket if it helps? On Tue, Apr 24, 2018 at 4:47 PM, Juho Autio wrote: > Anything to add? Is there a Jira ticket for this yet? > > > On Fri, Apr 20, 2018 at 1:03 PM, Stefan Richter < > s.rich...@data-artisans.com> wrote: > >> If estimates are good enough, we should be a

Re: Latency with cross operation on Datasets

2018-05-11 Thread Fabian Hueske
Hi Varun, The focus of the DataSet execution is on robustness. The smaller DataSet is stored serialized in memory. Also most of the communication happens via serialization (instead of passing object references). The serialization overhead should have a significant overhead compared to a thread-loc

Re: Application logs missing from jobmanager log

2018-05-11 Thread Juho Autio
Thanks Fabian, here's the ticket: https://issues.apache.org/jira/browse/FLINK-9335 On Wed, May 2, 2018 at 12:53 PM, Fabian Hueske wrote: > Hi Juho, > > I assume that these logs are generated from a different process, i.e., the > client process and not the JM or TM process. > Hence, they end up i

Re: Storm topology running in flink.

2018-05-11 Thread Fabian Hueske
Hi, >From the dev perspective there hasn't been done much on that component since a long time [1]. Are there any users of this feature on the user list and can comment on how it works for them? Best, Fabian [1] https://github.com/apache/flink/commits/master/flink-contrib/flink-storm 2018-05-11

Re: Kafka Consumers Partition Discovery doesn't work

2018-05-11 Thread Juho Autio
Thanks Gordon, here's the ticket: https://issues.apache.org/jira/browse/FLINK-9334 If you'd like me to have a stab at it, feel free to assign the ticket to me. On Thu, Apr 12, 2018 at 10:28 PM, Tzu-Li (Gordon) Tai wrote: > Hi Juno, > > Thanks for reporting back, glad to know that it's not an is

Re: Iterative Stream won't loop

2018-05-11 Thread Henrique Colao Zanuz
Hi, thank you for your reply Actually, I'm doing something very similar to your code. The problem I'm having is that this structure is not generating any loop. For instance, If I print *labelsVerticesGroup*, I only see the initial set of tuples, the one from *updated**LabelsVerticesGroup* (at the

Re: Storm topology running in flink.

2018-05-11 Thread m@xi
Hello there, I have storm code and I need to run it. If possible I would like to run it with Flink. Is this possible and this feature stable now? Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Wiring batch and stream together

2018-05-11 Thread Fabian Hueske
Hi Peter, Building the state for a DataStream job in a DataSet (batch) job is currently not possible. You can however, implement a DataStream job that reads batch data and builds the state. When all data was processed, you'd need to save the state as a savepoint and can resume a streaming job fro