Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-03 Thread PedroMrChaves
Hi, Thank you for the response. Can you give me an example? I'm new to Flink and I still don't understand all the constructs. I also read this article https://techblog.king.com/rbea-scalable-real-time-analytics-king/. They use a similar approach, but I still don't understand how to assign windows

Re: Release Process

2016-11-03 Thread Fabian Hueske
Hi Dominik, the discussion about the 1.2 release was started on the dev mailing list [1] about 2 weeks ago. So far the proposed timeline is to have a release in mid-December. Best, Fabian [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Schedule-and-Scope-for-Flink-1-2-tp

Release Process

2016-11-03 Thread Dominik Bruhn
Hey everyone, about three months ago, I made a PR [1] to the Flink GitHub project containing a small change for the RabbitMQ source. This PR was merged and the code is in master. However, this code never made it into a release. In JIRA [2], it is meant to be released with 1.2. How is the polic

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-03 Thread Anchit Jatana
Hi Maximilian, Thanks for your response. I'm running the application on a YARN cluster in 'yarn-cluster' mode, i.e. using the 'flink run -m yarn-cluster ..' command. Is there anything more that I need to configure apart from setting up the 'yarn.application-attempts: 10' property inside conf/flink-c

Flink Time Window State

2016-11-03 Thread Daniel Santos
Hello, I have a question that has been bugging me. Let's say we have a Kafka source. Checkpointing is enabled, with a period of 5 seconds. We have an FSBackend (Hadoop). Now imagine we have a tumbling window of 10 minutes. For simplicity we are going to say that we are counting all elements

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Stephan Ewen
Is it possible that you have stalls in your topology? Reasons could be: - The data sink blocks or becomes slow for some periods (where are you sending the data to?) - If you are using large state and a state backend that only supports synchronous checkpointing, there may be a delay introduce

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Scott Kidder
Hi Steffan & Josh, For what it's worth, I've been using the Kinesis connector with very good results on Flink 1.1.2 and 1.1.3. I updated the Flink Kinesis connector KCL and AWS SDK dependencies to the following versions: aws.sdk.version: 1.11.34 aws.kinesis-kcl.version: 1.7.0 My customizations a

Re: Freeing resources in SourceFunction

2016-11-03 Thread Maximilian Michels
For your use case you should use the close() method which is always called upon shutdown of your source. The cancel() is only called when you explicitly cancel your job. -Max On Thu, Nov 3, 2016 at 2:45 PM, Yury Ruchin wrote: > Hello, > > I'm writing a custom source function for my streaming jo
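The distinction Max describes can be sketched without any Flink dependency. What follows is a minimal model, not Flink's actual runtime: `FakePool`, `PoolingSource`, and `SourceRunner` are hypothetical names, and the runner only imitates a framework that invokes close() on every shutdown path while cancel() is invoked solely on explicit cancellation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the connection pool managed by the source. */
class FakePool {
    boolean closed = false;

    void close() {
        closed = true;
    }
}

/** Hypothetical source modelled on a run()/cancel()/close() lifecycle. */
class PoolingSource {
    private volatile boolean running = true;
    final FakePool pool = new FakePool();

    /** Emits elements until cancelled or until maxElements is reached. */
    void run(List<String> out, int maxElements) {
        for (int i = 0; running && i < maxElements; i++) {
            out.add("element-" + i);
        }
    }

    /** Only stops the emit loop; invoked on explicit cancellation. */
    void cancel() {
        running = false;
    }

    /** Releases resources; expected on every shutdown path. */
    void close() {
        pool.close();
    }
}

/** Imitates a runtime that always calls close() once run() returns. */
class SourceRunner {
    static List<String> execute(PoolingSource source, int maxElements, boolean cancelFirst) {
        List<String> out = new ArrayList<>();
        if (cancelFirst) {
            source.cancel(); // assumption: cancellation arrives before run() starts
        }
        try {
            source.run(out, maxElements);
        } finally {
            source.close(); // cleanup happens whether or not cancel() was called
        }
        return out;
    }
}
```

In this model the pool is released in close() rather than cancel(), so it is freed both when the source finishes on its own and when it is cancelled.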

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-03 Thread Maximilian Michels
Hi Anchit, It is possible that the application crashes for many different reasons, e.g. error in user code, hardware/network failures. Have you configured high availability for Yarn as described in the documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/jobmanager_hig

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-03 Thread Aljoscha Krettek
Hi Pedro, you can have dynamic windows by assigning the windows to elements in your Processor (so you would need to extend that type to have a field for the window). Then, you can write a custom WindowAssigner that will simply get the window from an event and assign that as the internal window. Pl

Re: emit partial state in window (streaming)

2016-11-03 Thread Luis Mariano Guerra
On Thu, Nov 3, 2016 at 7:05 PM, Kostas Kloudas wrote: > Hi Luis, > > You cannot have event-time early firings on both chained window operators. > The reason is that each early result from the first window operator will > have a timestamp equal to window.maxTimestamp-1. > So in the second windowin

Re: Testing DataStreams

2016-11-03 Thread Maximilian Michels
Hi Juan, StreamingMultipleProgramsTestBase is in the testing scope. Thus, it is not bundled in the normal jars. You would have to add the flink-test-utils_2.10 module. It is true that there is no guide. There is https://github.com/ottogroup/flink-spector for testing streaming pipelines. For unit

Re: emit partial state in window (streaming)

2016-11-03 Thread Kostas Kloudas
Hi Luis, You cannot have event-time early firings on both chained window operators. The reason is that each early result from the first window operator will have a timestamp equal to window.maxTimestamp-1. So in the second windowing operator, they will be buffered until the watermark signaling

Re: BoundedOutOfOrdernessTimestampExtractor and timestamps in the future

2016-11-03 Thread Maximilian Michels
The BoundedOutOfOrdernessTimestampExtractor is not really useful if you have outliers, because the watermark is always set to the largest timestamp seen so far minus the out-of-orderness. If your data is of such a nature, you will have to implement a custom watermark extractor to deal with these
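The effect of a single future-dated outlier can be illustrated with a small model that does not depend on Flink. `BoundedWatermarkModel` mirrors only the rule stated above (largest timestamp minus the out-of-orderness). `OutlierTolerantWatermarkModel` is a hypothetical variant, one possible strategy among many and not something proposed in this thread: it ignores a timestamp that jumps more than an assumed `maxJump` ahead of the largest timestamp accepted so far.

```java
/** Models "watermark = largest timestamp seen - allowed out-of-orderness". */
class BoundedWatermarkModel {
    private final long maxOutOfOrderness;
    private long maxTimestamp = Long.MIN_VALUE;

    BoundedWatermarkModel(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Records the element's timestamp and returns the resulting watermark. */
    long onElement(long timestamp) {
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        return maxTimestamp - maxOutOfOrderness;
    }
}

/** Hypothetical variant: timestamps too far ahead do not advance the watermark. */
class OutlierTolerantWatermarkModel {
    private final long maxOutOfOrderness;
    private final long maxJump; // assumed threshold, chosen by the user
    private long maxTimestamp = Long.MIN_VALUE;

    OutlierTolerantWatermarkModel(long maxOutOfOrderness, long maxJump) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        this.maxJump = maxJump;
    }

    long onElement(long timestamp) {
        // The first element is always accepted; later ones only if the jump is small.
        if (maxTimestamp == Long.MIN_VALUE || timestamp - maxTimestamp <= maxJump) {
            maxTimestamp = Math.max(maxTimestamp, timestamp);
        }
        return maxTimestamp - maxOutOfOrderness;
    }
}
```

With an out-of-orderness of 1000 and the timestamps 10000, 11000, 999000, 12000, the first model moves the watermark to 998000 after the outlier, so the element at 12000 is already behind it. The variant with a `maxJump` of 60000 keeps the watermark at 10000 for the outlier and advances it to 11000 afterwards. A limitation of this sketch is that an outlier arriving as the very first element is still accepted.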

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-03 Thread PedroMrChaves
Hello, Your tip was very helpful and I took a similar approach. I have something like this: class Processor extends RichCoFlatMapFunction { public void flatMap1(Event event, Collector out) { process(event,out); // run the JavaScript (rules) against the incoming events } publ

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Hi Robert, Thanks for sharing this insight. However, the Flink Kafka 010 connector is only compatible with 1.2-SNAPSHOT. Despite that, I’ve managed to get the Flink Kafka 09 connector to use Kafka version 0.10.0.1. Only minor changes to the test code had to be made, mostly in regard to Zookeeper

Re: NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2016-11-03 Thread Greg Hogan
Hi Pedro, Which problem are you having, the NotSerializableException or not seeing open() called on a RichFunction? Greg On Wed, Nov 2, 2016 at 10:47 AM, PedroMrChaves wrote: > Hello, > > I'm having the exact same problem. > I'm using a filter function on a datastream. > My flink version is 1.

Re: Best Practices/Advice - Execution of jobs

2016-11-03 Thread PedroMrChaves
Thank you. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Best-Practices-Advice-Execution-of-jobs-tp9822p9873.html Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Using Flink with Accumulo

2016-11-03 Thread Josh Elser
Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorry in advance if I suggest something outright wrong. Hopefully someone from the Flink side can help give context where necessary :) Oliver Swoboda wrote: Hello, I'm using Flink

Re: Best Practices/Advice - Execution of jobs

2016-11-03 Thread Aljoscha Krettek
Hi Pedro, I think it would be better to have two jobs and keep all the rules in one place. If it's not too many sources you might even consider having everything in one job so you don't have to duplicate the rules. There's a tradeoff, though, if it becomes too much stuff then splitting up will be

Re: Why are externalized checkpoints deleted on Job Manager exit?

2016-11-03 Thread Ufuk Celebi
A fix is pending here: https://github.com/apache/flink/pull/2750 The behaviour on graceful shut down/suspension respects the cancellation behaviour with this change. On Thu, Nov 3, 2016 at 3:23 PM, Ufuk Celebi wrote: > I don't need the logs. Externalized checkpoints have been configured > to be

Re: emit partial state in window (streaming)

2016-11-03 Thread Luis Mariano Guerra
On Thu, Nov 3, 2016 at 2:06 PM, Kostas Kloudas wrote: > Hi Luis, > > Can you try commenting out the whole final windowing and see if this works? > This includes the following lines: > > .windowAll(TumblingEventTimeWindows.of(Time.of(windowTime, timeUnit))) > .trigger(new PartialWindowTrigger<>

Re: Why are externalized checkpoints deleted on Job Manager exit?

2016-11-03 Thread Ufuk Celebi
I don't need the logs. Externalized checkpoints have been configured to be deleted when the job is suspended, too. When the YARN session is terminated, all jobs are suspended. The behaviour seems like a bug. As a workaround, you have to cancel the job before you shut down the YARN session. Let me

Re: Best way of doing some global initialization

2016-11-03 Thread Satish Chandra Gupta
Hi Aljoscha, Thanks for the quick reply. I didn't quite understand your suggestion. Say I have three Kafka Stream sources that my Flink program consumes. How can I modify those three sources to be Kafka source as well as consumer of this single element? Thanks, +satish On Thu, Nov 3, 2016 at 6:37 PM

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
Hi Gordon, Thanks for the fast reply! You're right about the expired iterator exception occurring just before each spike. I can't see any signs of long GC on the task managers... CPU has been <15% the whole time when the spikes were taking place and I can't see anything unusual in the task manager

Re: Why are externalized checkpoints deleted on Job Manager exit?

2016-11-03 Thread Ufuk Celebi
They should actually not be deleted. Could you please share the logs with me? In the meantime, I will try to reproduce this. On Thu, Nov 3, 2016 at 2:04 PM, Aljoscha Krettek wrote: > +Ufuk > > Ufuk recently worked on that, if I'm not mistaken. Do you have an idea what > could be going on here?

Re: Reg. custom sink for Flink streaming

2016-11-03 Thread Fabian Hueske
Hi, a MapFunction should be the way to go for this use case. What exactly is not working? Do you get an exception? Is the map method not called? Best, Fabian 2016-11-03 0:00 GMT+01:00 Sandeep Vakacharla : > Hi there, > > > > I have the following use case- > > > > I have data coming from Kafka w

Freeing resources in SourceFunction

2016-11-03 Thread Yury Ruchin
Hello, I'm writing a custom source function for my streaming job. The source function manages some connection pool. I want to close that pool once my job is "finished" (since the stream is unbounded, the only way I see is to cancel the streaming job). Since I inherit RichSourceFunction, there are

Re: emit partial state in window (streaming)

2016-11-03 Thread Manu Zhang
Hi Luis, You may try ContinuousEventTimeTrigger, which fires continuously at a given time interval, instead of writing your own. Note that w
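To make "fires continuously at a given time interval" concrete, here is a simplified, Flink-free model of interval-based firing in event time. It assumes fire times are aligned to multiples of the interval and that timestamps are non-negative; `IntervalFiringModel` and its methods are hypothetical names, and the real trigger may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a trigger that fires on aligned event-time intervals. */
class IntervalFiringModel {

    /** Next interval boundary strictly after the given (non-negative) timestamp. */
    static long nextFireTime(long timestamp, long interval) {
        long start = timestamp - (timestamp % interval);
        return start + interval;
    }

    /**
     * All fire times from the first element's timestamp up to and including
     * the window's maximum timestamp.
     */
    static List<Long> fireTimes(long firstTimestamp, long interval, long windowMaxTimestamp) {
        List<Long> result = new ArrayList<>();
        long next = nextFireTime(firstTimestamp, interval);
        while (next <= windowMaxTimestamp) {
            result.add(next);
            next += interval;
        }
        return result;
    }
}
```

For a first element at 12500, an interval of 5000, and a window whose maximum timestamp is 29999, this model yields partial firings at 15000, 20000, and 25000. In event time, each of these would only happen once the watermark passes the corresponding fire time.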

Re: Best way of doing some global initialization

2016-11-03 Thread Aljoscha Krettek
Hi, I'm afraid this is not possible right now if you don't want to go with completely custom sources/operators. If you want to go the custom source route you would have only one true source in your job that does the global initialisation and then emits one element. Your other sources would be oper

Re: emit partial state in window (streaming)

2016-11-03 Thread Kostas Kloudas
Hi Luis, Can you try commenting out the whole final windowing and see if this works? This includes the following lines: .windowAll(TumblingEventTimeWindows.of(Time.of(windowTime, timeUnit))) .trigger(new PartialWindowTrigger<>(partialWindowTime, timeUnit, windowTime, timeUnit)) .apply(crea

Re: Why are externalized checkpoints deleted on Job Manager exit?

2016-11-03 Thread Aljoscha Krettek
+Ufuk Ufuk recently worked on that, if I'm not mistaken. Do you have an idea what could be going on here? On Wed, 2 Nov 2016 at 21:52 Clifford Resnick wrote: > Testing externalized checkpoints in a YARN-based cluster, configured with: > > > env.getCheckpointConfig.enableExternalizedCheckpoints

Re: Accessing StateBackend snapshots outside of Flink

2016-11-03 Thread Aljoscha Krettek
Hi, there are two open issues about this: * https://issues.apache.org/jira/browse/FLINK-3946 * https://issues.apache.org/jira/browse/FLINK-3089 No work has been done on this yet. You can, however, simulate TTL for state by using a TimelyFlatMapFunction and manually setting a timer for clearing out st
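The timer-based cleanup Aljoscha mentions can be modelled in plain Java. This sketch deliberately does not use TimelyFlatMapFunction or any Flink API: `TtlStateModel`, `onElement`, and `onTimer` are hypothetical names, and the caller plays the role of the timer service by invoking `onTimer` with the timestamp a timer was registered for.

```java
import java.util.HashMap;
import java.util.Map;

/** Model of per-key state that is cleared by a timer after a fixed TTL. */
class TtlStateModel {
    private final long ttlMillis;
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, Long> expiryTimers = new HashMap<>();

    TtlStateModel(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Updates the state for the key and re-arms its expiry timer. */
    long onElement(String key, long now) {
        counts.merge(key, 1L, Long::sum);
        long expiry = now + ttlMillis;
        expiryTimers.put(key, expiry);
        return expiry; // the timestamp a timer would be registered for
    }

    /** Clears the state only if this timer is the most recent one for the key. */
    void onTimer(String key, long timerTimestamp) {
        Long expected = expiryTimers.get(key);
        if (expected != null && expected == timerTimestamp) {
            counts.remove(key);
            expiryTimers.remove(key);
        }
        // Otherwise the timer is stale: the key was updated after it was set.
    }

    /** Returns the current count for the key, or null if it has expired. */
    Long get(String key) {
        return counts.get(key);
    }
}
```

Storing the latest expiry per key lets stale timers be ignored, so a key that keeps receiving elements is not cleared prematurely.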

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Robert Metzger
Hi, I just tried the Kafka 0.10 connector again, and I could not reproduce the issue you are reporting. This is my test job: // parse input arguments final ParameterTool parameterTool = ParameterTool.fromArgs(args); if(parameterTool.getNumberOfParameters() < 4) { System.out.println("Missing p

Using Flink with Accumulo

2016-11-03 Thread Oliver Swoboda
Hello, I'm using Flink with Accumulo and wanted to read data from the database by using the createHadoopInput function. Therefore I configure an AccumuloInputFormat. You can find the source code here: https://github.com/OSwoboda/masterthesis/blob/master/aggregation.flink/src/main/java/de/oswobod

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Hi Robert, > I think the easiest way to get Kafka 0.10 running with Flink is to use the > Kafka 0.10 connector in the current Flink master. Well, I’ve already built the Kafka 0.10 connector from the master, but unfortunately I keep getting the error of the type checker that the type of the

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Tzu-Li (Gordon) Tai
Hi Josh, That warning message was added as part of FLINK-4514. It appears whenever a shard iterator is used more than 5 minutes after it was returned from Kinesis. The only time spent between a shard iterator being returned and it being used to fetch the next batch of records is on deserializi

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Robert Metzger
Hi Dominik, Some of Kafka's APIs changed between Kafka 0.9 and 0.10, so you cannot compile the Kafka 0.9 connector against Kafka 0.10 dependencies. I think the easiest way to get Kafka 0.10 running with Flink is to use the Kafka 0.10 connector in the current Flink master. You can probably copy the connect

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
Hey Gordon, I've been using Flink 1.2-SNAPSHOT for the past week (with FLINK-4514) with no problems, but yesterday the Kinesis consumer started behaving strangely... My Kinesis data stream is fairly constant at around 1.5MB/sec, however the Flink Kinesis consumer started to stop consuming for peri

Re: emit partial state in window (streaming)

2016-11-03 Thread Luis Mariano Guerra
On Thu, Oct 27, 2016 at 4:37 PM, Fabian Hueske wrote: > Hi Luis, > > these blogposts should help you with the periodic partial result trigger > [1] [2]. > Hi, thanks for the links, I read them and tried to implement what I need, everything seems to work as expected except for the fact that the p

Re: Looping over a DataSet and accesing another DataSet

2016-11-03 Thread otherwise777
I just found out that I am able to use arrays in tuple values, nvm about that question -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778p9850.html Sent from the Apache Flink User Mailin

Best way of doing some global initialization

2016-11-03 Thread Satish Chandra Gupta
Hi, I need to set/initialize some config of a framework/util that is used in my Flink stream processing app. Basically, a piece of code that needs to be executed exactly once before anything else. Clearly doing it in the main flink processor function will not suffice, as apart from the client,

Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Dear all, Although Flink 1.2 will roll out a Flink Kafka 0.10.x connector, I want to use the Flink Kafka 0.9 connector in conjunction with the Kafka 0.10.x versions. The reason is that we are currently evaluating Flink as part of an empirical research study, hence a stable release is requ