Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-10-24 Thread vinay patil
Hi Max, As discussed here , I have put my yaml file in the flink lib directory, but still I am not able to get this file from classpath. I am using Flink 1.1.1 and cfg4j to load the file from classpath. Running the job on YARN in EMR using the below command: ./bin/flink run Can you please let

Re: About stateful transformations

2016-10-24 Thread Juan Rodríguez Hortalá
Hi Gyula, Thanks a lot for your response, it was very clear. I understand that there is no problem of small files due to checkpointing not being incremental. I also understand that each worker will interpret a file:// URL as local to its own file system, which works ok if all workers have a remove

Elasticsearch Http Connector

2016-10-24 Thread Madhukar Thota
Friends Any one using new Elasticsearch RestClient( https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html) with Flink?

Re: Trigger evaluate

2016-10-24 Thread Fabian Hueske
No, this is correct. It describes how a Trigger is called, i.e., Flink calls for each element that is inserted into a window the Trigger.onElement() method. The default trigger of a TimeWindow does not fire on new elements. However, a custom trigger might fire when onElement() is called. 2016-10-2

Re: Trigger evaluate

2016-10-24 Thread Alberto Ramón
this is a mistake ? *"The trigger is called for each element that is inserted into the window and when a previously registered timer times out"* Thanks !! 2016

Re: Trigger evaluate

2016-10-24 Thread Fabian Hueske
The window is evaluated when a watermark arrives that is behind the window's end time. For instance, give the window in your example there are windows that end at 1:00:00, 1:00:30, 1:01:00, 1:01:30, ... (every 30 seconds). given the windows above, the window from 00:59:00 to 1:00:00 will be evalua

Re: Trigger evaluate

2016-10-24 Thread Alberto Ramón
I mean about *default Trigge*r, when you only put this: .timeWindow(Time.minutes(1), Time.seconds(30)) .sum(1) When data window is evaluated ? this is related?

Re: Event time, watermarks and windows

2016-10-24 Thread Aljoscha Krettek
Hi, the call to setAutoWatermarkInterval() is still necessary to activate the watermark-generation mechanism. However, calling setStreamTimeCharacteristic(EventTime) will also set a good default value for the auto watermark interval. Cheers, Aljoscha On Mon, 24 Oct 2016 at 17:02 Paul Joireman wr

Re: Trigger evaluate

2016-10-24 Thread Aljoscha Krettek
Hi, this depends on the Trigger you're using. For example, EventTimeTrigger will trigger when the watermark passes the end of a window. Cheers, Aljoscha On Mon, 24 Oct 2016 at 17:10 Alberto Ramón wrote: > Hello, 1 doubt: > > By default, when Trigger is launch to evaluate data of window ? > - N

Re: watermark trigger doesn't check whether element's timestamp is passed

2016-10-24 Thread Aljoscha Krettek
Hi, with some additional information we might be able to figure this out together. What specific combination of WindowAssigner/Trigger are you using for your example and what is the input stream (including watermarks)? Cheers, Aljoscha On Mon, 24 Oct 2016 at 06:30 Manu Zhang wrote: > Hi, > > Sa

Re: FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-24 Thread Aljoscha Krettek
@Robert, do you have any idea what might be going on here? On Fri, 21 Oct 2016 at 16:50 PedroMrChaves wrote: > Hello, > > Am getting the following warning upon executing a checkpoint > > /2016-10-21 16:31:54,229 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering >

Re: question for generic streams

2016-10-24 Thread Aljoscha Krettek
Hi, I think the basic problem here is that the generic types cannot really be changed when executing, i.e. when you read the types of the stream from a file. One thing I could suggest is to use Strings for everything and inside those Strings use JSON or something similar to encode different types.

Checkpointing large RocksDB state to S3 - tips?

2016-10-24 Thread Josh
Hi all, I'm running Flink on EMR/YARN with 2x m3.xlarge instances and am checkpointing a fairly large RocksDB state to S3. I've found that when the state size hits 10GB, the checkpoint takes around 6 minutes, according to the Flink dashboard. Originally my checkpoint interval was 5 minutes for th

missing data in window.reduce() while apply() is ok

2016-10-24 Thread Sendoh
Hi Flink users, I saw a strange behavior that data are missing in reduce() but apply() doesn't, and when using 1.0.3 we don't see this behavior, and we see this in 1.1.3. Losing data means we don't see any event in the keys assigned, not the count of events. The code is as follows. DataStream> s

Trigger evaluate

2016-10-24 Thread Alberto Ramón
Hello, 1 doubt: By default, when Trigger is launch to evaluate data of window ? - New element in window? - When a watermark arrive? - When the window is moved? Thanks , Alb

Event time, watermarks and windows

2016-10-24 Thread Paul Joireman
Hi all, The event timestamp and watermarks documentation (v. 1.1) https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/event_timestamps_watermarks.html states that The AssignerWithPeriodicWatermarks assigns timestamps and generates watermarks periodically (possibly

Watermarks and window firing

2016-10-24 Thread Paul Joireman
Hi all, The documentation for event timestamps and watermarks (https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/event_timestamps_watermarks.html) states that the The AssignerWithPeriodicWatermarks assigns timestamps and generates watermarks periodically (possibly depe

Re: multiple processing of streams

2016-10-24 Thread Fabian Hueske
Hi Robert, Unfortunately, FoldFunctions can only be used for eager aggregation in Tumbling and SlidingWindows. For SessionWindows, only ReduceFunction can be used. The problem is that two SessionWindows might be combined in case a late event is received that "connects" them. In that case, the wind

Re: multiple processing of streams

2016-10-24 Thread robert.lancaster
Hi Fabian, Ah, that is good stuff, thanks. I’ll have to evaluate, but I believe that everything I’m calculating can be done this way, though it looks like FoldFunction + WindowFunction is better suited for my use case. I may still need a custom trigger as well, though. Some sessions may never

Re: multiple processing of streams

2016-10-24 Thread Fabian Hueske
Hi Robert, thanks for the update. Regarding the SessionWindow. If you can implement your window logic as ReduceFunction + WindowFunction (see incremental window aggregation [1]), your window state will be independent of the number of elements in the window. If that is not possible, you might have

Re: multiple processing of streams

2016-10-24 Thread robert.lancaster
Hi Fabian, Thanks for the response. It turns out this was a red herring. I knew how many events I was sending through the process, and the output of each type of window aggregate was coming out to be about half of what I expected. It turns out, however, that I hadn’t realized that the job wa

Re: NoClassDefFoundError on cluster with httpclient 4.5.2

2016-10-24 Thread Till Rohrmann
Great to hear that you solved your problem :-) Cheers, Till On Fri, Oct 21, 2016 at 12:34 PM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Hi Till, > > The httpclient jar is included in the job jar. Looking at a similar issue > FLINK-4587

Re: About stateful transformations

2016-10-24 Thread Gyula Fóra
Hi Juan, Let me try to answer some of your questions :) We have been running Flink Streaming at King for quite some time now with multiple jobs having several hundred gigabytes of KV state stored in RocksDB. I would say RocksDB state backend is definitely the best choice at the moment for large d