Re: Maintaining watermarks per key, instead of per operator instance

2016-11-21 Thread Stephan Epping
Hey Aljoscha, the first solution did not work out as expected. As when late elements arrive the first window is triggered again and would emit a new (accumulated) event, that would be counted twice (in time accumulation and late accumulation) in the second window.I could implement my own (disca

Cassandra Connector

2016-11-21 Thread Stephan Epping
Hello, I wondered why the cassandra connector has such an unusual interface: CassandraSink csink = CassandraSink.addSink(readings) while all other sinks seem to look like RMQSink sink = new RMQSink(cfg, "readings_persist_out", new JSONReadingSchema()); readings.addSink(sink); best, Stephan

Re: Early events

2016-11-21 Thread Juan Rodríguez Hortalá
That makes sense, thanks for your answer. Greetings, Juan On Mon, Nov 21, 2016 at 9:11 AM, Aljoscha Krettek wrote: > Hi, > yes, Flink is expected to buffer those until the watermark catches up with > their timestamp. > > Cheers, > Aljoscha > > On Sun, 20 Nov 2016 at 06:18 Juan Rodríguez Hortal

Re: PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-11-21 Thread static-max
Update: I deleted the /flink/recovery folder on HDFS and even then I get the same Exception after the next checkpoint. 2016-11-21 21:51 GMT+01:00 static-max : > Hi Stephan, > > it's not a problem, but makes finding other errors on my NameNode > complicated as I have this error message every minut

Re: PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-11-21 Thread static-max
Hi Stephan, it's not a problem, but makes finding other errors on my NameNode complicated as I have this error message every minute. Can't we just delete the directory recursively? Regards, Max 2016-10-11 17:59 GMT+02:00 Stephan Ewen : > Hi! > > I think to some extend this is expected. There is

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-21 Thread William Saar
Thanks! Yes, the SlidingEventTimeWindow works, but is there any way to pre-aggregate things with tumbling windows that emit events more often than their window size? Perhaps I can do this before I merge the streams? (But if ContinuousEventTimeTrigger is the only way to do that, it's bad if it does

Re: Flink streaming with 1+ TB of managed state

2016-11-21 Thread Steven Ruppert
Some responses inline below: On Sat, Nov 19, 2016 at 4:07 PM, Gyula Fóra mailto:gyula.f...@gmail.com>> wrote: Hi Steven, As Robert said some of our jobs have state sizes around a TB or more. We use the RocksDB state backend with some configs tuned to perform well on SSDs (you can get some tips

Re: flink-dist shading

2016-11-21 Thread Foster, Craig
Thanks for explaining, Robert and Gordon. For however it helps, I’ll comment on the original Maven issue which seems to be affecting many people. I am trying to create RPMs using BigTop and am still not quite sure how to handle this case. I guess what I could do is build the parent first in our

Re: Understanding connected streams use without timestamps

2016-11-21 Thread Aljoscha Krettek
Nope, not right now but this is pretty much what we're trying to solve with side inputs: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit On Mon, 21 Nov 2016 at 16:11 Theodore Vasiloudis < theodoros.vasilou...@gmail.com> wrote: > Thanks for the clarification Gy

Re: Flink Material & Papers

2016-11-21 Thread Paris Carbone
+1 for the references Imho the most relevant scientific publication related to the current state-of-the-art of Flink is the first one cited by Dominik (IEEE Bulletin). So it makes sense to cite that one. However, Hanna, if you are also interested about prior work at TU Berlin that bootstrapped

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-21 Thread Aljoscha Krettek
Might it be that your initial source never stops? A loop will only terminate if both the original source stops and the loop timeout is reached. On Mon, 21 Nov 2016 at 07:58 Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com> wrote: > Hi, > > I wrote a proof of concept for a Java version of

Re: Early events

2016-11-21 Thread Aljoscha Krettek
Hi, yes, Flink is expected to buffer those until the watermark catches up with their timestamp. Cheers, Aljoscha On Sun, 20 Nov 2016 at 06:18 Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com> wrote: > Hi, > > There was a bug in my code, I was assigning the timestamps wrong and that > is

Re: Streaming program gets stucks when accesing to AWS Kinesis

2016-11-21 Thread Aljoscha Krettek
Hi, if you're outputting data using .print() then those will go into the stdout of the task managers. You should see this in the .out log of your Yarn machines. For running on Yarn I suggest to use a proper source that emits to some external system. Cheers, Aljoscha On Sat, 19 Nov 2016 at 10:14 I

Re: Flink Material & Papers

2016-11-21 Thread Dominik Safaric
Hi Hanna, I would certainly recommend if you haven’t so far to check the official docs of Flink at flink.apache.org. The documentation is comprehensive and understandable. From that point, I would recommend the following blog posts and academic papers: Apache Flink: Stream and Batch Proces

Re: Enforce an operation to run in an exact host in flink

2016-11-21 Thread Aljoscha Krettek
If I'm not mistaken Flink already has metrics that report latencies. These are not 100% correct because they machine one which an element originates is not necessarily the machine on which we measure the latency. I'm afraid that that's the best we can do right now, however. Other than running your

Re: Maintaining watermarks per key, instead of per operator instance

2016-11-21 Thread Aljoscha Krettek
Hi, why did you settle for the last solution? Cheers, Aljoscha On Thu, 17 Nov 2016 at 15:57 kaelumania wrote: > Hi Fabian, > > your proposed solution for: > > >1. Multiple window aggregations > > You can construct a data flow of cascading window operators and fork off > (to emit or further

Re: Error handling

2016-11-21 Thread Aljoscha Krettek
Hmm, I still don't know what could be causing this. Which version of Flink are you using? Also, when you say "BUT when the error is thrown inside invoke implementation from RichSinkFunction, the recovery is not done and also no further items are processed even if the kafka consumer is working fine.

Re: Understanding connected streams use without timestamps

2016-11-21 Thread Theodore Vasiloudis
Thanks for the clarification Gyula! In that case, is it possible currently to make one of the two connected streams stall until the other stream has produced at least one output before it starts producing as well? On Mon, Nov 21, 2016 at 3:16 PM, Gyula Fóra wrote: > Hi :) > > The execution of

Re: Flink streaming with 1+ TB of managed state

2016-11-21 Thread Stephan Ewen
Some background in the Incremental Checkpointing: It is not in the system, but we have a quite advanced design and some committers/contributors are currently starting the effort. My personal estimate is that it would be available in some months (Q1 next year). Best, Stephan On Sat, Nov 19, 2016

Re: Understanding connected streams use without timestamps

2016-11-21 Thread Gyula Fóra
Hi :) The execution of the Connected functions (map1/map2 in this case) are not affected by the timestamps. In other words it is pretty much arbitrary which input arrives at the CoMapFunction first. So I think you did everything correctly. Gyula Theodore Vasiloudis ezt írta (időpont: 2016. nov

Understanding connected streams use without timestamps

2016-11-21 Thread Theodore Vasiloudis
Hello all, I was playing around with the the IncrementalLearningSkeleton example and I had a couple of questions regarding the behavior of connected streams. In the example the elements are assigned timestamps, and there is a stream, model, that produces Double[] elements by ingesting and process

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-21 Thread Gyula Fóra
Hi William, I am wondering whether the ContinuousEventTimeTrigger is the best choice here (it never cleans up the state as far as I know). Have you tried the simple SlidingEventTimeWindows as your window function? Cheers, Gyula William Saar ezt írta (időpont: 2016. nov. 19., Szo, 18:28): > Hi