Re: Apache Flink 0.10.0 released

2015-11-16 Thread Welly Tambunan
Great job guys, so this is the first production-ready release of the Streaming API! Cool! Cheers

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Welly Tambunan
Hi Stephan, so that will be in Flink 1.0, right? Cheers

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Duc Kien Truong
Hi, thanks for the suggestion. I'm trying to use the delta iteration so that I can get the empty-workset convergence criterion for free. But since doing an outer join between the work set and the solution set is not possible using coGroup, I will try to adapt my algorithm to use the bulk iteration...
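
As a sketch of that adaptation (the names, types, and update rule below are made up for illustration): a bulk iteration can be closed with a termination-criterion DataSet, and the iteration stops as soon as that DataSet becomes empty, which recovers the empty-workset convergence behavior of delta iterations.

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.IterativeDataSet;

    public class BulkIterationSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Long> values = env.fromElements(1L, 5L, 9L);

            // Start a bulk iteration with an upper bound on the number of supersteps.
            IterativeDataSet<Long> iteration = values.iterate(100);

            // Step function: a hypothetical update; here we just increment.
            DataSet<Long> updated = iteration.map(new MapFunction<Long, Long>() {
                @Override
                public Long map(Long v) { return v + 1; }
            });

            // Termination criterion: keep iterating while any element is below 10.
            // The iteration stops as soon as this DataSet is empty, mimicking the
            // "empty work set" convergence of delta iterations.
            DataSet<Long> notConverged = updated.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long v) { return v < 10; }
            });

            DataSet<Long> result = iteration.closeWith(updated, notConverged);
            result.print();
        }
    }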

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Stephan Ewen
It is actually very important that the CoGroup in delta iterations works like that. If the CoGroup touched every element in the solution set, the "decreasing work" effect would not happen. The delta iterations are designed for cases where specific updates to the solution are made, driven by the work set...

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Fabian Hueske
Hi, this is an artifact of how the solution set is internally implemented. Usually, a CoGroup is executed using a sort-merge strategy, i.e., both inputs are sorted, merged, and handed to the CoGroup function in a streaming fashion. Both inputs are treated equally, and if one of the two inputs does not...
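
To make the discussed behavior concrete, a minimal delta-iteration sketch (the key field, types, and update rule are illustrative, not from the thread): the solution set side of the CoGroup is probed per workset key, so solution entries without a matching workset element never reach the function.

    import org.apache.flink.api.common.functions.CoGroupFunction;
    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.DeltaIteration;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class DeltaIterationCoGroupSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Tuple2<Long, Double>> initialSolution = env.fromElements(
                    new Tuple2<Long, Double>(1L, 1.0), new Tuple2<Long, Double>(2L, 1.0));
            DataSet<Tuple2<Long, Double>> initialWorkset = initialSolution;

            // Delta iteration keyed on field 0; it terminates when the workset is empty.
            DeltaIteration<Tuple2<Long, Double>, Tuple2<Long, Double>> it =
                    initialSolution.iterateDelta(initialWorkset, 10, 0);

            // CoGroup the workset with the solution set. The solution set is probed
            // per workset key, so untouched solution entries never reach the function.
            DataSet<Tuple2<Long, Double>> delta = it.getWorkset()
                    .coGroup(it.getSolutionSet())
                    .where(0).equalTo(0)
                    .with(new CoGroupFunction<Tuple2<Long, Double>, Tuple2<Long, Double>,
                            Tuple2<Long, Double>>() {
                        @Override
                        public void coGroup(Iterable<Tuple2<Long, Double>> workset,
                                            Iterable<Tuple2<Long, Double>> solution,
                                            Collector<Tuple2<Long, Double>> out) {
                            for (Tuple2<Long, Double> w : workset) {
                                // Hypothetical update rule: halve the value.
                                out.collect(new Tuple2<Long, Double>(w.f0, w.f1 / 2));
                            }
                        }
                    });

            // Next workset: only elements that still changed significantly.
            DataSet<Tuple2<Long, Double>> nextWorkset = delta.filter(
                    new FilterFunction<Tuple2<Long, Double>>() {
                        @Override
                        public boolean filter(Tuple2<Long, Double> t) { return t.f1 > 0.01; }
                    });

            DataSet<Tuple2<Long, Double>> result = it.closeWith(delta, nextWorkset);
            result.print();
        }
    }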

Re: Error handling

2015-11-16 Thread Nick Dimiduk
> The errors outside your UDF (such as network problems) will be handled by Flink and cause the job to go into recovery. They should be transparently handled.

Is that so? I've been able to feed bad data onto my Kafka topic and cause the stream job to abort. You're saying this should not be...

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
All those should apply for streaming too...

Re: Creating a representative streaming workload

2015-11-16 Thread Vasiliki Kalavri
Hi, thanks Nick and Ovidiu for the links! Just to clarify, we're not looking into creating a generic streaming benchmark. We have quite limited time and resources for this project. What we want is to decide on a set of 3-4 _common_ streaming applications. To give you an idea, for the batch workload...

Re: Error handling

2015-11-16 Thread Stephan Ewen
Makes sense. The class of operations that work "per-tuple" before the data is forwarded to the network stack could be extended to have error traps. @Nick: Is that what you had in mind?

Re: Error handling

2015-11-16 Thread Aljoscha Krettek
Hi, I don't think that alleviates the problem. Sometimes you might want the system to continue even if stuff outside the UDF fails, for example if a serializer does not work because of a null value somewhere. You would, however, like to get a message about this somewhere, I assume. Cheers, Aljoscha

Re: Creating a representative streaming workload

2015-11-16 Thread Ovidiu-Cristian MARCU
Regarding Flink vs. Spark/Storm, you can check here: http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark

Re: Error handling

2015-11-16 Thread Stephan Ewen
Hi Nick! The errors outside your UDF (such as network problems) will be handled by Flink and cause the job to go into recovery. They should be transparently handled. Just make sure you activate checkpointing for your job! Stephan
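
For reference, checkpointing is activated with one call on the streaming environment; the interval below is an arbitrary example:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Draw a checkpoint every 5 seconds; on failure, the job restarts from the
    // latest completed checkpoint instead of losing its state.
    env.enableCheckpointing(5000);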

Re: MaxPermSize on yarn

2015-11-16 Thread Robert Metzger
Hi Gwen, there is a configuration value called "env.java.opts" that allows you to pass custom JVM args to the JM and TM JVMs. I hope that helps.
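
A possible flink-conf.yaml entry based on this (the MaxPermSize value is only an example):

    # flink-conf.yaml
    # Passed to the JobManager and TaskManager JVMs on startup.
    env.java.opts: "-XX:MaxPermSize=256m"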

Re: Error handling

2015-11-16 Thread Nick Dimiduk
> I have been thinking about this, maybe we can add a special output stream (for example Kafka, but it can be generic) that would get errors/exceptions that were thrown during processing. The actual processing would not stop and the messages in this special stream would contain some information...
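
Flink 0.10 has no such built-in error stream, but the idea can be sketched by hand (all names below are hypothetical): trap failures inside the UDF, tag each record, and split the tagged stream into a result stream and an error stream.

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class ErrorTrapSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> raw = env.fromElements("1", "2", "oops", "3");

            // Tag every record instead of letting a bad one fail the job:
            // f0 == true means the record parsed, f0 == false means it was trapped.
            DataStream<Tuple2<Boolean, String>> tagged = raw.flatMap(
                    new FlatMapFunction<String, Tuple2<Boolean, String>>() {
                        @Override
                        public void flatMap(String value, Collector<Tuple2<Boolean, String>> out) {
                            try {
                                Integer.parseInt(value); // hypothetical per-tuple processing
                                out.collect(new Tuple2<Boolean, String>(true, value));
                            } catch (Exception e) {
                                // Trap the error; a real version would also carry
                                // the exception details along.
                                out.collect(new Tuple2<Boolean, String>(false, value));
                            }
                        }
                    });

            DataStream<Tuple2<Boolean, String>> good = tagged.filter(
                    new FilterFunction<Tuple2<Boolean, String>>() {
                        @Override
                        public boolean filter(Tuple2<Boolean, String> t) { return t.f0; }
                    });
            DataStream<Tuple2<Boolean, String>> errors = tagged.filter(
                    new FilterFunction<Tuple2<Boolean, String>>() {
                        @Override
                        public boolean filter(Tuple2<Boolean, String> t) { return !t.f0; }
                    });

            good.print();
            errors.print(); // in practice: a Kafka "errors" topic, as proposed above
            env.execute("error trap sketch");
        }
    }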

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Konstantin Knauf
Hi Aljoscha, I changed the TimestampExtractor to save the lastSeenTimestamp and only emit it with getCurrentWatermark [1], as you suggested. So basically I do the opposite of before (watermarks only per event vs. watermarks only per auto-watermark interval). And now it works :). The question remains why it...
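
A sketch of the extractor described above, assuming the Flink 0.10 TimestampExtractor interface; the Event type and its getTime() accessor are invented for illustration:

    import org.apache.flink.streaming.api.functions.TimestampExtractor;

    // Hypothetical event type carrying a millisecond timestamp.
    class Event {
        public long time;
        public long getTime() { return time; }
    }

    public class LastSeenTimestampExtractor implements TimestampExtractor<Event> {

        private long lastSeenTimestamp = Long.MIN_VALUE;

        @Override
        public long extractTimestamp(Event element, long currentTimestamp) {
            lastSeenTimestamp = element.getTime();
            return lastSeenTimestamp;
        }

        @Override
        public long extractWatermark(Event element, long currentTimestamp) {
            // No watermark per element...
            return Long.MIN_VALUE;
        }

        @Override
        public long getCurrentWatermark() {
            // ...instead the watermark advances on the auto-watermark interval,
            // based on the last timestamp seen.
            return lastSeenTimestamp;
        }
    }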

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
Why not use an existing benchmarking tool -- is there one? Perhaps you'd like to build something like YCSB [0] but for streaming workloads? Apache Storm is the OSS framework that's been around the longest. Search for "apache storm benchmark" and you'll get some promising hits. Looks like IBMStream...

MaxPermSize on yarn

2015-11-16 Thread Gwenhael Pasquiers
Hi, we're having some OOM permgen exceptions when running on YARN. We're not yet sure if it is either a consequence or a cause of our crashes, but we've been trying to increase that value... and we did not find how to do it. I've seen that yarn-daemon.sh sets a 256m value. It looks to me that...

Re: Multilang Support on Flink

2015-11-16 Thread Maximilian Michels
Hi Welly, it's in the main Flink repository. Actually, this has just been integrated with the Python API, see https://github.com/apache/flink/blob/master/flink-libraries/flink-python/src/main/java/org/apache/flink/python/api/PythonPlanBinder.java Before, it was independent: https://github.com/apach...

Creating a representative streaming workload

2015-11-16 Thread Vasiliki Kalavri
Hello squirrels, with some colleagues and students here at KTH, we have started 2 projects to evaluate (1) performance and (2) behavior in the presence of memory interference in cloud environments, for Flink and other systems. We want to provide our students with a workload of representative applications...

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Stephan Ewen
Hi Anwar! 0.10.0 was feature-frozen at that time already and under testing. Key/value state on connected streams will have to go into the next release... Stephan
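
Until then, one possible stopgap (a sketch assuming the 0.10 Checkpointed interface; the cache layout and record formats are invented) is to keep the state in a field of a CoFlatMapFunction and let checkpointing snapshot it:

    import java.util.HashMap;
    import org.apache.flink.streaming.api.checkpoint.Checkpointed;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class StatefulCoFlatMap
            implements CoFlatMapFunction<String, String, String>,
                       Checkpointed<HashMap<String, String>> {

        // Manually managed state; snapshotted and restored by Flink's checkpointing.
        private HashMap<String, String> cache = new HashMap<String, String>();

        @Override
        public void flatMap1(String value, Collector<String> out) {
            // Stream 1 updates the cache (hypothetical "key=value" format).
            String[] kv = value.split("=", 2);
            if (kv.length == 2) {
                cache.put(kv[0], kv[1]);
            }
        }

        @Override
        public void flatMap2(String value, Collector<String> out) {
            // Stream 2 queries the cache.
            String hit = cache.get(value);
            out.collect(value + " -> " + (hit != null ? hit : "<miss>"));
        }

        @Override
        public HashMap<String, String> snapshotState(long checkpointId, long checkpointTimestamp) {
            return cache;
        }

        @Override
        public void restoreState(HashMap<String, String> state) {
            this.cache = state;
        }
    }

Note that this is operator-level state, not the partitioned key/value state discussed above, so it only works sensibly with an appropriate partitioning of both inputs.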

Re: Apache Flink 0.10.0 released

2015-11-16 Thread Leonard Wolters
Congrats! L.

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Anwar Rizal
Stephan, having a look at the brand-new 0.10 release, I noticed that OperatorState is not implemented for ConnectedStream, which is quite the opposite of what you said below. Or maybe I misunderstood your sentence here? Thanks, Anwar.

Re: Apache Flink 0.10.0 released

2015-11-16 Thread Slim Baltagi
Hi, I'm very pleased to be the first to tweet about the release of Apache Flink 0.10.0, just after receiving Fabian's email :) Flink 1.0 is around the corner now! Slim Baltagi

Apache Flink 0.10.0 released

2015-11-16 Thread Fabian Hueske
Hi everybody, The Flink community is excited to announce that Apache Flink 0.10.0 has been released. Please find the release announcement here: --> http://flink.apache.org/news/2015/11/16/release-0.10.0.html Best, Fabian

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Aljoscha Krettek
Hi, yes, at your data rate emitting a watermark for every element should not be a problem. It could become a problem at higher data rates, since the system can get overwhelmed if every element also generates a watermark. In that case I would suggest storing the latest element timestamp in an internal field...

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Konstantin Knauf
Hi Aljoscha, ok, now I at least understand why it works with fromElements(...). For the rest I am not so sure.

> What this means in your case is that the watermark can only advance if a new element arrives, because only then is the watermark updated.

But new elements arrive all the time, about...

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Aljoscha Krettek
Hi, it could be what Gyula mentioned. Let me first go a bit into how the TimestampExtractor works internally. First, the timestamp extractor internally keeps the value of the last emitted watermark. Then, the semantics of the TimestampExtractor are as follows: - the result of extractTimestamp...

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Gyula Fóra
Could this part of the extractor be the problem, Aljoscha?

    @Override
    public long getCurrentWatermark() {
        return Long.MIN_VALUE;
    }

Gyula

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Konstantin Knauf
Hi Aljoscha, thanks for your answer. Yes, I am using the same TimestampExtractor class. The timestamps look good to me. Here an example: {"time": 1447666537260, ...}, and parsed: 2015-11-16T10:35:37.260+01:00. The order now is

    stream
        .map(dummyMapper)
        .assignTimestamps(...)
        .timeWindow(...)

Is t...

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Aljoscha Krettek
Hi, are you also using the timestamp extractor when you are using env.fromCollection()? Could you maybe insert a dummy mapper after the Kafka source that just prints the element and forwards it? To see if the elements come with a good timestamp from Kafka. Cheers, Aljoscha

Re: Checkpoints in batch processing & JDBC Output Format

2015-11-16 Thread Maximilian Bode
Hi Stephan, thank you very much for your answer. I was happy to meet Robert in Munich last week and he proposed that for our problem, batch processing is the way to go. We also talked about how exactly to guarantee in this context that no data is lost even in the case the job dies while writing...
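
For context, a sketch of a batch job writing through Flink's JDBCOutputFormat builder (driver, URL, query, and tuple layout are illustrative); note that a plain JDBC sink by itself gives no exactly-once guarantee if the job dies mid-write, which is exactly the concern here:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class JdbcSinkSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Tuple2<Integer, String>> rows = env.fromElements(
                    new Tuple2<Integer, String>(1, "a"),
                    new Tuple2<Integer, String>(2, "b"));

            rows.output(JDBCOutputFormat.buildJDBCOutputFormat()
                    .setDrivername("org.apache.derby.jdbc.EmbeddedDriver") // illustrative
                    .setDBUrl("jdbc:derby:memory:testdb")                  // illustrative
                    .setQuery("INSERT INTO mytable (id, name) VALUES (?, ?)")
                    .setBatchInterval(100) // rows buffered per JDBC batch
                    .finish());

            env.execute("jdbc sink sketch");
        }
    }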