Re: Scatter-Gather Iteration aggregators

2016-05-13 Thread Vasiliki Kalavri
Hi Lydia, an iteration aggregator combines all aggregates globally once per superstep and makes them available in the *next* superstep. Within each scatter-gather iteration, one MessagingFunction (scatter phase) and one VertexUpdateFunction (gather phase) are executed. Thus, if you set an aggregat
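The timing Vasiliki describes (aggregates are combined once per superstep and only become readable in the next one) can be sketched with a toy simulation. This is plain Python illustrating the semantics, not the Gelly API; the function and variable names are illustrative:

```python
# Toy simulation of scatter-gather aggregator timing: values aggregated
# during superstep i are only visible to user code in superstep i + 1.
def run_supersteps(per_superstep_values):
    previous_aggregate = None   # what user code may read (last superstep's sum)
    visible_history = []
    for values in per_superstep_values:
        visible_history.append(previous_aggregate)  # readable this superstep
        previous_aggregate = sum(values)            # combined at superstep end
    return visible_history

# Superstep 0 sees no aggregate; superstep 1 sees the sum from superstep 0.
history = run_supersteps([[1, 2], [3, 4], [5]])
```

In Gelly, the per-superstep read corresponds to calling `getPreviousIterationAggregate()` from the user functions.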

Re: Confusion about multiple use of one ValueState

2016-05-13 Thread Balaji Rajagopalan
Even though there are multiple instances of the map object, the transient value state object is accessible across them, so as the stream flows in, the value can be updated based on application logic. On Fri, May 13, 2016 at 11:26 AM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: >

Re: Confusion about multiple use of one ValueState

2016-05-13 Thread Nirmalya Sengupta
Hello Balaji, Thanks for your reply. This confirms my earlier assumption that one of the usual ways to do it is to hold and nurture the application state in an external body; in your case, Redis. So, I am trying to understand how one shares the handle to this external body amongst partitions: do

Re: Scatter-Gather Iteration aggregators

2016-05-13 Thread Lydia Ickler
Hi Vasia, okay I understand now :) So it works fine if I want to collect the sum of values. But what if I need to reset the DoubleSumAggregator back to 0 in order to then set it to a new value to save the absolute maximum? Please have a look at the code above. Any idea why it is not working?

Re: checkpoints not being removed from HDFS

2016-05-13 Thread Maciek Próchniak
Hi Ufuk, It seems I messed it up a bit :) I cannot comment on JIRA, since it's temporarily locked... 1. org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: `/flink/checkpoints_test/570d6e67d571c109daab468e5678402b/chk-62 is non empty': Directory is not empty - this seems to be expected behavi

Flink performance tuning

2016-05-13 Thread Serhiy Boychenko
Hey, I have successfully integrated Flink into our very small test cluster (3 machines with 8 cores, 8 GB of memory and 2x1TB disks each). Basically I started the session to use YARN as the RM, and the data is being read from HDFS. /yarn-session.sh -n 21 -s 1 -jm 1024 -tm 1024 My code is very simpl

Re: Confusion about multiple use of one ValueState

2016-05-13 Thread Balaji Rajagopalan
I wrote a simple helper class; the Redis connections are initialized in the constructor, and there are set and get methods to store and retrieve values from your map functions. If you find a better way to do this, please share :). I am using a Redis Scala client. object RedisHelper { val re
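Since the original code is truncated, a helper along these lines might look as follows. This is a Python sketch with an in-memory dict standing in for the Redis client (a real implementation would create a client from a Redis library in the constructor); the class and method names are illustrative:

```python
class RedisHelper:
    """Sketch of a shared key-value helper. In real code, __init__ would
    create a Redis client and put/get would call its set/get methods."""

    def __init__(self):
        self._store = {}   # stand-in for the Redis connection

    def put(self, key, value):
        self._store[key] = value   # real code: self.client.set(key, value)

    def get(self, key, default=None):
        return self._store.get(key, default)

# Map functions running on different partitions can share state by
# talking to the same external store through one such helper.
helper = RedisHelper()
helper.put("running-sum", 42)
```

The point of the pattern is that the state lives outside the operator instances, so every parallel subtask reads and writes the same external value.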

"Memory ran out" error when running connected components

2016-05-13 Thread Arkay
Hi to all, I’m aware there are a few threads on this, but I haven’t been able to solve an issue I am seeing and hoped someone can help. I’m trying to run the following: val connectedNetwork = new org.apache.flink.api.scala.DataSet[Vertex[Long, Long]]( Graph.fromTuple2DataSet(inputEdges, vertex

killing process in Flink cluster

2016-05-13 Thread Ramkumar
Hi All, I am new to Flink. I am running a word-count streaming program on a cluster. It was taking too much time, so I stopped the process manually. But it is still cancelling: there are two subtasks in the cluster; one was cancelled successfully but the other is still cancelling. We tried to kill the process in

Re: Confusion about multiple use of one ValueState

2016-05-13 Thread Nirmalya Sengupta
Hello Balaji Yes. The State holder 'sum' in my example is actually created outside the Mapper objects; so it stays where it is. I am creating 'var's inside the Mapper objects to _refer_ to the same object, irrespective of multiplicity of the Mappers. The _open_ function is helping to make that a

Re: Confusion about multiple use of one ValueState

2016-05-13 Thread nsengupta
Sorry, Balaji! Somehow, I missed this particular post of yours. Please ignore my last mail, where I am asking the same question. -- Nirmalya -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Confusion-about-multiple-use-of-one-ValueState-tp68

Re: Scatter-Gather Iteration aggregators

2016-05-13 Thread Vasiliki Kalavri
Hi Lydia, aggregators are automatically reset at the beginning of each iteration. As far as I know, the reset() method is not supposed to be called from user code. Also, in the code you pasted, you use "aggregator.getAggregate()". Please, use the "getPreviousIterationAggregate()" method as I wrote
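One way to keep an absolute maximum under these semantics (the aggregator is reset automatically each iteration, and only the previous iteration's aggregate is readable) is to fold the carried-over maximum back in alongside the new values every superstep. A plain-Python sketch of that pattern, not Gelly code:

```python
def track_absolute_max(per_iteration_values):
    # Stands in for what getPreviousIterationAggregate() would return.
    previous_max = float("-inf")
    for values in per_iteration_values:
        # The aggregator starts from scratch each iteration, so the
        # previous iteration's maximum must be re-aggregated explicitly.
        previous_max = max([previous_max] + list(values))
    return previous_max
```

With this shape, the reset never loses information, because the running maximum is always re-injected as one of the aggregated values.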

Re: Barriers at work

2016-05-13 Thread Matthias J. Sax
I don't think barriers can "expire" as of now. Might be a nice idea though -- I don't know if this might be a problem in production. Furthermore, I want to point out that an "expiring checkpoint" would not break exactly-once processing, as the latest successful checkpoint can always be used to re

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Vasiliki Kalavri
Hi Rob, On 13 May 2016 at 11:22, Arkay wrote: > Hi to all, > > I’m aware there are a few threads on this, but I haven’t been able to solve > an issue I am seeing and hoped someone can help. I’m trying to run the > following: > > val connectedNetwork = new org.apache.flink.api.scala.DataSet[Ver

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Arkay
Thanks Vasia, Apologies, yes by workers I mean I have set taskmanager.numberOfTaskSlots: 8 and parallelism.default: 8 in flink-conf.yaml. I have also set taskmanager.heap.mb: 6500. In the dashboard it shows free memory as 5.64GB and Flink Managed Memory as 3.90GB. Thanks, Rob -- View this

Re: Barriers at work

2016-05-13 Thread Stephan Ewen
Hi Srikanth! That is an interesting idea. I have it on my mind to create a design doc for checkpointing improvements. That could be added as a proposal there. I hope I'll be able to start with that design doc next week. Greetings, Stephan On Fri, May 13, 2016 at 1:35 PM, Matthias J. Sax wrote

flink-kafka-connector offset management

2016-05-13 Thread Arun Balan
Hi, I am trying to use the flink-kafka-connector and I notice that every time I restart my application it re-reads the last message on the kafka topic. So if the latest offset on the topic is 10, then when the application is restarted, kafka-connector will re-read message 10. Why is this the behavi

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Vasiliki Kalavri
Thanks for checking Rob! I don't see any reason for the job to fail with this configuration and input size. I have no experience running Flink on Windows though, so I might be missing something. Do you get a similar error with smaller inputs? -Vasia. On 13 May 2016 at 13:27, Arkay wrote: > Than

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Arkay
Hi Vasia, It seems to work OK up to about 50MB of input, and dies after that point. If I disable just this connected components step, the rest of my program is happy with the full 1.5GB test dataset. It seems to be specifically limited to GraphAlgorithms in my case. Do you know what the units ar

Re: Flink performance tuning

2016-05-13 Thread Robert Metzger
Hi, Can you try running the job with 8 slots, 7 GB (maybe you need to go down to 6 GB) and only three TaskManagers (-n 3) ? I'm suggesting this, because you have many small JVMs running on your machines. On such small machines you can probably get much more use out of your available memory by run
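Robert's suggestion (fewer, larger TaskManagers instead of 21 single-slot 1 GB JVMs) translates roughly into a session like the one below. The heap values are an assumption based on the thread; as noted, you may need to drop to 6 GB if YARN rejects the container size:

```shell
# One TaskManager per machine, 8 slots each, one large heap per JVM,
# instead of 21 single-slot TaskManagers with 1 GB each.
./yarn-session.sh -n 3 -s 8 -jm 1024 -tm 7168   # try -tm 6144 if rejected
```

Fewer, larger JVMs reduce per-process overhead and let Flink's managed memory work over a bigger contiguous pool on small machines.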

Re: Flink performance tuning

2016-05-13 Thread Stephan Ewen
One issue may be that the selection of YARN containers is not HDFS locality aware here. Hence, Flink may read more splits remotely, where MR reads more splits locally. On Fri, May 13, 2016 at 3:25 PM, Robert Metzger wrote: > Hi, > > Can you try running the job with 8 slots, 7 GB (maybe you need

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Vasiliki Kalavri
On 13 May 2016 at 14:28, Arkay wrote: > Hi Vasia, > > It seems to work OK up to about 50MB of input, and dies after that point. > If i disable just this connected components step the rest of my program is > happy with the full 1.5GB test dataset. It seems to be specifically > limited > to GraphA

Availability of OrderedKeyedDataStream

2016-05-13 Thread nsengupta
Hello Flinksters, Have you decided to do away with the 'OrderedKeyedDataStream' type altogether? I didn't find it in the API documents. It is mentioned and elaborated here

Re: "Memory ran out" error when running connected components

2016-05-13 Thread Arkay
Thanks for the link, I had experimented with those options, apart from taskmanager.memory.off-heap: true. It turns out that setting allows it to run through happily! I don't know if that is a peculiarity of a Windows JVM, as I understand that setting is purely an efficiency improvement? For your first ques
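For reference, the settings discussed in this thread all live in flink-conf.yaml; the values below mirror the ones Rob reported plus the off-heap switch that resolved the error (key names as of Flink 1.0, so double-check against your version):

```yaml
# flink-conf.yaml excerpts from this thread
taskmanager.numberOfTaskSlots: 8
parallelism.default: 8
taskmanager.heap.mb: 6500
taskmanager.memory.off-heap: true   # the switch that let the job complete
```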

Re: Barriers at work

2016-05-13 Thread Srikanth
Thanks Matthias & Stephan! Yes, if we choose to fail the checkpoint on expiry, we can restore from the previous checkpoint. Looking forward to reading the new design proposal. Srikanth On Fri, May 13, 2016 at 8:09 AM, Stephan Ewen wrote: > Hi Srikanth! > > That is an interesting idea. > I have it on my

Flink recovery

2016-05-13 Thread Madhire, Naveen
Hi, We are trying to test the recovery mechanism of Flink with Kafka and HDFS sink during failures. I’ve killed the job after processing some messages and restarted the same job again. Some of the messages I am seeing are processed more than once and not following the exactly once semantics.

Re: Barriers at work

2016-05-13 Thread Srikanth
I have a follow-up. Is there a recommended list of knobs that can be tuned if an at-least-once guarantee when handling failures is good enough? For cases like alert generation, non-idempotent sinks, etc., where the system can live with duplicates or has another mechanism to handle them. Srikanth On

Re: Barriers at work

2016-05-13 Thread Stephan Ewen
You can set the checkpoint mode to "at least once". That way, barriers never block. On Fri, May 13, 2016 at 6:05 PM, Srikanth wrote: > I have a follow up. Is there a recommendation of list of knobs that can be > tuned if at least once guarantee while handling failure is good enough? > For cases
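With the DataStream API of that era, the mode is chosen when enabling checkpointing. A sketch, assuming the Flink 1.0 Java API (the interval value is illustrative):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 5 seconds. In AT_LEAST_ONCE mode barriers do not
// block for alignment, trading possible duplicates on recovery for
// lower latency under backpressure.
env.enableCheckpointing(5000, CheckpointingMode.AT_LEAST_ONCE);
```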

Re: Barriers at work

2016-05-13 Thread Srikanth
Thanks. I didn't know we could set that. On Fri, May 13, 2016 at 12:44 PM, Stephan Ewen wrote: > You can use the checkpoint mode to "at least once". > That way, barriers never block. > > On Fri, May 13, 2016 at 6:05 PM, Srikanth wrote: > >> I have a follow up. Is there a recommendation of list

Re: How to measure Flink performance

2016-05-13 Thread Ken Krugler
Hi Dhruv, > On May 12, 2016, at 11:07pm, Dhruv Gohil wrote: > > Hi Prateek, > > > https://github.com/dataArtisans/yahoo-streaming-benchmark/blob/ma

Sharing State between Operators

2016-05-13 Thread nsengupta
Hello Flinksters Alright. So, I had a fruitful exchange of messages with Balaji earlier today, on this topic. I moved ahead with the understanding derived from the exchange (thanks, Balaji) at the time. But, now I am back because I think my approach is unclean, if not incorrect. There probably is

Re: Flink recovery

2016-05-13 Thread Madhire, Naveen
I checked the JIRA and looks like FLINK-2111 should address the issue which I am facing. I am canceling the job from dashboard. I am using kafka source and HDFS rolling sink. https://issues.apache.org/jira/browse/FLINK-2111 Is this JIRA part of Flink 1.0.0? Thanks, Naveen From: "Madhire, Ve

Re: Flink recovery

2016-05-13 Thread Fabian Hueske
Hi, Flink's exactly-once semantics do not mean that events are processed exactly once, but that events will contribute exactly once to the state of an operator, such as a counter. Roughly, the mechanism works as follows: - Flink periodically injects checkpoint markers into the data stream. This happe
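Fabian's distinction can be made concrete with a toy replay: after a failure, events since the last checkpoint are reprocessed (so a plain sink may see some of them twice), but operator state is rolled back to the checkpoint first, so each event contributes to the counter exactly once. A plain-Python sketch, not Flink code:

```python
def run_with_failure(events, checkpoint_at, fail_at):
    """Count events, failing after index fail_at; state is rolled back to
    the checkpoint taken after index checkpoint_at, then replay resumes."""
    counter = 0                    # operator state (exactly-once)
    emitted = []                   # what a non-transactional sink sees
    checkpointed_counter = 0
    for i, event in enumerate(events):
        counter += 1
        emitted.append(event)
        if i == checkpoint_at:
            checkpointed_counter = counter       # snapshot the state
        if i == fail_at:
            counter = checkpointed_counter       # recover: roll state back
            for replayed in events[checkpoint_at + 1:]:  # replay the rest
                counter += 1
                emitted.append(replayed)
            break
    return counter, emitted

count, out = run_with_failure(["a", "b", "c", "d"], checkpoint_at=1, fail_at=2)
# count == 4: each event contributed exactly once to the counter,
# even though "c" reached the sink twice.
```

This is why duplicates can appear in a non-transactional sink even though the operator state itself is exactly-once.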

Re: Flink recovery

2016-05-13 Thread Madhire, Naveen
Thank you Fabian. I am using the HDFS rolling sink. This should support exactly-once output in case of failures, shouldn't it? I am following the below documentation, https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources

Multi-tenant, deploying flink cluster

2016-05-13 Thread Alexander Smirnov
Hi, the source data, read from MQ, contains a tenant Id. Is there a way to route messages from a particular tenant to a particular Flink node? Is this something that can be configured? Thank you, Alex

Re: Flink recovery

2016-05-13 Thread Fabian Hueske
Hi Naveen, the RollingFileSink supports exactly-once output. So you should be good. Did you see events being emitted multiple times (should not happen with the RollingFileSink) or being processed multiple times within the Flink program (might happen as explained before)? Best, Fabian 2016-05-13

Re: Flink recovery

2016-05-13 Thread Madhire, Naveen
Thanks Fabian. Yes, I am seeing a few records more than once in the output. I am running the job, canceling it from the dashboard, and running it again, using different HDFS output files both times. I was thinking that when I cancel the job, it’s not doing a clean cancel. Is there anything else