Re: reading from latest kafka offset when flink starts

2016-05-09 Thread Balaji Rajagopalan
Robert, Regarding the event qps 4500 events/sec may not be large no, but I am seeing some issue in processing the events due to processing power that I am using, I have deployed flink app on 3 node yarn cluster one node is a master, 2 slave nodes which has the taskmanager running. Each machine is

Cassandra sink wrt Counters

2016-05-09 Thread milind parikh
Given FLINK 3311 & 3332, I am wondering it would be possible, without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink. However my question has to do with

Looking up values in a metadata store for "condensed" events

2016-05-09 Thread milind parikh
I am new to flink. I am wondering what the best practice is for looking up additional values for entities embedded in an event. For example, an event might have the engine model #. A metadata store has the typical rpm characteristics of an engine model #; which can be used in a subsequent processi

Re: writing tests for my program

2016-05-09 Thread Igor Berman
answering my own question: testing streaming environment should be done with StreamingProgramTestBase & TestStreamEnvironment which are present in test package of flink-streaming-java project so it's not directly available? Project owners, why not to move above two to flink-test-utils? Or I don't

Local Cluster have problem with connect to elasticsearch

2016-05-09 Thread rafal green
Dear Sir or Madam, Can you tell me why I have a problem with elasticsearch in local cluster? I analysed this example: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html My flink and elasticsearch config are default (only I change node.name to "no

Re: How to measure Flink performance

2016-05-09 Thread prateek arora
Hi Thanks for the answer , then how can i measure the performance of flink ? i want to run my application with both spark and flink . and want to measure the performance . so i can check how fast flink process my data as compare to spark. Regards prateek On Mon, May 9, 2016 at 2:17 AM, Ufuk Cele

Re: Regarding source Parallelism and usage of coflatmap transformation

2016-05-09 Thread Biplob Biswas
Hi, Regarding 1) Thanks a lot for the ParallelSourceFunction, i completely missed that I was using a SourceFunction instead. Regarding 2) the example works and i can see what is happening there, now when i increase the parallelism i understand the corresponding change as to how the data is fed ba

writing tests for my program

2016-05-09 Thread Igor Berman
Any idea how to handle following(the message is clear, but I'm not sure what I need to do) I'm opening "generic" environment in my code (StreamExecutionEnvironment.getExecutionEnvironment()) and JavaProgramTestBase configures TestEnvironment... so what I should do to support custom tests? Erro

Re: Using ML lib SVM with Java

2016-05-09 Thread Theodore Vasiloudis
Hello Malte, As Simone said there is no Java support currently for FlinkML unfortunately. Regards, Theodore On Mon, May 9, 2016 at 3:05 PM, Simone Robutti wrote: > To my knowledge FlinkML does not support an unified API and most things > must be used exclusively with Scala Datasets. > > 2016-0

Run jar job in local cluster

2016-05-09 Thread rafal green
Dear Sir or Madam, Can you tell me why I have a problem with elasticsearch in local cluster? I analysed this example: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html My flink and elasticsearch config are default (only I change node.name to "no

Re: Streaming job software update

2016-05-09 Thread Aljoscha Krettek
Hi Bruce, you're right, taking down the job and restarting (from a savepoint) with the updated software is the only way of doing it. I'm not aware of any work being done in this area right now but it is an important topic that we certainly have to tackle in the not-so-far future. Cheer, Aljoscha

Re: Sorted output

2016-05-09 Thread Piyush Shrivastava
This is now solved, thank you. :-)  Thanks and Regards,Piyush Shrivastava http://webograffiti.com On Monday, 9 May 2016 3:47 PM, Piyush Shrivastava wrote: Hello all, I have a time series based logic written with Flink. Due to the parallelism, I am not getting the output in a proper se

Re: Using ML lib SVM with Java

2016-05-09 Thread Simone Robutti
To my knowledge FlinkML does not support an unified API and most things must be used exclusively with Scala Datasets. 2016-05-09 13:31 GMT+02:00 Malte Schwarzer : > Hi folks, > > I tried to get the FlinkML SVM running - but it didn't really work. The > SVM.fit() method requires a DataSet paramete

Re: Creating a custom operator

2016-05-09 Thread Simone Robutti
>- You wrote you'd like to "instantiate a H2O's node in every task manager". This reads a bit like you want to start H2O in the TM's JVM , but I would assume that a H2O node runs as a separate process. So should it be started inside the TM JVM or as an external process next to each TM. Also, do you

Using ML lib SVM with Java

2016-05-09 Thread Malte Schwarzer
Hi folks, I tried to get the FlinkML SVM running - but it didn't really work. The SVM.fit() method requires a DataSet parameter but there is no such class/interface in Flink Java. Or am I mixing something up with Scala? Also, I couldn't find a Flink ML example for Java (there is only Scala). Is

Re: Creating a custom operator

2016-05-09 Thread Fabian Hueske
Hi Simone, sorry for the delayed answer. I have a few questions regarding your requirements and a some ideas that might be helpful (depending on the requirements). 1) Starting / stopping of H2O nodes from Flink - You wrote you'd like to "instantiate a H2O's node in every task manager". This reads

Sorted output

2016-05-09 Thread Piyush Shrivastava
Hello all, I have a time series based logic written with Flink. Due to the parallelism, I am not getting the output in a proper series.For example, 3> (12:00:00, "value") appears before 1> (11:59:00, "value") while the timestamp of the latter is smaller than the former. I am using TimeWindow and

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Perfect👍 On Mon, May 9, 2016 at 3:12 PM, Ufuk Celebi wrote: > Yes, I did just that and I used the relevant Flink terminology instead > of #cores and #machines: > > #cores => #slots per TM > #machines => #TMs > > On Mon, May 9, 2016 at 11:33 AM, Punit Naik > wrote: > > Yeah, thanks a lot for tha

Re: Writing Intermediates to disk

2016-05-09 Thread Vikram Saxena
I do not know if I understand completely, but I would create a new DataSet based on filtering the condition and then persist this DataSet. So : DataSet ds2 = DataSet1.filter(Condition) 2ds.output(...) On Mon, May 9, 2016 at 11:09 AM, Ufuk Celebi wrote: > Flink has support for spillable in

Re: Regarding source Parallelism and usage of coflatmap transformation

2016-05-09 Thread Aljoscha Krettek
Hi, regarding 1) the source needs to implement the ParallelSourceFunction or RichParallelSourceFunction interface to allow it to have a higher parallelism than 1. regarding 2) I wrote a small example that showcases how to do it: final StreamExecutionEnvironment env = StreamExecutionEnvironment.g

Re: Problem in creating quickstart project using archetype (Scala)

2016-05-09 Thread Ufuk Celebi
Hey Nirmalya, this is not your fault, but a shortcoming of the docs. ;) I've created an issue for it here: https://issues.apache.org/jira/browse/FLINK-3884 On Mon, May 2, 2016 at 8:01 PM, nsengupta wrote: > Fantastic! Many thanks for clarifying, Aljoscha! > > I blindly followed what that page sa

Re: Configuring taskmanager with HA jobmanager

2016-05-09 Thread Ufuk Celebi
Hey John! It looks like the task managers are not picking up the correct configuration. Can you please verify that all nodes (JobManager and TaskManager) use the same configuration. The task managers use ZooKeeper to look up the JobManager and not the configuration. >From the docs >(https://ci

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
Yes, I did just that and I used the relevant Flink terminology instead of #cores and #machines: #cores => #slots per TM #machines => #TMs On Mon, May 9, 2016 at 11:33 AM, Punit Naik wrote: > Yeah, thanks a lot for that. Also if you could, please write the formula, > #cores\^2\^ * #machines * 4,

Re: Diff between stop and cancel job

2016-05-09 Thread Ufuk Celebi
On Thu, May 5, 2016 at 1:59 AM, Bajaj, Abhinav wrote: > Or can we resume a stopped streaming job ? You can use savepoints [1] to take a snapshot of a streaming program from which you can restart the job at a later point in time. This is independent of whether you cancel or stop the program afte

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Yeah, thanks a lot for that. Also if you could, please write the formula, *#cores\^2\^* * *#machines* * 4, in a different form so that its more readable and understandable. On 09-May-2016 2:54 PM, "Ufuk Celebi" wrote: > On Mon, May 9, 2016 at 11:05 AM, Punit Naik > wrote: > > Thanks for the deta

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
On Mon, May 9, 2016 at 11:05 AM, Punit Naik wrote: > Thanks for the detailed answer. I will definitely try this and get back to > you. OK, looking forward to it. ;) In the meantime I've updated the docs with a more concise version of what do to when you see this exception.

Re: How to measure Flink performance

2016-05-09 Thread Ufuk Celebi
Hey Prateek, On Fri, May 6, 2016 at 6:40 PM, prateekarora wrote: > I have below information from spark . do i can get similar information from > Flink also ? if yes then how can i get that. You can get GC time via the task manager overview. The other metrics don't necessarily translate to Flink

Re: Writing Intermediates to disk

2016-05-09 Thread Ufuk Celebi
Flink has support for spillable intermediate results. Currently they are only set if necessary to avoid pipeline deadlocks. You can force this via env.getConfig().setExecutionMode(ExecutionMode.BATCH); This will write shuffles to disk, but you don't get the fine-grained control you probably need

Key factors for Flink's performance

2016-05-09 Thread leon_mclare
Hello Flink team, i am currently playing around with Storm and Flink in the context of a smart home. The primary functional requirement is to quickly react to certain properties in stream tuples. I was looking at some benchmarks from the two systems, and generally Flink has the upper hand, in bo

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Hi Ufuk Thanks for the detailed answer. I will definitely try this and get back to you. On 09-May-2016 2:08 PM, "Ufuk Celebi" wrote: > Hey Punit, > > you need to give the task managers more network buffers as Robert > suggested. Using the formula from the docs, can you please use 147456 > (96^2*

Re: assigning stream element to multiple windows of different types

2016-05-09 Thread Aljoscha Krettek
Hi, I think it should (theoretically) work. You would have to provide a custom serializer that can serialize/deserialize your different window subclasses. Also, you will probably have to provide a Trigger that can deal with the different types of windows. Cheers, Aljoscha On Mon, 9 May 2016 at 05

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Ufuk Celebi
Hey Punit, you need to give the task managers more network buffers as Robert suggested. Using the formula from the docs, can you please use 147456 (96^2*4*4) for the number of network buffers. Each buffer is 32 KB, meaning that you give 4,5 GB of memory to the network stack. You might have to adju

Re: reading from latest kafka offset when flink starts

2016-05-09 Thread Ufuk Celebi
Robert, what do you think about adding a note about this to the Kafka consumer docs? This has come up a couple of times on the mailing list already. – Ufuk On Fri, May 6, 2016 at 12:07 PM, Balaji Rajagopalan wrote: > Thanks Robert appreciate your help. > > On Fri, May 6, 2016 at 3:07 PM, Robert

Re: Restart Flink in Yarn

2016-05-09 Thread Ufuk Celebi
Hey Dominique! Are you running the job in HA mode? – Ufuk On Thu, May 5, 2016 at 1:49 PM, Robert Metzger wrote: > Hi Dominic, > I'm sorry that you ran into this issue. > What do you mean by "flink streaming routes" ? > > Regarding the second question: "Now I want to restart these routes to > co