Re: Flink: How to handle external app configuration changes in flink

2016-09-26 Thread Govindarajan Srinivasaraghavan
Hi Jamie, Thanks a lot for the response. Appreciate your help. Regards, Govind On Mon, Sep 26, 2016 at 3:26 AM, Jamie Grier wrote: > Hi Govindarajan, > > Typically the way people do this is to create a stream of configuration > changes and consume this like any other stream. For the specific

RE: TaskManager & task slots

2016-09-26 Thread Ramanan, Buvana (Nokia - US)
Hello Fabian, Thanks a lot for the explanation. When the operators (including source / sink) are chained, what is the method of communication between them? We use Kafka for data source and I was interested to learn the mechanism of communication from Kafka to the next operator, say flatMap… So

Re: How can I prove ....

2016-09-26 Thread amir bahmanyari
Thanks Stephan. I don't see a "graph" in the JM's "Dashboard" when I click on the running job... I see a box like below with Parallelism = 512, which is what I have set as the parallelism degree in my code: options.setParallelism(512); Does this mean the cluster is now fully running at its max capacity

Re: TaskManager & task slots

2016-09-26 Thread Fabian Hueske
Hi Buvana, A TaskManager runs as a single JVM process. A TaskManager provides a certain number of processing slots. Slots do not guard CPU time, IO, or JVM memory. At the moment they only isolate managed memory which is only used for batch processing. For streaming applications their only purpose

TaskManager & task slots

2016-09-26 Thread Ramanan, Buvana (Nokia - US)
Hello, I would like to understand the following better: https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#configuring-taskmanager-processing-slots Fundamental question - what is the notion of Task Slot? Does it correspond to one JVM? Or the Task Manager itself corres

Re: Complex batch workflow needs (too) much time to create executionPlan

2016-09-26 Thread Fabian Hueske
Hi Markus, thanks for the stacktraces! The client is indeed stuck in the optimizer. I have to look a bit more into this. Did you try to set JoinHints in your plan? That should reduce the plan space that is enumerated and therefore reduce the optimization time (maybe enough to run your application

Re: Flink Checkpoint runs slow for low load stream

2016-09-26 Thread vinay patil
I am not sure about that; I will run the pipeline on the cluster and share the details. Since window is a stateful operator, it will store only the key part in the state backend and not the value, right? Regards, Vinay Patil On Mon, Sep 26, 2016 at 12:13 PM, Stephan Ewen [via Apache Flink User Mail

Re: Flink Checkpoint runs slow for low load stream

2016-09-26 Thread Stephan Ewen
@vinay - Is it in your case large state that causes slower checkpoints? On Mon, Sep 26, 2016 at 6:17 PM, vinay patil wrote: > Hi, > > I am also facing this issue, in my case the data is flowing continuously > from the Kafka source, when I increase the checkpoint interval to 6, > the data get

Re: Flink Checkpoint runs slow for low load stream

2016-09-26 Thread Stephan Ewen
Thanks, the logs were very helpful! TL;DR - The offset committing to ZooKeeper is very slow and prevents checkpoints from starting properly. Here is what is happening in detail: - Between the point when the TaskManager receives the "trigger checkpoint" message and the point when the KafkaSour

Re: Flink Checkpoint runs slow for low load stream

2016-09-26 Thread vinay patil
Hi, I am also facing this issue. In my case the data is flowing continuously from the Kafka source; when I increase the checkpoint interval to 6, the data gets written to the S3 sink. Is it because some operator is taking more time for processing? For example, in my case I am using a time window of 1 sec.

Best way to trigger dataset sampling

2016-09-26 Thread Flavio Pompermaier
Hi to all, I have a use case where I need to tell a Flink cluster to give me a sample of X records using parametrizable sampling functions. Is there any best practice or advice to do that? Should I create a Remote ExecutionEnvironment or should I use the Flink client (I don't know if it uses REST

Re: Complex batch workflow needs (too) much time to create executionPlan

2016-09-26 Thread Markus Nentwig
Hi Fabian, at first, sorry for the late answer. The given execution plan was created after 20 minutes, only one vertex centric iteration is missing. I can optimize the program because some operators are only needed to create intermediate debug results, still, it's not enough to run as one Flink j

Re: flink run throws NPE, JobSubmissionResult is null when interactive and not isDetached()

2016-09-26 Thread Luis Mariano Guerra
On Mon, Sep 26, 2016 at 2:07 PM, Maximilian Michels wrote: > Hi Luis, > > With your feedback I was able to find the problem. I have created an > issue and a fix is available which will be in Flink 1.1.3 and Flink > 1.2.0. > > > Thanks, > Thank you! > Max > > [1] https://issues.apache.org/jira/

Re: flink run throws NPE, JobSubmissionResult is null when interactive and not isDetached()

2016-09-26 Thread Maximilian Michels
Hi Luis, With your feedback I was able to find the problem. I have created an issue and a fix is available which will be in Flink 1.1.3 and Flink 1.2.0. Thanks, Max [1] https://issues.apache.org/jira/browse/FLINK-4677 On Tue, Sep 20, 2016 at 2:00 PM, Luis Mariano Guerra wrote: > On Tue, Sep 2

Re: can Flink use multi "addSink"?

2016-09-26 Thread Gábor Gévay
Hello, You can't call map on the sink, but instead you can continue from the stream that you have just before the sink: val stream = datastream.filter(new Myfilter()) val sink1 = stream.addSink(new Mysink()) val sink2 = stream.map(new MyMap()).addSink(MySink2()) Best, Gábor 2016-09-26 12:47 G

can Flink use multi "addSink"?

2016-09-26 Thread 侯林蔚
Hello, I am new to Flink, and I want to build a topology with the streaming API like this: datastream.filter(new Myfilter()).addSink(new Mysink()).map(new MyMap()).addSink(MySink2()) Can Flink support operations like this? I look forward to your reply. Thanks.

Re: Flink: How to handle external app configuration changes in flink

2016-09-26 Thread Jamie Grier
Hi Govindarajan, Typically the way people do this is to create a stream of configuration changes and consume this like any other stream. For the specific case of filtering, for example, you may have a data stream and a stream of filters that you want to run the data through. The typical approach
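The pattern described in this snippet (configuration changes arriving as a stream and updating how data records are filtered) can be sketched in plain Java. This is only a model of the idea and uses no Flink API; the class `ConfigDrivenFilter`, its `Event` type, and the "blocked word" filter are hypothetical names chosen for illustration. In a Flink job the same role would presumably be played by connecting the two streams, for example with `connect` and a `CoFlatMapFunction`, but the snippet is truncated before it says so, so treat that mapping as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the "configuration as a stream" pattern; no Flink API is used. */
final class ConfigDrivenFilter {

    /** One element of the merged input: either a config change or a data record. */
    static final class Event {
        final boolean isConfig;
        final String payload;

        private Event(boolean isConfig, String payload) {
            this.isConfig = isConfig;
            this.payload = payload;
        }

        static Event config(String blockedWord) { return new Event(true, blockedWord); }
        static Event data(String record) { return new Event(false, record); }
    }

    // Current filter setting (hypothetical: a single blocked word); null means "no filter yet".
    private String blockedWord = null;

    /** Processes events in arrival order and returns the data records that pass the filter. */
    List<String> process(List<Event> events) {
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            if (e.isConfig) {
                // Config change: update the operator's state, emit nothing.
                blockedWord = e.payload;
            } else if (blockedWord == null || !e.payload.contains(blockedWord)) {
                // Data record: emit only if it passes the filter currently in effect.
                out.add(e.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            Event.data("error: disk full"),   // arrives before any config, so it passes
            Event.config("error"),            // from now on, drop records containing "error"
            Event.data("error: timeout"),
            Event.data("info: started"));
        // prints [error: disk full, info: started]
        System.out.println(new ConfigDrivenFilter().process(events));
    }
}
```

The point of the sketch is that the filter setting is ordinary operator state: a config event changes it, and every later data record is evaluated against whatever setting is current at that moment.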

Re: Questions on flink

2016-09-26 Thread Jamie Grier
Hi Govindarajan, I've put some answers inline below. On Sat, Sep 24, 2016 at 7:32 PM, Govindarajan Srinivasaraghavan < govindragh...@gmail.com> wrote: > Hi, > > I'm working with Apache Flink for data streaming and I have a few questions. > Any help is greatly appreciated. Thanks. > > 1) Are there

Re: RawSchema as deserialization schema

2016-09-26 Thread Swapnil Chougule
Okay. Thanks for the update Robert. On Mon, Sep 26, 2016 at 3:08 PM, Robert Metzger wrote: > The RawSchema was once part of Flink's Kafka connector. I've removed it > because its implementation is trivial and I didn't expect that there are > many people who need the schema (also, I think I saw p

Re: RawSchema as deserialization schema

2016-09-26 Thread Robert Metzger
The RawSchema was once part of Flink's Kafka connector. I've removed it because its implementation is trivial and I didn't expect that there are many people who need the schema (also, I think I saw people using a map() operator after the consumer to deserialize the byte[] into their formats). As S
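To illustrate why a raw schema is described here as trivial, the following plain-Java sketch passes the message bytes through unchanged and leaves decoding to a later step, similar in spirit to the map() approach mentioned above. It does not use Flink's classes: `ByteDeserializer` is a hypothetical stand-in interface, and `RawSchemaSketch` and `decodeUtf8` are illustrative names, not the removed connector code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RawSchemaSketch {

    /** Hypothetical stand-in for a deserialization interface; not Flink's actual type. */
    interface ByteDeserializer<T> {
        T deserialize(byte[] message);
    }

    /** "Raw" schema: hands the message bytes through unchanged. */
    static final class RawSchema implements ByteDeserializer<byte[]> {
        @Override
        public byte[] deserialize(byte[] message) {
            return message;
        }
    }

    /** The later map()-style step: decode the raw bytes into the application's format. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] message = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] raw = new RawSchema().deserialize(message);
        System.out.println(Arrays.equals(raw, message)); // prints true
        System.out.println(decodeUtf8(raw));             // prints hello
    }
}
```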

Re: RawSchema as deserialization schema

2016-09-26 Thread Swapnil Chougule
Hi Robert, May I have your input on this? Thanks, Swapnil On Thu, Sep 22, 2016 at 7:12 PM, Stephan Ewen wrote: > /cc Robert, he is looking into extending the Kafka Connectors to support > more of Kafka's direct utilities > > On Thu, Sep 22, 2016 at 3:17 PM, Swapnil Chougule > wrote: > >> It

Re: How can I prove ....

2016-09-26 Thread Stephan Ewen
You do not need to create any JSON. Just click on "Running Jobs" in the UI, and then on the job. The parallelism is shown as a number in the boxes of the graph. On Sat, Sep 24, 2016 at 6:28 PM, amir bahmanyari wrote: > Thanks Felix. > Interesting. I tried to create the JSON but it didn't work acc