Re: Issue with running Flink Python jobs on cluster

2016-07-14 Thread Geoffrey Mon
I've come across similar issues when trying to set up Flink on Amazon EC2 instances. Presumably there is something wrong with my setup? Here is the flink-conf.yaml I am using: https://gist.githubusercontent.com/GEOFBOT/3ffc9b21214174ae750cc3fdb2625b71/raw/flink-conf.yaml Thanks, Geoffrey On Wed,

Re: Questions about slot sharing

2016-07-14 Thread Vincent Wang
Got it, thank you, Robert. 2016-07-14 20:55 GMT+08:00 Robert Metzger : > Hi Huafeng, > > yes, the mapper and reducer are running in different threads on the > TaskManager. Slot sharing is an abstract concept of the scheduler. > > Flink also supports thread-sharing, for example when you have a ser

Re: Parameters to Control Intra-node Parallelism

2016-07-14 Thread Saliya Ekanayake
Thank you, Ovidiu. On Wed, Jul 13, 2016 at 3:34 PM, Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr> wrote: > Hi, > > I would pay attention to the memory settings such that > heap+off-heap+network buffers can be served from your node’s RAM for both > TMs. > Also, there is some correlation

Re: Ability to partition logs per pipeline

2016-07-14 Thread Chawla,Sumit
Hi Robert, I actually mean both: scenarios where multiple jobs are running on the cluster, and where the same job could be running on multiple task managers. How can we make sure that each job logs to a different file, so that logs are not mixed and it's easy to debug a particular job? Something like Hadoop Y

Re: Random access to small global state

2016-07-14 Thread Robert Metzger
Hi, For Ignite, Flink has a Sink, which is a one-directional thing. I think that Sebastian needs a bi-directional connection. An in-memory KV store like redis or memcache is probably the best option for such a use case (it reminds me a bit of the Yahoo streaming benchmark [1]). [1] https://yahooe

Re: dynamic streams and patterns

2016-07-14 Thread Robert Metzger
Hi Claudia, 1) What do you mean by dynamically adding? In standalone mode (which you would probably use with Docker images), you can just start additional TaskManagers, which will connect to a JobManager. So you could implement some monitoring to start new TaskManagers as soon as they are needed.

Re: [Discuss] Ordering of Records

2016-07-14 Thread vinay patil
Hi Robert, This is the same question I posted on Stack Overflow, since I was not getting any reply here :) Regards, Vinay Patil On Thu, Jul 14, 2016 at 6:55 PM, rmetzger0 [via Apache Flink User Mailing List archive.] wrote: > There is a parallel thread answering the questions going on here already:

Re: [Discuss] Ordering of Records

2016-07-14 Thread Robert Metzger
There is a parallel thread answering the questions going on here already: http://stackoverflow.com/questions/38354713/ordering-of-records-in-stream On Tue, Jul 12, 2016 at 7:12 PM, vinay patil wrote: > Hi, > > Here are some of the queries I have : > > I have two different streams stream1 and st

Re: how to get rid of null pointer exception in collection in DataStream

2016-07-14 Thread Robert Metzger
Hi Subash, The problem you are facing is not related to Flink. The problem is that the "centroids" field is not initialized, which is a general Java issue. Please keep in mind that this list is not the best forum for Java questions. Regards, Robert On Wed, Jul 13, 2016 at 6:45 PM, subash basnet
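The kind of problem described in this reply can be sketched in plain Java. The original code is not shown in the thread, so everything below except the field name "centroids" is an assumption made for illustration: the class names, the element type and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CentroidsExample {

    // Hypothetical class: the field is declared but never assigned,
    // so it keeps Java's default value for object references, null.
    static class Broken {
        private List<double[]> centroids;

        void add(double[] c) {
            centroids.add(c); // throws NullPointerException
        }
    }

    // Hypothetical fix: initialize the collection where it is declared.
    static class Fixed {
        private final List<double[]> centroids = new ArrayList<>();

        void add(double[] c) {
            centroids.add(c);
        }

        int size() {
            return centroids.size();
        }
    }

    // Returns true if adding to the uninitialized field fails with an NPE.
    static boolean brokenThrows() {
        try {
            new Broken().add(new double[] {1.0, 2.0});
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Returns the number of stored centroids after one add on the fixed class.
    static int fixedSize() {
        Fixed fixed = new Fixed();
        fixed.add(new double[] {1.0, 2.0});
        return fixed.size();
    }

    public static void main(String[] args) {
        System.out.println("uninitialized field throws: " + brokenThrows());
        System.out.println("initialized field size: " + fixedSize());
    }
}
```

Initializing the field at its declaration (or in every constructor) is the usual way to avoid this, independent of whether the class is used inside a Flink job.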

Re: Writing in flink clusters

2016-07-14 Thread Robert Metzger
I agree with Chesnay that we should report the file name. Can you create a [hotfix] PR for that? On Wed, Jul 13, 2016 at 3:46 PM, Chesnay Schepler wrote: > Hello, > > Is that the complete error message? I'm a bit surprised it does not > explicitly name any file name. If it really doesn't we shou

Re: How large a Flink cluster can have?

2016-07-14 Thread Robert Metzger
Hi, I think this information is not written anywhere because we don't know it either. Alibaba seems to run a fork of Flink on "thousands of nodes" [1]. Maybe some of the production users on this mailing list can share some information regarding this. [1] http://www.slideshare.

Re: Ability to partition logs per pipeline

2016-07-14 Thread Robert Metzger
Hi Sumit, What exactly do you mean by pipeline? Are you talking about cases where multiple jobs are running concurrently on the same TaskManager, or are you referring to parallel instances of a Flink job? On Wed, Jul 13, 2016 at 9:49 PM, Chawla,Sumit wrote: > Hi All > > Does flink provide any ab

Re: Questions about slot sharing

2016-07-14 Thread Robert Metzger
Hi Huafeng, yes, the mapper and reducer are running in different threads on the TaskManager. Slot sharing is an abstract concept of the scheduler. Flink also supports thread-sharing, for example when you have a series of mappers (or filters) running with the same parallelism. We call this feature

Helping Spread the Word about Apachecon EU 2016

2016-07-14 Thread Sharan Foga
Hi Everyone I'm forwarding the following message on behalf of Rich Bowen and the Apachecon team === As you are aware, we are holding ApacheCon in Seville in November. While this seems like a long way away, it is critical that we get on people's calendar now, so that they can plan, get b

Questions about slot sharing

2016-07-14 Thread Vincent Wang
Hi all, I'm totally new to Flink and I have a question about Flink's slot sharing feature. Suppose I have a pipeline like *Mapper -> Reducer*, and of course the operators are *not* chained into a single task. When the job is actually scheduled to the Flink cluster, there is a Mapper task and a Re