Tumbling time window cannot group events properly

2016-07-03 Thread Yukun Guo
Hi, I wrote a program which constructs a WindowedStream to compute periodic data statistics every 10 seconds. However, I found that events have not been strictly grouped into windows of 10s duration, i.e., some events are leaking into the adjacent window. The output is like this: Mon, 04 Jul 201
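For readers following this thread, here is a minimal sketch of how a 10-second tumbling window is usually assigned. It assumes windows are aligned to the epoch with no offset, so the window start is the timestamp rounded down to a multiple of the window size. The class and method names (`TumblingWindowSketch`, `windowStart`, `windowEnd`) are hypothetical and are not part of the original program or of the Flink API. Which timestamp gets rounded (the event's own timestamp or the machine's processing time) depends on the job's time characteristic, so that may be worth checking when events seem to land in the neighbouring window.

```java
public final class TumblingWindowSketch {

    // Start (inclusive) of the epoch-aligned window that contains timestampMs.
    // Assumes no window offset; floorMod keeps the result correct for negative timestamps.
    public static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // End (exclusive) of the same window.
    public static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long sizeMs = 10_000L; // 10-second windows, as in the question
        long[] timestamps = {20_000L, 29_999L, 30_000L};
        for (long ts : timestamps) {
            // Print the half-open window [start, end) each timestamp falls into.
            System.out.println(ts + " -> [" + windowStart(ts, sizeMs)
                    + ", " + windowEnd(ts, sizeMs) + ")");
        }
    }
}
```

Under this assumption, a timestamp of 29,999 ms falls into the window starting at 20,000 ms, while 30,000 ms starts the next window, so two events one millisecond apart can legitimately end up in different windows.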

Re: Error submitting stand-alone Flink job to EMR YARN cluster

2016-07-03 Thread Jamie Grier
Hi Bruce, I just spun up an EMR cluster and tried this out. Hadoop 2.7.2 and Flink 1.0.3. I ran the exact same command as you and everything works just fine. Please verify one thing, though. In your command you do not specify the path to the Flink executable, which means it's just getting pick

Re: Parameters to Control Intra-node Parallelism

2016-07-03 Thread Saliya Ekanayake
Thank you!
On Sun, Jul 3, 2016 at 11:28 AM, Ufuk Celebi wrote:
> Yes, exactly.
>
> On Sat, Jul 2, 2016 at 6:28 PM, Saliya Ekanayake wrote:
> > Thank you, yes, it can be done externally, if not supported within Flink.
> >
> > So the way to spawn multiple task managers would be to list the same

Re: Parameters to Control Intra-node Parallelism

2016-07-03 Thread Ufuk Celebi
Yes, exactly.
On Sat, Jul 2, 2016 at 6:28 PM, Saliya Ekanayake wrote:
> Thank you, yes, it can be done externally, if not supported within Flink.
>
> So the way to spawn multiple task managers would be to list the same slave machines N times as necessary in the slaves file?
>
> On Sat, Jul 2, 2

Data point goes missing within iteration

2016-07-03 Thread Biplob Biswas
Hi, I am reading data points from a file and then I have to perform iterations over them. When I check the data points before the iteration as follows, tuples.flatMap(new CheckData()), and print the count inside CheckData(), I get 2500 data points on each of 4 partitions, i.e. 1 datapoint