Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Welly Tambunan
Thanks, Gyula. Cheers. On Fri, Jul 3, 2015 at 6:19 PM, Gyula Fóra wrote: > Yes, you can think of it that way. Each Operator has parallel instances > and each parallel instance receives input from multiple channels (FIFO from > each) and produces output. > > Welly Tambunan wrote (on 2015

Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Gyula Fóra
Yes, you can think of it that way. Each Operator has parallel instances and each parallel instance receives input from multiple channels (FIFO from each) and produces output. Welly Tambunan wrote (on Fri, Jul 3, 2015, 13:02): > Hi Gyula, > > Thanks a lot. That's enough for my case. > >

Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Welly Tambunan
Hi Gyula, Thanks a lot. That's enough for my case. I really love the Flink Streaming model compared to Spark Streaming. So is it true that I can think of an Operator as an actor, as in the Actor model, in this system? Is that the right way to put it? Cheers On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra wrote: >

Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Gyula Fóra
Hey, 1. Yes, if you use partitionBy, the same key will always go to the same downstream operator instance. 2. There is only a partial ordering guarantee, meaning that data received from one input is FIFO. This means that if the same key is coming from multiple inputs then there is no ordering guaran
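
The partial ordering described in point 2 can be illustrated with a small plain-Java sketch. This is not Flink code: `ChannelMerge` and `merge` are hypothetical names, and the random choice of which channel is read next is an assumption standing in for network timing. Each channel is drained in FIFO order, but the interleaving across channels is arbitrary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Hypothetical illustration (not Flink API): one operator instance
// reading from two input channels, each of which is FIFO on its own.
final class ChannelMerge {

    // Interleaves two FIFO channels. Which channel is read next is
    // arbitrary (driven by rnd), so only per-channel order is preserved.
    static List<String> merge(List<String> channelA, List<String> channelB, Random rnd) {
        Deque<String> a = new ArrayDeque<>(channelA);
        Deque<String> b = new ArrayDeque<>(channelB);
        List<String> received = new ArrayList<>();
        while (!a.isEmpty() || !b.isEmpty()) {
            // Read from A if B is drained, or if both have data and rnd picks A.
            boolean takeA = b.isEmpty() || (!a.isEmpty() && rnd.nextBoolean());
            received.add(takeA ? a.poll() : b.poll());
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> fromA = Arrays.asList("a1", "a2", "a3");
        List<String> fromB = Arrays.asList("b1", "b2", "b3");
        // a1 always precedes a2, and b1 always precedes b2, but the
        // relative order of a* and b* elements depends on the seed.
        System.out.println(merge(fromA, fromB, new Random(1)));
        System.out.println(merge(fromA, fromB, new Random(2)));
    }
}
```

If records with the same key arrive on both channels, their relative order at the receiver therefore depends on the interleaving, which matches the "no ordering guarantee across inputs" statement above.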

Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Welly Tambunan
Hi Gyula, Thanks for your response. So if I use partitionBy, then data points with the same key will be received by exactly the same operator instance? Another question: if I execute a reduce() operator after partitionBy, will that reduce operator guarantee ordering within the same key? Cheers

Re: In windows8 + VirtualBox, how to build Flink development environment?

2015-07-03 Thread Stephan Ewen
Let us know if you run into setup problems... On Fri, Jul 3, 2015 at 11:25 AM, Stephan Ewen wrote: > Hi! > > With Windows 8 + VirtualBox, I would run a Linux VM. I run Ubuntu in > VirtualBox on Windows 7 myself. > > In the Linux environment, make sure you have git, Java 7+ and Maven, see > also

Re: In windows8 + VirtualBox, how to build Flink development environment?

2015-07-03 Thread Stephan Ewen
Hi! With Windows 8 + VirtualBox, I would run a Linux VM. I run Ubuntu in VirtualBox on Windows 7 myself. In the Linux environment, make sure you have git, Java 7+ and Maven, see also here: https://github.com/apache/flink/blob/master/README.md#building-apache-flink-from-source For development, I

In windows8 + VirtualBox, how to build Flink development environment?

2015-07-03 Thread Chenliang (Liang, DataSight)
Dear, in Windows 8 + VirtualBox, how do I build a Flink development environment?

Re: Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread Gyula Fóra
Hey! Both groupBy and partitionBy will trigger a shuffle over the network based on some key, ensuring that elements with the same key end up at the same downstream operator instance. The difference between the two is that groupBy, in addition to this, returns a GroupedDataStream, which lets you e
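
To make the key-based routing concrete, here is a minimal sketch in plain Java. It is not Flink code and makes no claim about Flink's actual partitioner: it assumes a simple hash-modulo scheme, and `KeyRouting`, `instanceFor`, and `route` are hypothetical names chosen for illustration. The point it shows is only that every record with a given key lands on the same parallel instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (not Flink API): route records to parallel
// operator instances by key hash, as a key-based shuffle conceptually does.
final class KeyRouting {

    // Index of the downstream instance that receives the given key.
    // floorMod keeps the result non-negative for negative hash codes.
    static int instanceFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Distributes keys over `parallelism` instances, keeping arrival order
    // within each instance.
    static List<List<String>> route(List<String> keys, int parallelism) {
        List<List<String>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        for (String key : keys) {
            instances.get(instanceFor(key, parallelism)).add(key);
        }
        return instances;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("AAPL", "MSFT", "AAPL", "GOOG", "MSFT");
        List<List<String>> out = route(keys, 3);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("instance " + i + ": " + out.get(i));
        }
    }
}
```

Several distinct keys may share one instance in this scheme; what is guaranteed is that a single key never spreads across instances.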

Re: Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread Welly Tambunan
Thanks Chiwan Great Job ! Cheers On Fri, Jul 3, 2015 at 3:32 PM, Chiwan Park wrote: > I found that the patch had been merged to upstream. [1] :) > > Regards, > Chiwan Park > > [1] https://github.com/apache/flink/pull/855 > > > On Jul 3, 2015, at 5:26 PM, Welly Tambunan wrote: > > > > Thanks C

Re: Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread Chiwan Park
I found that the patch had been merged to upstream. [1] :) Regards, Chiwan Park [1] https://github.com/apache/flink/pull/855 > On Jul 3, 2015, at 5:26 PM, Welly Tambunan wrote: > > Thanks Chiwan, > > > Glad to hear that. > > > Cheers > > On Fri, Jul 3, 2015 at 3:24 PM, Chiwan Park wrot

Flink Streaming : PartitionBy vs GroupBy differences

2015-07-03 Thread tambunanw
Hi All, I'm trying to understand the difference between these two. From my experience in Spark, GroupBy causes shuffling over the network. Is that also the case in Flink? I've watched videos and read a couple of docs about Flink saying that Flink actually compiles the user code into its own opti

Re: Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread Welly Tambunan
Thanks Chiwan, Glad to hear that. Cheers On Fri, Jul 3, 2015 at 3:24 PM, Chiwan Park wrote: > Hi tambunanw, > > The issue is already known and we’ll patch soon. [1] > In next release (maybe 0.9.1), the problem will be solved. > > Regards, > Chiwan Park > > [1] https://issues.apache.org/jira/

Re: Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread Chiwan Park
Hi tambunanw, The issue is already known and we’ll patch soon. [1] In next release (maybe 0.9.1), the problem will be solved. Regards, Chiwan Park [1] https://issues.apache.org/jira/browse/FLINK-2257 > On Jul 3, 2015, at 4:57 PM, tambunanw wrote: > > Hi All, > > I'm trying to create some ex

Open method is not called with custom implementation RichWindowMapFunction

2015-07-03 Thread tambunanw
Hi All, I'm trying to run some experiments with a rich windowing function and operator state. I modified the streaming stock prices example from https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/exampl

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-03 Thread Maximilian Michels
You're welcome. I'm glad I could help out :) Cheers, Max On Thu, Jul 2, 2015 at 9:17 PM, Mihail Vieru wrote: > I've implemented the alternating 2 files solution and everything works > now. > > Thanks a lot! You saved my day :) > > Cheers, > Mihail > > > On 02.07.2015 12:37, Maximilian Michels

Re: TeraSort on Flink and Spark

2015-07-03 Thread Stephan Ewen
Flavio, In general, String works well in Flink, because it behaves for sorting much like this OptimizedText. If you want to access the String contents, then using String is good. Text may have slight advantages if you never access the actual contents, but just partition and sort it (as in TeraSor