Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-31 Thread Xtra Coder
Thanks, things are clear so far.

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-25 Thread Aljoscha Krettek
Hi, first, regarding your use-case questions: 1. If you do a keyBy(..) on the "word", then all occurrences of the same word will end up on the same machine. 2. This depends on the StateBackend that you use. For example, there is the FsStateBackend that keeps state in memory and does checkpoints to a file system an
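
To illustrate point 1, here is a minimal sketch of key-based partitioning in plain Python. It is not Flink code and not Flink's actual partitioning scheme: the names `partition_for` and `keyed_word_count` are hypothetical, and CRC32 is used only as a stand-in deterministic hash. It shows the idea that a deterministic hash of the key sends every occurrence of a word to the same parallel instance, so each instance holds counts for only its share of the unique words.

```python
import zlib


def partition_for(word: str, parallelism: int) -> int:
    """Map a key to a partition index deterministically.

    Illustrative assumption: CRC32 modulo parallelism. Flink's real
    key-to-task assignment is different; only the principle is the same.
    """
    return zlib.crc32(word.encode("utf-8")) % parallelism


def keyed_word_count(words, parallelism=4):
    """Count words, keeping one independent counter per simulated instance."""
    partitions = [dict() for _ in range(parallelism)]
    for word in words:
        idx = partition_for(word, parallelism)
        # Only the owning partition ever sees or updates this word's count.
        partitions[idx][word] = partitions[idx].get(word, 0) + 1
    return partitions


stream = ["flink", "word", "count", "flink", "state", "word", "flink"]
partitions = keyed_word_count(stream, parallelism=4)

for idx, counts in enumerate(partitions):
    print(idx, counts)
```

Because a word's count lives in exactly one partition, the per-word state is spread across instances rather than replicated, which is what makes a very large number of unique keys manageable in principle.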

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-24 Thread Xtra Coder
In my context, the 100TB figure is more about saving the current state at some point in time to "backup" or "direct access" storage and then continuing with the next 100TB/hours/days of streamed data. So no, it is not a finite data set. On Mon, May 23, 2016 at 11:13 AM, Matthias J. Sax wrote: > Are y

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-23 Thread Matthias J. Sax
Are you talking about a streaming or a batch job? You mention a "text stream" but also say you want to stream 100TB, which suggests a finite data set processed with the DataSet API. -Matthias On 05/22/2016 09:50 PM, Xtra Coder wrote: > Hello, > > Question from newbie about how Flink's WordCou

Flink's WordCount at scale of 1BLN of unique words

2016-05-22 Thread Xtra Coder
Hello, A newbie question about how Flink's WordCount will actually work at scale. I've read and watched quite a few high-level presentations and have not found reasonably clear answers to the following … Use-case: -- there is a huge text stream with a highly variable set of words, let's say 1BLN