Re: A "per operator instance" window all ?

2018-02-18 Thread 周思华
Hi Julien, If I am not misunderstand, I think you can key your stream on a `Random.nextInt() % parallesm`, this way you can "group" together alerts from different and benefit from multi parallems. 发自网易邮箱大师 On 02/19/2018 09:08,Xingcan Cui wrote: Hi Julien, sorry for my misunderstandin

Re: Need to understand the execution model of the Flink

2018-02-18 Thread Darshan Singh
Thanks for reply. I guess I am not looking for alternate. I am trying to understand what flink does in this scenario and if 10 tasks ar egoing in parallel I am sure they will be reading csv as there is no other way. Thanks On Mon, Feb 19, 2018 at 12:48 AM, Niclas Hedhman wrote: > > Do you real

Re: A "per operator instance" window all ?

2018-02-18 Thread Xingcan Cui
Hi Julien, sorry for my misunderstanding before. For now, the window can only be defined on a KeyedStream or an ordinary DataStream but with parallelism = 1. I’d like to provide three options for your scenario. 1. If your external data is static and can be fit into the memory, you can use Mana

Re: Need to understand the execution model of the Flink

2018-02-18 Thread Niclas Hedhman
Do you really need the large single table created in step 2? If not, what you typically do is that the Csv source first do the common transformations. Then depending on whether the 10 outputs have different processing paths or not, you either do a split() to do individual processing depending on s

Iterating over state entries

2018-02-18 Thread Ken Krugler
Hi there, I’ve got a MapState where I need to iterate over the entries. This currently isn’t supported (at least for Rocks DB), AFAIK, though there is an issue/PR to improve this. The best solution I’ve seen is what Fabian proposed, which invol

Need to understand the execution model of the Flink

2018-02-18 Thread Darshan Singh
Hi I would like to understand the execution model. 1. I have a csv files which is say 10 GB. 2. I created a table from this file. 3. Now I have created filtered tables on this say 10 of these. 4. Now I created a writetosink for all these 10 filtered tables. Now my question is that are these 10 f

Re: Correlation between number of operators and Job manager memory requirements

2018-02-18 Thread Pawel Bartoszek
Hi, You could definitely try to find formula for heap size, but isnt's it easier just to try out different memory settings and see which works best for you? Thanks, Pawel 17 lut 2018 12:26 "Shailesh Jain" napisał(a): Oops, hit send by mistake. In the configuration section, it is mentioned tha

A "per operator instance" window all ?

2018-02-18 Thread Julien
Hi, I am pretty new to flink and I don't know what will be the best way to deal with the following use case: * as an input, I recieve some alerts from a kafka topic o an alert is linked to a network resource (like router-1, router-2, switch-1, switch-2, ...) o so an alert has

Re: Only a single message processed

2018-02-18 Thread Xingcan Cui
Hi Niclas, About the second point you mentioned, was the processed message a random one or a fixed one? The default startup mode for FlinkKafkaConsumer is StartupMode.GROUP_OFFSETS, maybe you could try StartupMode.EARLIST while debugging. Also, before that, you may try fetching the messages w

Re: Only a single message processed

2018-02-18 Thread Niclas Hedhman
So, the producer is run (at the moment) manually (command-line) one message at a time. Kafka's tooling (different consumer group) shows that a message is added each time. Since my last post, I have also added a UUID as the key, and that didn't make a difference, so you are likely correct about de-

Re: Only a single message processed

2018-02-18 Thread Fabian Hueske
Hi Niclas, Flink's Kafka consumer should not apply any deduplication. AFAIK, such a "feature" is not implemented. Do you produce into the topic that you want to read or is the data in the topic static? If you do not produce in the topic while the consuming application is running, this might be an