Re: Flink kafka group question

2016-07-28 Thread Tai Gordon
Hi, 1. What Flink Kafka connector version are you using? 2. How is your non-Flink consumer fetching data from the topic? Is it using the old SimpleConsumer, old High-Level Consumer, or the new consumer API? 3. If you are using the new consumer API, are you using "consumer.assign(…)" or "consumer.s
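
For context on point 3: subscribe() joins a consumer group and lets the broker balance partitions across all consumers sharing the same group.id, while assign() pins the consumer to explicit partitions and bypasses group management entirely. A minimal sketch of the two calls, assuming Kafka 0.9+ and placeholder broker, topic, and group names:

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    object NewConsumerApiSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.setProperty("bootstrap.servers", "localhost:9092") // placeholder
        props.setProperty("group.id", "my-group")                // placeholder
        props.setProperty("key.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer")
        props.setProperty("value.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer")

        // subscribe(): participates in the consumer group; partitions are
        // balanced among all consumers that share the same group.id
        val groupManaged = new KafkaConsumer[String, String](props)
        groupManaged.subscribe(Collections.singletonList("events"))

        // assign(): takes partition 0 explicitly; no group rebalancing happens
        val selfAssigned = new KafkaConsumer[String, String](props)
        selfAssigned.assign(Collections.singletonList(new TopicPartition("events", 0)))
      }
    }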

Container running beyond physical memory limits when processing DataStream

2016-07-28 Thread Jack Huang
Hi all, I am running a test Flink streaming task under YARN. It reads messages from a Kafka topic and writes them to the local file system. object PricerEvent { def main(args:Array[String]) { val kafkaProp = new Properties() kafkaProp.setProperty("bootstrap.servers", "localhost:66
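
A minimal sketch of a Kafka-to-local-file streaming job along those lines, assuming Flink 1.0/1.1 with the Kafka 0.9 connector; the broker address, topic name, and output path below are placeholders rather than the original values:

    import java.util.Properties
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object PricerEventSketch {
      def main(args: Array[String]): Unit = {
        val kafkaProps = new Properties()
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092") // placeholder port
        kafkaProps.setProperty("group.id", "pricer-events")

        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Source: read raw strings from the Kafka topic
        val stream = env.addSource(
          new FlinkKafkaConsumer09[String]("pricer-events", new SimpleStringSchema(), kafkaProps))

        // Sink: write each message to the task managers' local file system
        stream.writeAsText("file:///tmp/pricer-events-out")

        env.execute("Kafka to local file")
      }
    }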

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-28 Thread Jason Brelloch
Hey Josh, The way we replay historical data is that we have a second Flink job that listens to the same live stream and stores every single event in Google Cloud Storage. When the main Flink job that is processing the live stream gets a request for a specific data set that it has not been processing
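
A rough sketch of the archiving side of such a setup, assuming Flink 1.0/1.1, a Kafka 0.9 source, and that a gs:// path is reachable through the GCS Hadoop connector; every name below is a placeholder:

    import java.util.Properties
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.fs.RollingSink
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object EventArchiverSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.setProperty("bootstrap.servers", "localhost:9092")
        // Dedicated group.id for the archiving job's offset committing
        props.setProperty("group.id", "event-archiver")

        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Second job: consume the same live topic and persist every raw event
        env.addSource(new FlinkKafkaConsumer09[String]("events", new SimpleStringSchema(), props))
          .addSink(new RollingSink[String]("gs://my-bucket/event-archive"))

        env.execute("Archive live events")
      }
    }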

Reprocessing data in Flink / rebuilding Flink state

2016-07-28 Thread Josh
Hi all, I was wondering what approaches people usually take to reprocessing data with Flink - specifically the case where you want to upgrade a Flink job and make it reprocess historical data before continuing to process a live stream. I'm wondering if we can do something similar to the 'simpl

Flink kafka group question

2016-07-28 Thread 王萌
Hi, I am using Kafka in a couple of programs, including Flink, and I am quite confused about how the group.id parameter works in Flink's Kafka consumer. I have 2 consumers (one inside Flink, one outside) running on the same topic and the same group.id. From my inspection, they work in isolation: if I send one

how does flink assign windows to task

2016-07-28 Thread Vishnu Viswanath
Hi, Let's say I have a window on a keyed stream, and I have about 100 unique keys. And assume I have about 50 task slots in my cluster. And suppose my trigger fired 70 of the 100 windows/panes at the same time. How will Flink handle this? Will it assign 50 of the 70 triggered windows to the 50 available task sl
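
For what it's worth, windows on a keyed stream are not scheduled to slots individually: the keys are hash-partitioned across the window operator's parallel subtasks, and each subtask evaluates the fired windows/panes of its own keys one after another on its slot. A minimal sketch of that setup, with placeholder data:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    object KeyedWindowSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(50) // roughly one subtask per available slot

        // Placeholder input: (key, count) pairs standing in for ~100 distinct keys
        val events: DataStream[(String, Int)] =
          env.fromElements(("key-1", 1), ("key-2", 1), ("key-3", 1))

        // Each of the 50 parallel window subtasks owns a hash range of the keys.
        // When triggers fire, a subtask processes the fired panes of its own keys
        // sequentially; panes are not handed off to other slots.
        events
          .keyBy(_._1)
          .timeWindow(Time.minutes(1))
          .sum(1)
          .print()

        env.execute("Keyed window sketch")
      }
    }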

hash preservation

2016-07-28 Thread Robert Schwarzenberg
Dear Flink developers, I have a question concerning the preservation of hash values. I have a hashmap keyed by Scala objects that directly inherit the hashCode() and equals() methods from Any. (These objects are only used to address values in the hashmap mentioned above; they aren't used as k
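
One thing to keep in mind here: hashCode() inherited from Any is identity-based, so once an object is serialized and deserialized (as Flink does when shipping records between tasks), the copy gets a different hash and no longer finds the original entry. A small self-contained sketch with made-up key classes:

    object HashPreservationSketch {
      // Inherits hashCode()/equals() from Any: identity-based, so two
      // structurally identical instances hash differently
      class IdentityKey(val id: Int)

      // Case class: hashCode()/equals() are derived from the fields,
      // so equal contents always produce the same hash
      case class ValueKey(id: Int)

      def main(args: Array[String]): Unit = {
        val m = Map(new IdentityKey(1) -> "a")
        // A fresh instance with the same contents misses, because its
        // identity hash differs from the one used at insertion time
        println(m.get(new IdentityKey(1)))   // None

        val n = Map(ValueKey(1) -> "a")
        println(n.get(ValueKey(1)))          // Some(a)
      }
    }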

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-28 Thread Maximilian Michels
Hi Konstantin, If you come from traditional on-premise installations it may seem counter-intuitive to start a Flink cluster for each job. However, in today's cluster world it is not a problem to request containers on demand and spawn a new Flink cluster for each job. Per job clusters are convenien

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-28 Thread Konstantin Knauf
Hi Stephan, thank you for this clarification. I have a slightly related follow-up question. I keep reading that the preferred way to run Flink on YARN is with "Flink-job-at-a-time-on-yarn". Can you explain this a little further? Of course, with a separate YARN session the jobs are more decoupled, b

Re: how to start tuning to prevent OutOfMemory

2016-07-28 Thread Timo Walther
Hi Istvan, are you running batch or streaming Flink jobs? It is often the user-defined code that creates a lot of objects at runtime. Try to get rid of object creation as much as possible; maybe a library you are using is also causing these issues. You might also reduce the size of the network bu
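
One concrete pattern for cutting object churn in user code: create expensive helpers once per parallel instance in open() of a Rich function instead of once per record. A sketch using SimpleDateFormat as a stand-in for whatever per-record helper the job might be allocating:

    import java.text.SimpleDateFormat
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration

    // Parses a timestamp string per record. The SimpleDateFormat is created
    // once per parallel instance in open(), not once per record, which keeps
    // short-lived garbage off the hot path.
    class ParseTimestamp extends RichMapFunction[String, Long] {
      @transient private var format: SimpleDateFormat = _

      override def open(parameters: Configuration): Unit = {
        format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      }

      override def map(value: String): Long = format.parse(value).getTime
    }

As for the network buffers, their count and size are, if memory serves, controlled by taskmanager.network.numberOfBuffers and taskmanager.memory.segment-size in flink-conf.yaml.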