Re: Failed job restart - flink on yarn

2016-07-01 Thread vpra...@gmail.com
Hi Jamie, Thanks for the reply. Yeah, I looked at savepoints. I want to start my job only from the last checkpoint, which means I have to keep track of when the checkpoint was taken and then trigger a savepoint. I am not sure this is the way to go. My state backend is HDFS and I can see that the c

Re: Using standalone single node without HA in production, crazy?

2016-07-01 Thread Ryan Crumley
Jamie, Thank you for your insight. To answer your questions: I am running on AWS and have access to S3. Further, I already have Zookeeper in the mix (it's used by Mesos as well as Kafka). I was hoping to avoid the complexities of an automated HA setup by running a single JVM and then migrate to HA do

Re: Parameters to Control Intra-node Parallelism

2016-07-01 Thread Saliya Ekanayake
Hi Ufuk, Looking at the document you sent, it seems only 1 task manager per node exists and within that you have multiple slots. Is it possible to run more than 1 task manager per node? Also, within a task manager is the parallelism done through threads or processes? Thank you, Saliya On Thu, Jun

Re: Using standalone single node without HA in production, crazy?

2016-07-01 Thread Jamie Grier
I started to answer these questions and then realized I was making an assumption about your environment. Do you have a reliable persistent file system such as HDFS or S3 at your disposal or do you truly mean to run on a single node? If you are truly planning to run on a single node only there

Re: Failed job restart - flink on yarn

2016-07-01 Thread Jamie Grier
Hi Prabhu, Have you taken a look at Flink's savepoints feature? This allows you to make snapshots of your job's state on demand and then at any time restart your job from that point: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html Also know that you can

Re: Error submitting stand-alone Flink job to EMR YARN cluster

2016-07-01 Thread Jamie Grier
I know this is really basic but have you verified that your Flink lib folder contains log4j-1.2.17.jar? I imagine that's fine given the yarn-session.sh approach is working fine. What version of EMR are you running? What version of Flink? -Jamie On Thu, Jun 30, 2016 at 11:28 AM, Hanson, Bruc

Re: Flink for historical time series processing

2016-07-01 Thread Jamie Grier
Hi Mindis, This does actually sound like a good use case for Flink. Without knowing more details it's a bit hard to say which of the options you mention would be most efficient but my gut feeling is that the "one big dataset" approach would be the way to go. I think there probably is a simplifie

Failed job restart - flink on yarn

2016-07-01 Thread vpra...@gmail.com
Hi, I have a Flink streaming job that reads from Kafka and performs an aggregation in a window. It ran fine for a while; however, when the number of events in a window crossed a certain limit, the YARN containers failed with Out Of Memory. The job was running with 10G containers. We have about 64G me

Re: Different results on local and on cluster

2016-07-01 Thread Flavio Pompermaier
what do you mean exactly? On 1 Jul 2016 18:58, "Aljoscha Krettek" wrote: > Hi, > do you have any data in the coGroup/groupBy operators that you use, > besides the input data? > > Cheers, > Aljoscha > > On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier > wrote: > >> Hi to all, >> I have a Flink job

Re: Different results on local and on cluster

2016-07-01 Thread Aljoscha Krettek
Hi, do you have any data in the coGroup/groupBy operators that you use, besides the input data? Cheers, Aljoscha On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier wrote: > Hi to all, > I have a Flink job that computes data correctly when launched locally from > my IDE while it doesn't when launche

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Aljoscha Krettek
Ah, this might be in code that runs at a different layer from the StateBackend. Can you maybe pinpoint which of your user classes is this anonymous class and where it is used? Maybe by replacing them by non-anonymous classes and checking which replacement fixes the problem. - Aljoscha On Fri, 1 J
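
A minimal sketch of why replacing anonymous classes with named ones can help, in plain Java with no Flink dependency. The `Mapper` interface and `UpperCaseMapper` class are hypothetical stand-ins, not Flink types or classes from Josh's job. The Java compiler gives anonymous classes numbered names, and those numbers can shift when code is added or reordered, whereas a named class keeps its name. Whether this is the actual cause of the error in this thread is not established by the messages shown.

```java
import java.io.Serializable;

class StableClassNames {
    /** Hypothetical stand-in for a user function interface; not a Flink type. */
    interface Mapper extends Serializable {
        String map(String value);
    }

    /** Named static nested class: its name does not depend on declaration order. */
    static class UpperCaseMapper implements Mapper {
        private static final long serialVersionUID = 1L;

        @Override
        public String map(String value) {
            return value.toUpperCase();
        }
    }

    /** Returns an anonymous implementation; the compiler assigns it a numbered name. */
    static Mapper anonymousMapper() {
        return new Mapper() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        };
    }

    public static void main(String[] args) {
        // Print both class names to compare the numbered name with the explicit one.
        System.out.println(anonymousMapper().getClass().getName());
        System.out.println(new UpperCaseMapper().getClass().getName());
    }
}
```

Swapping one anonymous class at a time for a named one, as suggested above, narrows down which class the restored state refers to.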

Re: Flink streaming connect and split streams

2016-07-01 Thread Aljoscha Krettek
Hi, I'm afraid the only way to do it right now is using the wrapper that can contain both, as you suggested. Cheers, Aljoscha On Thu, 30 Jun 2016 at 16:50 Martin Neumann wrote: > Hej, > > I'm currently playing around with some machine learning algorithms in > Flink streaming. > > I have an inpu
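
As an illustration of "the wrapper that can contain both", here is a minimal plain-Java sketch of a type that holds exactly one of two element types, so that two differently typed streams could be carried as one. The class name `EitherWrapper` and its methods are hypothetical; Martin's actual types are not shown in the message, and this is not a Flink API.

```java
import java.util.Objects;

/** Hypothetical wrapper holding exactly one of two values. */
final class EitherWrapper<A, B> {
    private final A first;
    private final B second;

    private EitherWrapper(A first, B second) {
        this.first = first;
        this.second = second;
    }

    /** Wraps an element of the first type; null is rejected. */
    static <A, B> EitherWrapper<A, B> ofFirst(A value) {
        return new EitherWrapper<A, B>(Objects.requireNonNull(value), null);
    }

    /** Wraps an element of the second type; null is rejected. */
    static <A, B> EitherWrapper<A, B> ofSecond(B value) {
        return new EitherWrapper<A, B>(null, Objects.requireNonNull(value));
    }

    boolean isFirst() {
        return first != null;
    }

    A getFirst() {
        if (first == null) {
            throw new IllegalStateException("wrapper holds the second type");
        }
        return first;
    }

    B getSecond() {
        if (second == null) {
            throw new IllegalStateException("wrapper holds the first type");
        }
        return second;
    }

    public static void main(String[] args) {
        EitherWrapper<String, Integer> event = EitherWrapper.ofFirst("training sample");
        // A downstream function would branch on isFirst() to recover the original type.
        System.out.println(event.isFirst() ? event.getFirst() : event.getSecond());
    }
}
```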

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
I've just double checked and I do still get the ClassNotFound error for an anonymous class, on a job which uses the RocksDBStateBackend. In case it helps, this was the full stack trace: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state. at org

Using standalone single node without HA in production, crazy?

2016-07-01 Thread Ryan Crumley
Hi, I am evaluating Flink for use in a stateful streaming application. Some information about the intended use: - Will run in a Mesos cluster, deployed via Marathon in a Docker container - Initial throughput ~ 100 messages per second (from Kafka) - Will need to scale to 10x that soon after la

Different results on local and on cluster

2016-07-01 Thread Flavio Pompermaier
Hi to all, I have a Flink job that computes data correctly when launched locally from my IDE while it doesn't when launched on the cluster. Is there any suggestion/example to understand the problematic operators in this way? I think the root cause is the fact that some operator (e.g. coGroup/group

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
Thanks guys, that's very helpful info! @Aljoscha I thought I saw this exception on a job that was using the RocksDB state backend, but I'm not sure. I will do some more tests today to double check. If it's still a problem I'll try the explicit class definitions solution. Josh On Thu, Jun 30, 201

Re: Is it possible to restart only the function that fails instead of entire job?

2016-07-01 Thread Ufuk Celebi
Unfortunately, this is not possible at the moment. This optimization definitely makes sense in certain situations. How large is your state and how long does it take to recover? On Fri, Jul 1, 2016 at 9:18 AM, Chia-Hung Lin wrote: > After reading the document and configuring to test failure strate

Is it possible to restart only the function that fails instead of entire job?

2016-07-01 Thread Chia-Hung Lin
After reading the document and configuring to test failure strategy, it seems to me Flink restarts the job once any failures (e.g. exception thrown, etc.) occur. https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html My question: Is it possible to configure in