Re: Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Raman Gupta
Thank you Fabian, the blog articles were very useful. I will continue experimenting. On Thu, Jan 19, 2017 at 3:36 PM, Fabian Hueske wrote: > Hi Raman, > > Checkpoints are used to recover from task or process failures and usually > automatically taken at periodic intervals if configured correctly.

Re: Flink SQL on JSON data without schema

2017-01-19 Thread Fabian Hueske
The first level must be static, but a field can hold a complex object with nested data. So you could have a schema with a single JSON object as field. Best, Fabian 2017-01-19 19:31 GMT+01:00 Nihat Hosgur : > Hi Fabian, > I just want to make sure there is no misunderstanding. So what I've > under

Re: Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Fabian Hueske
Hi Raman, Checkpoints are used to recover from task or process failures and usually automatically taken at periodic intervals if configured correctly. Checkpoints are usually removed when a more recent checkpoint is completed (the exact policy can be configured). Savepoints are used to restart a

Re: Keep bootstrapped config updated with stream from Kafka

2017-01-19 Thread Nihat Hosgur
Hi Fabian, Thank you for the pointer. Thank, Nihat On Thu, Jan 19, 2017 at 2:46 AM Fabian Hueske wrote: > Hi Nihat, > > you could implement the stateful function as a RichFunction and load the > data in the open() method. > > Best, Fabian > > 2017-01-19 2:53 GMT+01:00 Nihat Hosgur : > > Hi all,

Re: Flink SQL on JSON data without schema

2017-01-19 Thread Nihat Hosgur
Hi Fabian, I just want to make sure there is no misunderstanding. So what I've understood from your response is that regardless table source is KafkaTable or not we need to provide static schema. Best, Nihat On Thu, Jan 19, 2017 at 2:50 AM Fabian Hueske wrote: Hi Nihat, at the current state, F

Fwd: Re: NPE in JobManager

2017-01-19 Thread Dave Marion
Noticed I didn't cc the user list. Original Message -- From: Dave Marion To: Ted Yu Date: January 19, 2017 at 12:13 PM Subject: Re: NPE in JobManager That might take some time. Here is a hand typed top N lines. If that is not enough let me know and I will start the process of

Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Raman Gupta
I was able to get it working well with the original approach you described. Thanks! Note that the documentation on how to do this with the Java API is... sparse, to say the least. I was able to look at the implementation of the scala flatMapWithState function as a starting point. Now I'm trying to

Re: Register a user scope metric in window reduce function

2017-01-19 Thread Stephan Ewen
Hi! There is a window variant where you can use both a ReduceFunction and a Window Function. You can use the metrics in the WindowFunction. Does that help your use case, or do you need the metrics strictly in the ReduceFunction? There are thoughts to allow the ReduceFunction to be "semi-rich", me

Re: NPE in JobManager

2017-01-19 Thread Ted Yu
Can you pastebin the complete stack trace for the NPE ? Thanks On Thu, Jan 19, 2017 at 8:57 AM, Dave Marion wrote: > I'm running flink-1.1.4-bin-hadoop27-scala_2.11 and I'm running into an > issue where after some period of time (measured in 1 - 3 hours) the > JobManager gets an NPE and shuts i

NPE in JobManager

2017-01-19 Thread Dave Marion
I'm running flink-1.1.4-bin-hadoop27-scala_2.11 and I'm running into an issue where after some period of time (measured in 1 - 3 hours) the JobManager gets an NPE and shuts itself down. The failure is at JobManager$$updateAccumulators$1.apply(JobManager.scala:1790). I'm using a custom accumulat

Re: Cluster failure after zookeeper glitch.

2017-01-19 Thread Andrew Ge Wu
Hi Stefan Yes we are running in HA mode with dedicated zookeeper cluster. As far as I can see it looks like a networking issue with zookeeper cluster. 2 out of 5 zookeeper reported something around the same time: server1 2017-01-19 11:52:13,044 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0

Re: Cluster failure after zookeeper glitch.

2017-01-19 Thread Stefan Richter
Hi, I think depending on your configuration of Flink (are you using high availability mode?) and the type of ZK glitches we are talking about, it can very well be that some of Flink’s meta data in ZK got corrupted and the system can not longer operate. But for a deeper analysis, we would need m

Cluster failure after zookeeper glitch.

2017-01-19 Thread Andrew Ge Wu
Hi, We recently had several zookeeper glitch, when that happens it seems to take flink cluster with it. We are running on 1.03 It started like this: 2017-01-19 11:52:13,047 INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessi

Re: How to get help on ClassCastException when re-submitting a job

2017-01-19 Thread Fabian Hueske
Hi Giuliano, I think it would be good to document this behavior, not sure though what the best place would be. It would be nice, if you could open a JIRA and describe the issue there (basically copy Yuri's analysis). Thank you, Fabian 2017-01-19 8:35 GMT+01:00 Giuliano Caliari : > Hello, > > Yu

Re: Flink SQL on JSON data without schema

2017-01-19 Thread Fabian Hueske
Hi Nihat, at the current state, Flink's SQL and Table APIs require a static schema. You could use an JSON object as value and implement scalar functions to extract fields, but that would not be very usable. Best, Fabian 2017-01-19 2:59 GMT+01:00 Nihat Hosgur : > Hi there, > We are evaluating fl

Re: Keep bootstrapped config updated with stream from Kafka

2017-01-19 Thread Fabian Hueske
Hi Nihat, you could implement the stateful function as a RichFunction and load the data in the open() method. Best, Fabian 2017-01-19 2:53 GMT+01:00 Nihat Hosgur : > Hi all, > > We bootstrap data from some DB and then like to keep it updated with > updates coming through Kafka. At spark it was

Re: Window limitations on groupBy

2017-01-19 Thread Fabian Hueske
Hi Raman, I think you would need a sliding count window of size 2 with slide 1. This is basically a GlobalWindow with a special trigger. However, you would need to modify the custom trigger to be able to - identify a terminal event (if there is such a thing) or to - close the window after a certa

Re: seeding a stream job

2017-01-19 Thread Fabian Hueske
Hi Jared, I think both approaches should work. The source that integrates the finite batch input and the stream might be more comfortable to use. As you said, the challenge would be to identify the exact point when to switch from one input to the other. One thing to consider when reading finite b