Re: CodeAnalysisMode in Flink

2016-12-20 Thread vinay patil
Hi, Any updates on this thread? Regards, Vinay Patil On Fri, Nov 18, 2016 at 10:25 PM, Fabian Hueske-2 [via Apache Flink User Mailing List archive.] wrote: > Hi Vinay, > > not sure why it's not working, but maybe Timo (in CC) can help. > > Best, Fabian > > 2016-11-18 17:41 GMT+01:00 Vinay Pat

static/dynamic lookups in flink streaming

2016-12-20 Thread Meghashyam Sandeep V
Hi there, I know that there are various state backends to persist state. Is there a similar way to persist static/dynamic lookups and use them while streaming the data in Flink? Thanks, Sandeep
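One common approach, sketched below under the assumption that the lookup table fits in memory: load static lookups once per task in a rich function's open(); dynamic per-key lookups usually go into Flink's key-value state instead. loadLookupTable() is a hypothetical helper standing in for whatever store the table lives in.

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration

    class EnrichWithLookup extends RichMapFunction[String, (String, String)] {
      @transient private var lookup: Map[String, String] = _

      override def open(parameters: Configuration): Unit = {
        // Hypothetical: fetch the table from wherever it lives (file, DB, ...)
        lookup = loadLookupTable()
      }

      override def map(key: String): (String, String) =
        (key, lookup.getOrElse(key, "unknown"))

      private def loadLookupTable(): Map[String, String] =
        Map("a" -> "alpha") // placeholder data
    }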

Re: Flink rolling upgrade support

2016-12-20 Thread Stephan Ewen
Hi Andrew! Would be great to know if what Aljoscha described works for you. Ideally, this costs no more than a failure/recovery cycle, which one typically also gets with rolling upgrades. Best, Stephan On Tue, Dec 20, 2016 at 6:27 PM, Aljoscha Krettek wrote: > Hi, > zero-downtime updates are

Re: Serializing NULLs

2016-12-20 Thread Anirudh Mallem
If you are using Avro-generated classes, then your values cannot be null. https://cwiki.apache.org/confluence/display/AVRO/FAQ#FAQ-Whyisn'teveryvalueinAvronullable? From: Stephan Ewen Reply-To: "user@flink.apache.org" Date: Tuesday, December 20, 2016 at 8:17 AM T
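For reference, Avro does accept null for a field declared as a union with the null type. A sketch using Avro's SchemaBuilder (record and field names here are made up):

    import org.apache.avro.SchemaBuilder

    // Declare the field as union(null, double) with default null,
    // so a null value is legal for this field
    val schema = SchemaBuilder.record("Measurement").fields()
      .name("value").`type`().unionOf().nullType().and().doubleType().endUnion().nullDefault()
      .endRecord()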

Re: Flink rolling upgrade support

2016-12-20 Thread Aljoscha Krettek
Hi, zero-downtime updates are currently not supported. What is supported in Flink right now is a savepoint-shutdown-restore cycle. With this, you first draw a savepoint (which is essentially a checkpoint with some metadata), then you cancel your job, then you do whatever you need to do (update mac
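Roughly, with the CLI of that Flink generation, the cycle looks like this (job ID, jar name, and paths are placeholders):

    bin/flink savepoint <jobID>            # prints the savepoint path
    bin/flink cancel <jobID>
    bin/flink run -s <savepointPath> your-job.jar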

Re: Continuous File monitoring not reading nested files

2016-12-20 Thread Aljoscha Krettek
+Kostas, who probably has the most experience with this by now. Do you have an idea what might be going on? On Fri, 16 Dec 2016 at 15:45 Yassine MARZOUGUI wrote: > Looks like this is not specific to the continuous file monitoring, I'm > having the same issue (files in nested directories are not
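For anyone hitting the same symptom: a sketch of how nested directories can be enumerated, assuming a Flink version where readFile with a FileProcessingMode is available (paths and the 10 s interval are placeholders):

    import org.apache.flink.api.java.io.TextInputFormat
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val format = new TextInputFormat(new Path("hdfs:///data"))
    format.setNestedFileEnumeration(true) // descend into subdirectories
    val lines = env.readFile(format, "hdfs:///data",
      FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L)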

Stateful Stream Processing with RocksDB causing Job failure

2016-12-20 Thread Abiy Legesse Hailemichael
I am running a standalone Flink cluster (1.1.2) and I have a stateful streaming job that uses RocksDB as the state backend. I have two stateful operators that are using ValueState<> and ListState<>. Every now and then my job fails with the following exception: java.lang.Exception: Could not restore c
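For context, a minimal sketch of such a setup, assuming the checkpoint directory sits on a filesystem all TaskManagers can reach (the path is a placeholder):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60000) // checkpoint every 60 s
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))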

Re: Serializing NULLs

2016-12-20 Thread Stephan Ewen
Thanks for sharing the stack trace. This does not seem to be Flink-related; it is part of Avro's encoding logic. Avro's GenericRecord type apparently does not allow map values to be null. On Tue, Dec 20, 2016 at 4:55 PM, Matt wrote: > Here is the back trace: https://gist.github.

Re: Serializing NULLs

2016-12-20 Thread Deepak Sharma
Hi Matt, Can you please share the code showing how you map null to Long? Thanks, Deepak On Tue, Dec 20, 2016 at 9:25 PM, Matt wrote: > Here is the back trace: https://gist.github.com/ > 56af4818bcf5dee6b97c248fd9233c67 > > In the meanwhile I've solved the issue by creating a POJO class where

Re: Serializing NULLs

2016-12-20 Thread Matt
Here is the back trace: https://gist.github.com/56af4818bcf5dee6b97c248fd9233c67 In the meanwhile I've solved the issue by creating a POJO class where null is just Long.MIN_VALUE; that, with a custom equals(), did the trick. I guess it's not as fast as de/serializing Double though. If you need any
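A sketch of that workaround for long values (written in Scala here; a Java POJO with a public no-arg constructor works the same way):

    // Wrapper that encodes null as a sentinel, so the serializer never sees null
    class NullableLong(var value: Long) {
      def this() = this(Long.MinValue) // Long.MinValue stands in for null

      def isNull: Boolean = value == Long.MinValue

      override def equals(o: Any): Boolean = o match {
        case that: NullableLong => this.value == that.value
        case _ => false
      }

      override def hashCode(): Int = value.hashCode()
    }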

Re: Implementing tee functionality in a streaming job

2016-12-20 Thread Fabian Hueske
Thanks for reporting back, Yury! Glad to hear that your use case is covered. Cheers, Fabian 2016-12-20 16:42 GMT+01:00 Yury Ruchin : > Well, it seems I figured it out. You're right, Fabian, it works the way > you described. I wrote a simple test job: > > val env = StreamExecutionEnvironment.getEx

Re: Implementing tee functionality in a streaming job

2016-12-20 Thread Yury Ruchin
My bad, the "Records Out" in the previous message should read "Records sent", as per the Flink UI. 2016-12-20 18:42 GMT+03:00 Yury Ruchin : > Well, it seems I figured it out. You're right, Fabian, it works the way > you described. I wrote a simple test job: > > val env = StreamExecutionEnvironment.

RE: Generate _SUCCESS (map-reduce style) when folder has been written

2016-12-20 Thread Gwenhael Pasquiers
No, don’t worry, I think it’s totally compliant with Hadoop’s behavior, but I wanted it to behave more like Flink (to totally clean the destination folder before outputting new files). From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Tuesday, December 20, 2016 16:41 To: user@flink.apache.org Subje

Re: Implementing tee functionality in a streaming job

2016-12-20 Thread Yury Ruchin
Well, it seems I figured it out. You're right, Fabian, it works the way you described. I wrote a simple test job:

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env.fromCollection(Seq.range(0, 100))
    stream.addSink(new DiscardingSink[Int]).disableChaining()
    stream.map {_

Re: Generate _SUCCESS (map-reduce style) when folder has been written

2016-12-20 Thread Fabian Hueske
Great to hear! Do you mean that the behavior of Flink's HadoopOutputFormat is not consistent with Hadoop's behavior? If that's the case, could you open a JIRA ticket to report this and maybe also contribute your changes back? Thanks a lot, Fabian 2016-12-20 16:37 GMT+01:00 Gwenhael Pasquiers < g

Re: Calculating stateful counts per key

2016-12-20 Thread Fabian Hueske
Hi Mäki, some additions to Greg's answer: The flatMapWithState shortcut of the Scala API uses Flink's key-value state, while your TestCounters class uses the Checkpointed interface. As Greg said, the Checkpointed interface operates on the operator level, not per key. The key-value state automatically
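A sketch of the flatMapWithState/mapWithState route, assuming a stream of (key, value) pairs; the per-key counter lives in Flink's key-value state and is checkpointed automatically:

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env.fromElements(("a", 1), ("b", 2), ("a", 3))

    // One counter per key, kept in key-value state
    val counts = stream
      .keyBy(_._1)
      .mapWithState[(String, Long), Long] { (in, state) =>
        val count = state.getOrElse(0L) + 1
        ((in._1, count), Some(count))
      }

    counts.print()
    env.execute()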

RE: Generate _SUCCESS (map-reduce style) when folder has been written

2016-12-20 Thread Gwenhael Pasquiers
Thanks, it is working properly now. NB: Had to delete the folder by code because Hadoop’s OutputFormats will only overwrite file by file, not the whole folder. From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Tuesday, December 20, 2016 14:21 To: user@flink.apache.org Subject: Re: Generate _SUCCES
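A sketch of that clean-up step, assuming the output lives on HDFS (the path is a placeholder):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Remove the whole output directory before the job writes new files
    val fs = FileSystem.get(new Configuration())
    val out = new Path("hdfs:///output")
    if (fs.exists(out)) {
      fs.delete(out, true) // true = recursive
    }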

Re: Calculating stateful counts per key

2016-12-20 Thread Greg Hogan
Hi Mäki, This is the expected output. Your RichFlatMapFunction is opened once per task and you are sharing counterValue for all keys processed by that task. Greg On Mon, Dec 19, 2016 at 11:38 AM, Mäki Hanna wrote: > Hi, > > > > I'm trying to calculate stateful counts per key with checkpoints

Re: Generate _SUCCESS (map-reduce style) when folder has been written

2016-12-20 Thread Fabian Hueske
Hi Gwenhael, The _SUCCESS files were originally generated by Hadoop for successful jobs. AFAIK, Spark leverages Hadoop's InputFormats and OutputFormats and followed this approach to stay compatible. You could use Flink's HadoopOutputFormat, which is a wrapper for Hadoop OutputFormats
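A sketch of the wrapper route, assuming a batch job with a DataSet[(Text, IntWritable)] named result (names and paths are placeholders); the wrapped format commits through Hadoop's FileOutputCommitter, which is what writes the _SUCCESS marker:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}

    val job = Job.getInstance()
    val hadoopOF = new HadoopOutputFormat[Text, IntWritable](
      new TextOutputFormat[Text, IntWritable], job)
    FileOutputFormat.setOutputPath(job, new org.apache.hadoop.fs.Path("hdfs:///output"))

    // result: DataSet[(Text, IntWritable)] -- assumed to exist in the job
    result.output(hadoopOF)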

Generate _SUCCESS (map-reduce style) when folder has been written

2016-12-20 Thread Gwenhael Pasquiers
Hi, Sorry if it's already been asked, but is there a built-in way for Flink to generate a _SUCCESS file in the folders it's been writing into (using the write method with an OutputFormat)? We are replacing a Spark job that was generating those files (and further operations rely on them). Best reg

Re: Implementing tee functionality in a streaming job

2016-12-20 Thread Yury Ruchin
Thanks Fabian, I will try creating a toy job illustrating the issue and get back. 2016-12-20 12:58 GMT+03:00 Fabian Hueske : > Hi Yury, > > your solution should exactly solve your problem. > An operator sends all outgoing records to all connected successor > operators. > There should not be any n

Re: Implementing tee functionality in a streaming job

2016-12-20 Thread Fabian Hueske
Hi Yury, your solution should solve exactly this problem. An operator sends all outgoing records to all connected successor operators. There should not be any non-deterministic behavior or splitting of records. Can you share some example code that produces the non-deterministic behavior? Best, F

Implementing tee functionality in a streaming job

2016-12-20 Thread Yury Ruchin
Hi all, I have a streaming job in which I want at some point to duplicate the stream, so that one copy of it ends up in a sink and another one goes down the processing pipeline. This way I want to implement something similar to the "tee" Linux utility. I tried a naive approach that looked like this: val st
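Based on the test job Yury posts earlier in this digest, a complete sketch of the "tee" (the naive approach, which the thread confirms works as intended):

    import org.apache.flink.streaming.api.functions.sink.DiscardingSink
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env.fromCollection(Seq.range(0, 100))

    // "tee": every record goes both to the sink...
    stream.addSink(new DiscardingSink[Int])

    // ...and down the rest of the pipeline
    stream.map { _ * 2 }.print()

    env.execute()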

Re: Serializing NULLs

2016-12-20 Thread Stephan Ewen
The "null" support in some types is not fully developed. However in that case I am wondering why it does not work. Can you share the stack trace, so we can take a look at the serializer? On Mon, Dec 19, 2016 at 9:56 PM, Matt wrote: > Hello list, > > I'm getting this error: > > > *java.lang.Run