Re: Windows and Watermarks Clarification

2016-09-01 Thread Aljoscha Krettek
Just one clarification: even with a specified allowed lateness the window will still be evaluated once the watermark passes the end of the window. It's just that with allowed lateness the window contents and state will be kept around a bit longer to allow eventual late elements to update the result

Re: Metrics not reported to graphite

2016-09-01 Thread Jack Huang
Found the solution to the follow-up question: https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#metrics On Thu, Sep 1, 2016 at 3:46 PM, Jack Huang wrote: > Hi Greg, > > Following your hint, I found the solution here ( > https://issues.apache.org/jira/browse/FLINK-4396

Re: Metrics not reported to graphite

2016-09-01 Thread Jack Huang
Hi Greg, Following your hint, I found the solution here (https://issues.apache.org/ jira/browse/FLINK-4396) and resolved the issue. I had to put all three jars to the lib directory to get it to work. A follow up questions: can I put a prefix (e.g. flink) to all flink metrics instead of having the

Re: Metrics not reported to graphite

2016-09-01 Thread Greg Hogan
Have you copied the required jar files into your lib/ directory? Only JMX support is provided in the distribution. On Thu, Sep 1, 2016 at 5:07 PM, Jack Huang wrote: > Hi all, > > I followed the instruction for reporting metrics to a Graphite server on > the official document (https://ci.apache.o

Metrics not reported to graphite

2016-09-01 Thread Jack Huang
Hi all, I followed the instruction for reporting metrics to a Graphite server on the official document ( https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html#metric-types ). Specifically, I have the following config/code in my project metrics.reporters: graphite metrics

Re: Handle deserialization error

2016-09-01 Thread Jack Huang
Hi Yassine, For now my workaround is catching exceptions in my custom deserializer and producing some default object to the downstream. It would still be very nice to avoid this inefficiency by not producing an object at all. Thanks, Jack On Fri, Aug 26, 2016 at 6:51 PM, Yassine Marzougui wrot

Re: Cannot pass objects with null-valued fields to the next operator

2016-09-01 Thread Jack Huang
Hi Stephan, In the end I decided to specify a default value (e.g. empty string) when a field is null. On Mon, Aug 29, 2016 at 11:25 AM, Stephan Ewen wrote: > Hi! > > Null is indeed not supported for some basic data types (tuples / case > classes). > > Can you use Option for nullable fields? > >

Re: Windows and Watermarks Clarification

2016-09-01 Thread Fabian Hueske
A 10 minute tumbling window that starts at 12:00 is evaluated after a watermark is observed that is > 12:10. If the same tumbling window has an allowed lateness of 5 minuted, it is evaluated once a watermark > 12:15 is observed. However, only elements with timestamps 12:00 <= x < 12:10 are in the w

Re: Streaming - memory management

2016-09-01 Thread vinay patil
Hi Fabian, https://ci.apache.org/projects/flink/flink-docs-master/dev/state_backends.html#the-rocksdbstatebackend I am referring to this, this does not clearly state if the state will be maintained in local disk even after checkpointing. Or I am not getting it correclty :) Regards, Vinay Patil

Re: Windows and Watermarks Clarification

2016-09-01 Thread Paul Joireman
Thanks Fabian, This is making more sense. Is allowedLateness(Time.seconds(x)) then evaluated relative to maxEventTime - lastWaterMarkTime. So if (maxEventTime - lastWaterMarkTime) > x * 1000 then the window is evaluated? Paul From: Fabian Hueske Sent: Thu

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
You do not have to convert your DTO into a JSON object to use it as a key-value state in a Flink function. You can pass it as it is via the state interfaces. Can you point me to the documentation that you find confusing? The state documentation [1] says: >> You can make *every* transformation (ma

Re: Windows and Watermarks Clarification

2016-09-01 Thread Fabian Hueske
Hi Paul, BoundedOutOfOrdernessTimestampExtractor implements the AssignerWithPeriodicWatermarks interface. This means, Flink will ask the assigner in regular intervals (configurable via StreamExecutionEnvironment.getConfig().setAutoWatermarkInterval()) for the current watermark. The watermark will

Re: Streaming - memory management

2016-09-01 Thread vinay patil
I don't to join the third stream. And Yes, This is what I was thinking of.also : s1.union(s2).keyBy().window().apply(// outerjoin).keyBy.flatMap(// backup join) I am already done integrating with Cassandra but I feel RocksDB will be a better option, I will have to take care of the clearing part

Windows and Watermarks Clarification

2016-09-01 Thread Paul Joireman
Hi all, Just a point of clarification on how watermarks are generated. I'd like to use a SlidingEventTime window of say 5 minutes with a 30 second slide. The incoming data stream has elements from which I can extract the timestamp but they may come out of order so I chose to implement the f

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
I thought you would like to join the non-matched elements with another (third) stream. --> s1.union(s2).keyBy().window().apply(// outerjoin).keyBy.connect(s3.keyBy).coFlatMap(// backup join) If you want to match the non-matched stream with itself a FlatMapFunction is the right choice. --> s1.uni

Re: Streaming - memory management

2016-09-01 Thread vinay patil
Yes, that's what I am looking for. But why to use CoFlatMapFunction , I have already got the matchingAndNonMatching Stream , by doing the union of two streams and having the logic in apply method for performing outer-join. I am thinking of applying the same key on matchingAndNonMatching and flatm

Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Foster, Craig
Oh, in that case, maybe I should look into using the KCL. I'm just using boto and boto3 which are definitely having different problems but both related to the encoding. boto3 prints *something*: (.96.129.59,-20)'(01:541:4305:C70:10B4:FA8C:3CF9:B9B0,0(Patrick Barlane,0(Nedrutland,12(GreenC bo

Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Tzu-Li Tai
I’m afraid the “AUTO” option on the Kinesis producer is actually bugged, so the internally used KPL library correctly pick up credentials with the default credential provider chain. I’ve just filed a JIRA for this: https://issues.apache.org/jira/browse/FLINK-4559

Re: Nested iterations

2016-09-01 Thread Supun Kamburugamuve
Thanks Gabor. I'll keep an eye on the developments. Supun.. On Thu, Sep 1, 2016 at 12:57 PM, Gábor Gévay wrote: > I don't think that there are plans for enabling the nesting of the > native iteration constructs, but we should wait for one of the > commiters to confirm this. > > However, the mat

Re: Nested iterations

2016-09-01 Thread Gábor Gévay
I don't think that there are plans for enabling the nesting of the native iteration constructs, but we should wait for one of the commiters to confirm this. However, the matter of caching of intermediate results has came up on numerous occasions before [1,2,3,4,5], and it would be useful in lots o

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
Thanks for the explanation. I think I understood your usecase. Yes, I'd go for the RocksDB approach in a CoFlatMapFunction on a keyed stream (keyed by join key). One input would be the unmatched outer join records, the other input would serve the events you want to match them with. Retrieving elem

Re: Streaming - memory management

2016-09-01 Thread vinay patil
Hi Fabian, I had already used Co-Group function earlier but were getting some issues while dealing with watermarks (for one use case I was not getting the correct result), so I have used the union operator for performing the outer-join (WindowFunction on a keyedStream), this approach is working co

Re: Nested iterations

2016-09-01 Thread Supun Kamburugamuve
Thanks Gabor. I was thinking about starting separate jobs. Is there any plans to support nested loops in the future? Thanks, Supun.. On Thu, Sep 1, 2016 at 12:28 PM, Gábor Gévay wrote: > Hello Supun, > > Unfortunately, nesting of Flink's iteration constructs are not > supported at the moment.

Re: Nested iterations

2016-09-01 Thread Gábor Gévay
Hello Supun, Unfortunately, nesting of Flink's iteration constructs are not supported at the moment. There are some workarounds though: 1. You can start a Flink job for each step of the iteration. Starting a Flink job has some overhead, so this only works if there is a sufficient amount of work

Nested iterations

2016-09-01 Thread Supun Kamburugamuve
Hi, Does Flink support nested iterations? We are trying to develop a complex machine learning algorithm which has 3 iterations nested. Best, Supun..

Re: Streaming - memory management

2016-09-01 Thread Fabian Hueske
Hi Vinay, can you give a bit more detail about how you plan to implement the outer join? Using a WIndowFunction or a CoFlatMapFunction on a KeyedStream? An alternative could be to use a CoGroup operator which collects from two inputs all elements that share a common key (the join key) and are in

Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Foster, Craig
Thanks Gordon. I think I changed all my versions to match the version to which I built Kinesis connector, so you were right. That seems to have moved me further. I can write to streams now. Now all I need to do is figure out how Kinesis is encoding it. :) One issue with the "AUTO" option is tha

Re: Streaming - memory management

2016-09-01 Thread vinay patil
Hi Fabian/Stephan, Waiting for your suggestion Regards, Vinay Patil On Wed, Aug 31, 2016 at 1:46 PM, Vinay Patil wrote: > Hi Fabian/Stephan, > > This makes things clear. > > This is the use case I have : > I am performing a outer join operation on the two streams (in window) > after which I ge

Queryable state deactivated on current master branch

2016-09-01 Thread Stefan Richter
Hi, I want to announce that the queryable state feature is deactivated on the current master. In our efforts to implement dynamic scaling for Flink, we introduced key-groups for keyed state (see [FLINK-3755]). However, queryable state does not support key-groups, yet. We will reactivate the fea

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-01 Thread aris kol
Classic problem of every uber-jar containing Hadoop dependencies and being deployed on Yarn. What actually happens is that some Hadoop dependency relies on an old version of guava (11 in this case), which doesn't have the method. You may have assembled your fat-jar properly, but because Hadoop

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-01 Thread ANDREA SPINA
Sure. Here you can find the complete logs file. Still can not run through the issue. Thank you for your help. 2016-08-31 18:15 GMT+02:00 Flavio Pompermaier : > I don't know whether my usual error is related to this one but is very >

Re: CountTrigger FIRE or FIRE_AND_PURGE

2016-09-01 Thread Fabian Hueske
Hi Paul, sorry for the delayed reply. I think a CountTrigger won't give you the expected result. When you call trigger() you replace! the existing trigger. In case of a Sliding/TumblingEventTimeWindow, the trigger that fires at the end of the window is replaced by a trigger that fires every 10 el

Re: Wikiedit QuickStart with Kinesis

2016-09-01 Thread Tzu-Li (Gordon) Tai
Hi Craig, I’ve just run a simple test on this and there should be no problem. What Flink version were you using (the archetype version used with the Flink Quickstart Maven Archetype)? Also, on which branch / commit was the Kinesis connector built? Seeing that you’ve used the “AUTO” credentials

Re: NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-01 Thread Fabian Hueske
Hi Steffen, this looks like a Guava version mismatch to me. Are you running exactly the same program on your local machine or did you add dependencies to run it on the cluster (e.g. Kinesis). Maybe Kinesis and Elasticsearch are using different Guava versions? Best, Fabian 2016-09-01 10:45 GMT+02

NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-01 Thread Steffen Hausmann
Hi there, I’m running a flink program that reads from a Kinesis stream and eventually writes to an Elasticsearch2 sink. When I’m running the program locally from the IDE, everything seems to work fine, but when I’m executing the same program on an EMR cluster with Yarn, a NoClassDefFoundError

NoClassDefFoundError with ElasticsearchSink on Yarn

2016-09-01 Thread Hausmann, Steffen
Hi there, I’m running a flink program that reads from a Kinesis stream and eventually writes to an Elasticsearch2 sink. When I’m running the program locally from the IDE, everything seems to work fine, but when I’m executing the same program on an EMR cluster with Yarn, a NoClassDefFoundError