Re: Windowing isn't applied per key

2017-10-11 Thread mclendenin
Hi Tony, In the documentation on keyed vs. non-keyed windows it says that keying will split the stream into parallel keyed streams, with windows being executed in parallel across the keys. I would think this means that each key has its own window, managed independently. https://ci.apache.org/
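The behavior being asked about can be sketched with the Flink 1.2-era DataStream API. This is a minimal sketch, not runnable without a Flink cluster on the classpath; the `Event` POJO, its fields, and the inline source are assumptions for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyedWindowSketch {
    // Assumed POJO for illustration: keyBy("name") needs public fields
    // (or getters) and a no-arg constructor.
    public static class Event {
        public String name;
        public int value;
        public Event() {}
        public Event(String name, int value) { this.name = name; this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Event> events = env.fromElements(
                new Event("Marcus", 1), new Event("Suzy", 2));
        events.keyBy("name")                 // splits into logical streams, one per key
              .timeWindow(Time.minutes(5))   // window state is scoped per key
              .sum("value")                  // each key is aggregated independently
              .print();
        env.execute("keyed window sketch");
    }
}
```

With this wiring, a window firing for key "Marcus" does not involve the state of key "Suzy" at all, which is the per-key independence the question is about.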

Re: Windowing isn't applied per key

2017-10-10 Thread mclendenin
Sure, I'm going to use a name as the key in this example and just a number as the aggregated value. This is the sample input data: 12:00 {"name": "Marcus", "value": 1} 12:01 {"name": "Suzy", "value": 2} 12:03 {"name": "Alicia", "value": 3} 12:04 {"name": "Ben", "value": 1} 12:06 {"name": "Alicia", "val
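Using the four fully visible sample records (the 12:06 record is truncated in the thread, so it is left out), the per-key grouping into a 5-minute tumbling window can be simulated in plain Java. This illustrates the windowing semantics only; it is not Flink code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyedWindowSim {
    // Each record: event minute, key name, value. Records are assigned to a
    // 5-minute tumbling window and summed per (name, window) pair, mimicking
    // how a keyed window aggregates each key independently.
    static Map<String, Integer> fiveMinuteSums(Object[][] records) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Object[] r : records) {
            int minute = (Integer) r[0];
            int windowStart = (minute / 5) * 5;        // tumbling 5-minute window
            String key = r[1] + "@" + windowStart;     // (name, window) pair
            sums.merge(key, (Integer) r[2], Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Object[][] records = {
            {0, "Marcus", 1}, {1, "Suzy", 2}, {3, "Alicia", 3}, {4, "Ben", 1}
        };
        System.out.println(fiveMinuteSums(records));
        // {Marcus@0=1, Suzy@0=2, Alicia@0=3, Ben@0=1}
    }
}
```

All four records fall into the 12:00-12:05 window, but each name keeps its own sum, which is the expected per-key result.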

Re: Windowing isn't applied per key

2017-10-09 Thread mclendenin
I am using Processing Time, so it is using the default timestamps and watermarks. I am running it with a parallelism of 3, I can see each operator running at a parallelism of 3 on the Web UI. I am pulling data from a Kafka topic with 12 partitions.
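The setup described (processing time, parallelism 3, a 12-partition Kafka topic) corresponds to environment wiring roughly like the following. This is a sketch of Flink 1.2-era configuration, not a runnable job; the topic name and consumer properties are assumptions:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ProcessingTimeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed
        props.setProperty("group.id", "example");                 // assumed

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // ProcessingTime is the default; shown explicitly for clarity.
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        env.setParallelism(3); // 3 subtasks each read 4 of the 12 partitions

        env.addSource(new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), props))
           .print();
        env.execute("processing-time sketch");
    }
}
```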

Re: RocksDB error with flink 1.2.0

2017-05-05 Thread mclendenin
I ended up combining all the patterns into one giant CEP pattern and then filtering the output of that pattern instead. That way it was only one RocksDB instance which led to large checkpoints instead of lots of small checkpoints. This seems to work, but I still need to do more testing around wheth
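The workaround described, one generalized pattern whose matches are filtered afterwards, might look roughly like this under the Flink 1.3 CEP API. This is a fragment, not a complete job: `input`, `kafkaSink`, `interestingRules`, the `Event` fields, and the helper predicates are all hypothetical:

```java
import java.util.List;
import java.util.Map;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

// One generalized pattern instead of ~300 CEP.pattern() calls, so only a
// single CEP operator (and a single RocksDB state instance) is created.
Pattern<Event, ?> generic = Pattern.<Event>begin("first")
        .where(new SimpleCondition<Event>() {
            @Override public boolean filter(Event e) { return e.triggersSomeRule(); }  // hypothetical
        })
        .next("second")
        .where(new SimpleCondition<Event>() {
            @Override public boolean filter(Event e) { return e.completesSomeRule(); } // hypothetical
        })
        .within(Time.seconds(10)); // assumed window

PatternStream<Event> matches = CEP.pattern(input.keyBy("name"), generic);
matches.select(new PatternSelectFunction<Event, String>() {
            @Override public String select(Map<String, List<Event>> match) {
                return match.get("first").get(0).ruleId; // hypothetical rule tag
            }
        })
        .filter(ruleId -> interestingRules.contains(ruleId)) // drop matches for other rules
        .addSink(kafkaSink);
```

The trade-off is exactly what the message notes: one operator means one large checkpoint rather than many small ones.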

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
There are only 3 nodes in the HDFS cluster and when running fsck it shows the filesystem as healthy. $ hdfs fsck /user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-43/ 17/04/28 16:24:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using bui

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
The top-level exception is similar to the one on this Jira issue, but the root exception is different. That issue says it was fixed in 1.2.0, which is what I'm using: https://issues.apache.org/jira/browse/FLINK-5663

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
This is the stacktrace I'm getting when checkpointing to HDFS. It happens roughly once every 3 checkpoints, and I don't see it without parallelism. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 6 for operator KeyedCEPPatternOperator -> Flat Map -> Map -> Sink: Unna

RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
Starting ~300 CEP patterns with a parallelism of 6, since there are 6 partitions on the Kafka topic. Checkpointing with RocksDB to Hadoop on an interval of 50 seconds. The cluster is HA with 2 JMs and 5 TMs. Getting the following exception: java.io.IOException: Error creating ColumnFamilyHandle. at org.a
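The checkpointing setup described would be wired roughly like this with the Flink 1.2 API. A sketch only; the HDFS checkpoint URI is an assumption for illustration:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // RocksDB-backed state, checkpointed to HDFS, as described.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints")); // assumed URI
        env.enableCheckpointing(50_000L); // 50-second interval
        env.setParallelism(6);            // matches the 6 Kafka partitions
        // ... job graph with the CEP patterns goes here ...
        env.execute("rocksdb checkpoint sketch");
    }
}
```

With ~300 CEP patterns, each pattern operator gets its own RocksDB instance, which is the setup that later led to the ColumnFamilyHandle error being debugged in this thread.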

Re: Multiple CEP Patterns

2017-04-28 Thread mclendenin
Ok, I will try using Flink 1.3

Re: Multiple CEP Patterns

2017-04-28 Thread mclendenin
I do have a within clause on all the patterns, and I am calling CEP.pattern on each one. On the output I am adding a Kafka sink. Since all the patterns are going to the same sink, I was wondering if there was a better way to do it rather than having that overhead. For the memory issues with 1.2, I do

Multiple CEP Patterns

2017-04-27 Thread mclendenin
I'm trying to run multiple independent CEP patterns. They're basic patterns, just one input followed by another, and my Flink job runs fine when using just 1 pattern. If I try to scale this up to multiple CEP patterns, 200 for example, I start getting memory errors on my cluster. I can definitel
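The scaling problem described comes from each `CEP.pattern()` call adding its own operator and state. The naive loop looks roughly like this; it is a fragment, and `allPatterns`, `input`, `toAlert`, and `kafkaSink` are illustrative names, not from the thread:

```java
// Each iteration creates a separate CEP operator; with ~200-300 patterns this
// multiplies the per-operator overhead (state, buffers, checkpoint data).
for (Pattern<Event, ?> pattern : allPatterns) {    // hypothetical list of patterns
    CEP.pattern(input.keyBy("name"), pattern)
       .select(toAlert)                            // hypothetical PatternSelectFunction
       .addSink(kafkaSink);                        // all patterns share one sink
}
```

This per-pattern fan-out is what the later messages in the thread work around by collapsing everything into a single generalized pattern and filtering its output.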