If I chain two windows, what event-time would the second window have?

2016-07-26 Thread Yassin Marzouki
Hi all, say I assign timestamps to a stream and then apply a transformation like this:

stream.keyBy(0).timeWindow(Time.hours(5)).reduce(count).timeWindowAll(Time.days(1)).apply(transformation)

Now, when the first window is applied, events are aggregated based on their timestamps, but I don't und
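
[A minimal sketch of the chained-window pipeline in the question, with made-up types and with sum() standing in for the poster's reduce/apply functions. With event time enabled, the results of the first window should be emitted with the max timestamp of their 5-hour window, so the daily all-window groups those partial results by the day that 5-hour window falls in.]

    // Sketch: assume `stream` is a DataStream<Tuple2<String, Long>> with event-time
    // timestamps and watermarks already assigned.
    stream
        .keyBy(0)
        .timeWindow(Time.hours(5))     // per-key 5-hour windows
        .sum(1)                        // each result carries the max timestamp of its 5-hour window
        .timeWindowAll(Time.days(1))   // so the daily window groups results by the day their 5-hour window ends in
        .sum(1)
        .print();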

CEP and Within Clause

2016-07-26 Thread Sameer W
Hi, It looks like the within() clause of CEP uses tumbling windows. I could get it to use sliding windows by using an upstream pipeline which uses sliding windows and produces repeating elements (in each sliding window), and applying a watermark assigner on the resulting stream with elements duplicat
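
[For reference, a minimal sketch of how a within() clause is attached to a pattern, written against the CEP API of that time. Event, its getter, the key field and the 10-minute bound are made up; this only shows the API shape, not a fix for the behaviour described above.]

    // `input` is assumed to be a DataStream<Event>.
    Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
        .where(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event e) {
                return e.getTemperature() > 100.0;   // hypothetical condition
            }
        })
        .next("second")
        .where(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event e) {
                return e.getTemperature() > 150.0;   // hypothetical condition
            }
        })
        .within(Time.minutes(10));   // the whole match must complete within 10 minutes

    PatternStream<Event> matches = CEP.pattern(input.keyBy("sensorId"), pattern);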

Re: .so linkage error in Cluster

2016-07-26 Thread Debaditya Roy
Hi, For the error, I get this when I run the .jar made by mvn clean package:

java.lang.NoClassDefFoundError: org/bytedeco/javacpp/opencv_core$Mat
    at loc.video.Job.main(Job.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.inv

stop then start the job, how to load latest checkpoints automatically?

2016-07-26 Thread Shaosu Liu
I want to load previous state and I understand I could do this by specifying a savepoint. Is there a way to do this automatically, given I do not change my code (jar)? -- Cheers, Shaosu

Re: .so linkage error in Cluster

2016-07-26 Thread Ufuk Celebi
Out of curiosity I've tried this locally by adding the following dependencies to my Maven project: org.bytedeco:javacpp:1.2.2 and org.bytedeco.javacpp-presets:opencv:3.1.0-1.2. With this, running mvn clean package works as expected. On Tue, Jul 26, 2016 at 7:09 PM, Ufuk Celebi

Re: .so linkage error in Cluster

2016-07-26 Thread Ufuk Celebi
What error message do you get from Maven? On Tue, Jul 26, 2016 at 4:39 PM, Debaditya Roy wrote: > Hello, > > I am using the jar builder from IntelliJ IDE (the mvn one was causing > problems). After that I executed it successfully locally. But on the remote > cluster it is causing a problem. > > Warm Regards, >

Re: Performance issues with GroupBy?

2016-07-26 Thread Greg Hogan
Hi Robert, Are you able to simplify your function input/output types? Flink aggressively serializes the data stream, and complex types such as ArrayList and BitSet will be much slower to process. Are you able to restructure the lists into groupings on individual elements? Greg On Mon, Jul 25, 2016 at
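
[A sketch of what restructuring the lists could look like, assuming the job currently groups on a tuple whose second field is an ArrayList; the types here are guesses, not Robert's actual ones. Instead of shipping whole lists, emit one flat record per list entry and group those.]

    // input: DataSet<Tuple2<Long, ArrayList<Long>>>
    DataSet<Tuple2<Long, Long>> flat = input
        .flatMap(new FlatMapFunction<Tuple2<Long, ArrayList<Long>>, Tuple2<Long, Long>>() {
            @Override
            public void flatMap(Tuple2<Long, ArrayList<Long>> value, Collector<Tuple2<Long, Long>> out) {
                for (Long element : value.f1) {
                    out.collect(new Tuple2<>(value.f0, element));   // one record per list entry
                }
            }
        });

    // Tuple2<Long, Long> serializes far more cheaply than a tuple wrapping an ArrayList.
    flat.groupBy(0).sum(1);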

Re: .so linkage error in Cluster

2016-07-26 Thread Debaditya Roy
Hello, I am using the jar builder from IntelliJ IDE (the mvn one was causing problems). After that I executed it successfully locally. But on the remote cluster it is causing a problem. Warm Regards, Debaditya On Tue, Jul 26, 2016 at 4:36 PM, Ufuk Celebi wrote: > Yes, the BlobCache on each TaskManager node

Re: .so linkage error in Cluster

2016-07-26 Thread Ufuk Celebi
Yes, the BlobCache on each TaskManager node should fetch it from the JobManager. How are you packaging your JAR? On Tue, Jul 26, 2016 at 4:32 PM, Debaditya Roy wrote: > Hello users, > > I am having a problem while running my flink program in a cluster. It gives > me an error that it is unable to

.so linkage error in Cluster

2016-07-26 Thread Debaditya Roy
Hello users, I am having a problem while running my flink program in a cluster. It gives me an error that it is unable to find an .so file in a tmp directory.

Caused by: java.lang.UnsatisfiedLinkError: no jniopencv_core in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.jav

Re: Supporting Multi-tenancy in a Flink

2016-07-26 Thread Gyula Fóra
Hi, Well, what we are doing at King is trying to solve a similar problem. It would be great if you could read the blog post because it goes into detail about the actual implementation, but let me recap here quickly: We are building a stream processing system that data scientists and other developers

Re: Supporting Multi-tenancy in a Flink

2016-07-26 Thread Aparup Banerjee (apbanerj)
Thanks. Hi Gyula, anything you can share on this? Aparup On 7/26/16, 4:38 AM, "Ufuk Celebi" wrote: >On Mon, Jul 25, 2016 at 5:38 AM, Aparup Banerjee (apbanerj) > wrote: >> We are building a stream processing system using Apache Beam on top of Flink >> using the Flink Runner. Our pipelines

Re: Question about Checkpoint Storage (RocksDB)

2016-07-26 Thread Sameer W
Thank you. That clears it up. I meant SavePoints. Sorry I used the term Snapshots in its place :-). Thanks, Sameer On Tue, Jul 26, 2016 at 8:33 AM, Ufuk Celebi wrote: > On Tue, Jul 26, 2016 at 2:15 PM, Sameer W wrote: > > 1. Calling clear() on the KV state is only possible for snapshots right

Re: Question about Checkpoint Storage (RocksDB)

2016-07-26 Thread Ufuk Celebi
On Tue, Jul 26, 2016 at 2:15 PM, Sameer W wrote: > 1. Calling clear() on the KV state is only possible for snapshots right? Do > you control that for checkpoints too. What do you mean by snapshots vs. checkpoints exactly? > 2. Assuming that the user has no control over the checkpoint process o

Re: Question about Checkpoint Storage (RocksDB)

2016-07-26 Thread Sameer W
Thanks Ufuk, That was very helpful. But that raised a few more questions :-): 1. Calling clear() on the KV state is only possible for snapshots right? Do you control that for checkpoints too. 2. Assuming that the user has no control over the checkpoint process outside of controlling the checkpoi
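
[For context, a minimal sketch of what calling clear() on keyed (KV) state looks like from user code. The state name, types and threshold are invented for illustration, and the function is meant to run on a keyed stream.]

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class CountAndReset extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class, 0L));
        }

        @Override
        public void flatMap(Tuple2<String, Long> value, Collector<Tuple2<String, Long>> out) throws Exception {
            long current = count.value() + value.f1;
            if (current >= 100) {
                out.collect(new Tuple2<>(value.f0, current));
                count.clear();            // removes this key's entry from the state backend
            } else {
                count.update(current);    // the updated value is included in the next checkpoint
            }
        }
    }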

Re: flink batch data processing

2016-07-26 Thread Ufuk Celebi
Are you using the DataSet or DataStream API? Yes, most Flink transformations operate on single tuples, but you can work around it:
- You could write a custom source function, which emits records that contain X points (https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#dat
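
[A sketch of the "records that contain X points" idea for the DataStream case. The randomly generated 2-D points and the batch size stand in for whatever the real input is.]

    import java.util.Random;

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Emits records that each contain POINTS_PER_RECORD two-dimensional points,
    // so downstream operators see one element holding X points at a time.
    public class BatchedPointSource implements SourceFunction<double[][]> {

        private static final int POINTS_PER_RECORD = 100;

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<double[][]> ctx) throws Exception {
            Random rnd = new Random();
            while (running) {
                double[][] batch = new double[POINTS_PER_RECORD][2];
                for (int i = 0; i < POINTS_PER_RECORD; i++) {
                    batch[i][0] = rnd.nextDouble();
                    batch[i][1] = rnd.nextDouble();
                }
                ctx.collect(batch);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

The stream can then be processed one batch at a time, e.g. env.addSource(new BatchedPointSource()).map(...).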

Re: Question about Apache Flink Use Case

2016-07-26 Thread Kostas Kloudas
Hi Suma Cherukuri, I also replied to your question on the dev list, but I repeat the answer here just in case you missed it. From what I understand, you have many small files and you want to aggregate them into bigger ones containing the logs of the last 24h. As Max said, RollingSinks will allow
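
[A minimal sketch of the RollingSink approach referred to above, using the flink-connector-filesystem module of that time; the HDFS path, bucket format and batch size are made up.]

    // logLines is assumed to be a DataStream<String> of the incoming log records.
    RollingSink<String> sink = new RollingSink<>("hdfs:///logs/aggregated");
    sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd"));   // one bucket (directory) per day
    sink.setBatchSize(1024 * 1024 * 400);                   // roll to a new part file at ~400 MB

    logLines.addSink(sink);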

Re: Supporting Multi-tenancy in a Flink

2016-07-26 Thread Ufuk Celebi
On Mon, Jul 25, 2016 at 5:38 AM, Aparup Banerjee (apbanerj) wrote: > We are building a stream processing system using Apache Beam on top of Flink > using the Flink Runner. Our pipelines take Kafka streams as sources, and > can write to multiple sinks. The system needs to be tenant aware. Tenants

Re: dynamic streams and patterns

2016-07-26 Thread Ufuk Celebi
On Mon, Jul 25, 2016 at 10:09 AM, Claudia Wegmann wrote: > To 3) Would an approach similar to King/RBEA even be possible combined with > Flink CEP? As I understand it, patterns have to be defined in Java code and > therefore have to be recompiled? Am I overlooking something important? Pulling in Till (

Re: Question about Checkpoint Storage (RocksDB)

2016-07-26 Thread Ufuk Celebi
On Mon, Jul 25, 2016 at 8:50 PM, Sameer W wrote: > The question is: when using really long windows (in hours), if the state of the > window gets very large over time, would the size of the RocksDB store get larger? > Would replication to HDFS start causing performance bottlenecks? Also would > this need a cons
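
[For reference, a sketch of the setup under discussion: keyed window state lives in local RocksDB instances on the TaskManagers, and checkpoints of that state are written to HDFS. The paths, the interval and the job body are placeholders; the flink-statebackend-rocksdb dependency is assumed.]

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60 * 1000);   // snapshot operator state every 60 seconds
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // ... define the job with its long keyed windows here; the window contents
        // stay in local RocksDB, and only checkpoints/savepoints of that state are
        // copied to the HDFS path above ...

        env.execute("rocksdb-backed job");
    }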

tumbling time window, date boundary and timezone

2016-07-26 Thread Hironori Ogibayashi
Hello, I want to calculate daily access counts using Flink streaming. Flink's TumblingProcessingTimeWindows assigns events to windows of 00:00 GMT to 23:59 GMT each day, but I live in Japan (GMT+09:00) and want date boundaries to be at 15:00 GMT (00:00 JST). Do I have to implement my own WindowAssigner
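
[For what it's worth, later Flink releases added an offset parameter to the built-in window assigners, which covers exactly this case; with the release discussed here a custom WindowAssigner may still be needed. A sketch, with made-up field positions:]

    accessLog
        .keyBy(0)
        .window(TumblingProcessingTimeWindows.of(Time.days(1), Time.hours(-9)))   // day boundaries at 00:00 JST (15:00 UTC)
        .sum(1);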

Re: Performance issues with GroupBy?

2016-07-26 Thread Ufuk Celebi
+1 to what Gábor said. The hash combine will be part of the upcoming 1.1 release, too. This could be further amplified by the blocking intermediate results, which have a very simplistic implementation that writes out many different files, which can lead to a lot of random I/O. – Ufuk On Tue, Jul 26

Re: Performance issues with GroupBy?

2016-07-26 Thread Gábor Gévay
Hello Robert,

> Is there something I might could do to optimize the grouping?

You can try to make your `RichGroupReduceFunction` implement the `GroupCombineFunction` interface, so that Flink can do combining before the shuffle, which might significantly reduce the network load. (How much the comb
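
[A minimal sketch of that suggestion, with made-up word-count-style types: the same class implements both interfaces, so Flink can apply combine() on the sending side before the shuffle.]

    import org.apache.flink.api.common.functions.GroupCombineFunction;
    import org.apache.flink.api.common.functions.RichGroupReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class CountGroups
            extends RichGroupReduceFunction<Tuple2<String, Long>, Tuple2<String, Long>>
            implements GroupCombineFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

        @Override
        public void reduce(Iterable<Tuple2<String, Long>> values, Collector<Tuple2<String, Long>> out) {
            String key = null;
            long sum = 0;
            for (Tuple2<String, Long> value : values) {
                key = value.f0;
                sum += value.f1;
            }
            out.collect(new Tuple2<>(key, sum));
        }

        @Override
        public void combine(Iterable<Tuple2<String, Long>> values, Collector<Tuple2<String, Long>> out) {
            reduce(values, out);   // summing is associative, so pre-aggregation can reuse reduce()
        }
    }

    // usage: input.groupBy(0).reduceGroup(new CountGroups())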