Re: Early events

2016-11-19 Thread Juan Rodríguez Hortalá
Hi, There was a bug in my code: I was assigning the timestamps wrong, and that is why it looked like early events were assigned processing time. Surprisingly enough, my test works ok with early events. In fact I have modified my test data generator to generate early events or late events, and

Early events

2016-11-19 Thread Juan Rodríguez Hortalá
Hi, Maybe this is already in the documentation, sorry if I'm asking something obvious. I was thinking that if you have event time then you can also have early events, which would be events whose extracted timestamp is in the future. This might happen in practice for example in sensors with a skew

ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-19 Thread William Saar
Hi! My topology below seems to work when I comment out all the lines with ContinuousEventTimeTrigger, but prints nothing when the line is in there. Can I coGroup two large time windows that use a different trigger time than the window size? (even if the ContinuousEventTimeTrigger doesn't

Re: Flink streaming with 1+ TB of managed state

2016-11-19 Thread Gyula Fóra
Hi Steven, As Robert said some of our jobs have state sizes around a TB or more. We use the RocksDB state backend with some configs tuned to perform well on SSDs (you can get some tips here: https://www.youtube.com/watch?v=pvUqbIeoPzM). We checkpoint our state to Ceph (similar to HDFS but this is

Re: Running the JobManager and TaskManager on the same node in a cluster

2016-11-19 Thread Robert Metzger
Hi Dominik, Your observation is right, running the JobManager and TaskManager on the same node is no problem. If that machine fails, both services will be affected, but as long as you have infrastructure in place (YARN for example) to start them somewhere else, nothing bad will happen. Regarding

Re: RDF/SPARQL and Flink

2016-11-19 Thread Robert Metzger
Hi Tomas, I'm really not an RDF processing expert, but since nobody responded for 4 days, I'll try to give you some pointers: I know that there've been discussions regarding RDF processing on this mailing list before. Check out this one for example: http://apache-flink-user-mailing-list-archive.23

Re: Flink streaming with 1+ TB of managed state

2016-11-19 Thread Robert Metzger
Hi Steven, According to this presentation, King.com is using Flink with terabytes of state: http://flink-forward.org/wp-content/uploads/2016/07/Gyulo-Fo%CC%81ra-RBEA-Scalable-Real-Time-Analytics-at-King.compressed.pdf (see Page 4 specifically) For the 90GB experiment, what is the expected time fo

Re: flink-dist shading

2016-11-19 Thread Robert Metzger
Hi Craig, I also received only this email (and I'm a moderator of the dev@ list, so the message never made it into Apache's infra). When this issue was first reported [1][2] I asked on the Maven mailing list what's going on [3]. I think this JIRA contains the most information on the issue: https://

Streaming program gets stuck when accessing AWS Kinesis

2016-11-19 Thread IvanFernandez
Hi, I have written a program that connects to the example stock tickers stream on AWS Kinesis and filters out those related to the tech sector. I have tried it on my local machine running `sbt run' and everything seems OK. Then I have moved to AWS EMR (emr-5.1.0). I've installed the Flink distribution ins