Re: Re: Bootstrapping the state

2018-07-22 Thread Henri Heiskanen
Hi, By state bootstrapping I mean loading the state with initial data before starting the actual job. For example, in our case I would like to load information like the registration date of our users (>5 years of data) so that I can enrich our event data in streaming (5 days retention). Before watc

Re: Network PartitionNotFoundException when run on multi nodes

2018-07-22 Thread Zhijiang(wangzhijiang999)
Hi Steffen, This exception indicates that when the downstream task requested a partition from the upstream task, the upstream task had not yet been initialized and had not registered its result partition. In this case, the downstream task will query the job manager for the state, and then retry to request partition fr

Re: Production readiness of Flink Job Stop Service

2018-07-22 Thread vino yang
Hi Chirag, Actually, if you want to use a "stop/resume/pause" pattern, you can trigger a savepoint before stopping the job, and then resume the job by specifying this savepoint. The savepoint is not bound to the cancel command, although you can only invoke "cancel with savepoint" now. Now you mentio

Re: ProcessFunction example from the documentation giving me error

2018-07-22 Thread vino yang
Hi anna, From the stack trace you provided, it's a socket connection error, not a Flink problem. So, have you started a socket server at "localhost:"? You can use a program or a CLI tool, such as "nc -l ". There is an example you can have a look at [1]. [1]: https://ci.apache.org/projects/flink/flink-docs-relea
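
For readers without nc at hand, the socket server mentioned above could be approximated with a few lines of Python. This is only a minimal sketch: the function name start_line_server is hypothetical, and because the port in the message is elided, the sketch binds to port 0 (an assumption) so the operating system picks a free port, which is then returned to the caller.

```python
import socket
import threading


def start_line_server(lines, host="127.0.0.1", port=0):
    """Send each line to the first client that connects, then close.

    port=0 is an assumption: it asks the OS for any free port.
    Returns (thread, actual_port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    actual_port = server.getsockname()[1]

    def serve():
        # Block until a single client (for example a socket source) connects.
        conn, _ = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, actual_port
```

The returned port is the one a client would connect to on localhost; the server must be listening before the job that reads from the socket is started.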

Re: Production readiness of Flink Job Stop Service

2018-07-22 Thread Chirag Dewan
Thanks a lot Fabian and Vino. Is there any way I can do that without stop? Although the plan should be to use a Two Phase commit sink so that there are no duplicates, I am looking for stopping the sources before the savepoint is triggered. Thanks, Chirag Sent from Yahoo Mail on Android On Th

Re: streaming predictions

2018-07-22 Thread Xingcan Cui
Hi Cederic, If the model is a simple function, you can just load it and make predictions using the map/flatMap function in the StreamEnvironment. But I’m afraid the model trained by Flink-ML should be a "batch job", whose predict method takes a Dataset as the parameter and outputs another Datas

Re: Triggers for late elements

2018-07-22 Thread Hequn Cheng
Hi harshvardhan, No, by default, late elements are dropped. The window documentation is here [1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#windows On Mon, Jul 23, 2018 at 1:34 AM, Harshvardhan Agrawal < harshvardhan.ag

Re: Behaviour of triggers in Flink

2018-07-22 Thread Hequn Cheng
Hi Harshvardhan, By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner. For example, if you specify a CountTrigger for TumblingEventTimeWindows you will no longer get window firings based on the progress of time but only by count. Right now, you have t

Events can overtake watermarks

2018-07-22 Thread Gyula Fóra
Hi, In 1.5.1 I have noticed some strange behaviour that happens quite frequently and I just want to double check with you that this is intended. If I have a non-parallel source that takes the following actions: emit: event1 emit: watermark1 emit: event2 it can happen that a downstream operators

Kinesis Producer in 1.4.2: testing locally with Kinesalite not working

2018-07-22 Thread Philipp Bussche
Hi there, when trying to use a KinesisProducer which has both aws.region and aws.endpoint set I am receiving the following error message: 07/22/2018 19:52:42 Source: EventStream -> (Sink: Unnamed, EventAndVenueMap -> (Filter, Sink: EventToInventorySink), Sink: ElasticSearchEventsSink)(4/8) sw

Triggers for late elements

2018-07-22 Thread Harshvardhan Agrawal
Hello, I am trying to understand the behaviour of Triggers in the case where we receive late elements for a window. Does Flink always fire a window each time it receives a late element, i.e. if I receive 10 late elements, would it fire 10 times? Is there any specific example I could refer to underst

Permissions to delete Checkpoint on cancel

2018-07-22 Thread Ashish Pokharel
All, We recently moved our Checkpoint directory from HDFS to local SSDs mounted on Data Nodes (we were starting to see perf impacts on checkpoints etc as complex ML apps were spinning up more and more in YARN). This worked great other than the fact that when jobs are being canceled or canceled

Re: IoT Use Case, Problem and Thoughts

2018-07-22 Thread Ashish Pokharel
Till, Fabian, Looping back after a gap on this, for some reason this looks like a need very specific to us (I would have thought this would be of interest to others as well). We on-boarded one of our new IoT data sources and our total checkpoints right now are over 1TB and checkpoint period is

Re: Memory Leak in ProcessingTimeSessionWindow

2018-07-22 Thread Ashish Pokharel
One more attempt to get some feedback on this. It basically boils down to using High-Level Window API in scenarios where keys are unbounded / infinite but can be expired after certain time. From what we have observed (solution 2 below), some properties of keys are still in state (guessing key it

Behaviour of triggers in Flink

2018-07-22 Thread Harshvardhan Agrawal
Hi, I have been trying to understand how triggers work in Flink. We have a set of data that arrives to us on Kafka. We need to process the data in a window when either one of two criteria is satisfied: 1) The maximum number of elements in the window has been reached. 2) Some maximum time has elapsed (Say 5 millisec
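
The "fire on count or on timeout, whichever comes first" behaviour described in this question can be illustrated with a small, Flink-independent simulation. This is a sketch of the firing logic only, not Flink's Trigger API: the function name, its parameters, and the choice to start the timer at the first element of each batch are all assumptions made for illustration.

```python
def simulate_count_or_timeout(arrivals_ms, max_count, max_wait_ms):
    """Group sorted arrival times into batches that fire on count or timeout.

    Returns a list of (fire_time_ms, batch) tuples. The timer for a batch
    starts when its first element arrives (an assumption of this sketch).
    """
    fired = []
    buffer = []
    deadline = None
    for t in arrivals_ms:
        # The timeout elapsed before this element arrived: fire what we have.
        if buffer and t >= deadline:
            fired.append((deadline, buffer))
            buffer = []
        # First element of a new batch starts a new timer.
        if not buffer:
            deadline = t + max_wait_ms
        buffer.append(t)
        # The count criterion is met: fire immediately.
        if len(buffer) >= max_count:
            fired.append((t, buffer))
            buffer = []
    # Whatever is left fires when its timer expires.
    if buffer:
        fired.append((deadline, buffer))
    return fired
```

With arrivals at 0, 1, 2, 10, 11 and 30 ms, max_count=3 and max_wait_ms=5, the first batch fires on count at 2 ms, the second on timeout at 15 ms, and the last on timeout at 35 ms.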

Re: Flink mini IDEA runtime

2018-07-22 Thread Pavel Ciorba
Thanks Jorn! 2018-07-22 13:53 GMT+03:00 Jörn Franke : > You can run them in a local environment. I do it for my integration tests > every time: > flinkEnvironment = ExecutionEnvironment.createLocalEnvironment(1) > > E.g. (even together with a local HDFS cluster) https://github.com/ZuInnoTe/hadoopc

Re: Network PartitionNotFoundException when run on multi nodes

2018-07-22 Thread zhangminglei
Hi, Steffen You can take a look at this: https://github.com/apache/flink/pull/6103 . Hope it helps! Cheers Minglei > On July 22, 2018, at 10:22 PM, Steffen Wohlers wrote: > > Hi all, > > I have some problems when running my application on more than one Task > M

Network PartitionNotFoundException when run on multi nodes

2018-07-22 Thread Steffen Wohlers
Hi all, I have some problems when running my application on more than one Task Manager. setup: node1: Job Manager, Task Manager node2: Task Manager I can run my program successfully on each node alone when I stop the other Task Manager. But when I start both and set parallelism = 2, every time

Re: streaming predictions

2018-07-22 Thread Hequn Cheng
Hi Cederic, I am not familiar with SVM or machine learning but I think we can work it out together. What problem did you run into when you tried to implement this function? From my point of view, we can rebuild the model in the flatMap function and use it to predict the input data. There are some flatMa

Re: 1.5.1

2018-07-22 Thread Vishal Santoshi
According to the UI it seems that "org.apache.flink.util.FlinkException: The assigned slot 208af709ef7be2d2dfc028ba3bbf4600_10 was removed." was the cause of a pipe restart. As to the TM it is an artifact of the new job allocation regime which will exhaust all slots on a TM rather than distrib

Re: 1.5.1

2018-07-22 Thread Gary Yao
Hi, The first exception should only be logged at info level. It's expected to see this exception when a TaskManager unregisters from the ResourceManager. Heartbeats can be configured via heartbeat.interval and heartbeat.timeout [1]. The default timeout is 50s, which should be a generous value. It

Re: Flink mini IDEA runtime

2018-07-22 Thread Jörn Franke
You can run them in a local environment. I do it for my integration tests every time: flinkEnvironment = ExecutionEnvironment.createLocalEnvironment(1) E.g. (even together with a local HDFS cluster) https://github.com/ZuInnoTe/hadoopcryptoledger/blob/master/examples/scala-flink-ethereumblock/src/it/

Flink mini IDEA runtime

2018-07-22 Thread Pavel Ciorba
Hi all! From what I know, Flink jobs can be run straight from the IDE because IDEA will create a mini Flink runtime. What is the underlying CLI command that JetBrains IDEA issues to run a Flink job in a mini-runtime? My use case is that I want to see if the job written using the SQL API is vali

streaming predictions

2018-07-22 Thread Cederic Bosmans
Dear all, My name is Cederic Bosmans and I am a master's student at Ghent University (Belgium). I am currently working on my master's dissertation, which involves Apache Flink. I want to make predictions in the streaming environment based on a model trained in the batch environment. I trained my SVM