RE: Running streaming job on every node of cluster

2017-02-27 Thread Evgeny Kincharov
A lot of thanks, Niko. Really interesting materials. I thought about using 1 Slot per node. But in this case we don’t have possibility to run another jobs in nodes where is running the highload job. And Example 3 from picture at the end of [1] is little bit incorrect. In case parallelism=2 “Task

Flink requesting external web service with rate limited requests

2017-02-27 Thread Giuliano Caliari
Hello, I have an interesting problem that I'm having a hard time modeling on Flink, I'm not sure if it's the right tool for the job. I have a stream of messages in Kafka that I need to group and send them to an external web service but I have some concerns that need to be addressed: 1. Rate Li

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread Seth Wiesman
Vinay, The bucketing sink performs rename operations during the checkpoint and if it tries to rename a file that is not yet consistent that would cause a FileNotFound exception which would fail the checkpoint. Stephan, Currently my aws fork contains some very specific assumptions about the pi

Evicting elements in EventTimeSessionWindow

2017-02-27 Thread Fritz Budiyanto
Hi All, How do I evict elements from EventTimeSessionWindow ? My use case as follow: I have a long duration session window, and I’d like to do some processing on every minute and perform regular sink. I use ContinuousEventTimeTrigger to do the job, as the session could last for hours (or even

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread vinay patil
Hi Seth, Thank you for your suggestion. But if the issue is only related to S3, then why does this happen when I replace the S3 sink to HDFS as well (for checkpointing I am using HDFS only ) Stephan, Another issue I see is when I set env.setBufferTimeout(-1) , and keep the checkpoint interval t

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
this may also be a good read: https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/ runtime.html#task-slots-and-resources On Monday, 27 February 2017 18:40:48 CET Nico Kruber wrote: > What about setting the parallelism[1] to the total number of slots in your > cluster? > By defaul

Re: Flink the right tool for the job ? Huge Data window lateness

2017-02-27 Thread Stephan Ewen
Also FYI: Current work includes incremental checkpointing so that large state checkpoints require less bandwidth and storage. On Mon, Feb 27, 2017 at 5:53 PM, Aljoscha Krettek wrote: > Hi, > just to throw in my 2 cents: if your window operations don't require that > all elements are kept as the

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
What about setting the parallelism[1] to the total number of slots in your cluster? By default, all parts of your program are put into the same slot sharing group[2] and by setting the parallelism you would have this slot (with your whole program) in each parallel slot as well (minus/plus operat

Re: Thread safety

2017-02-27 Thread Stephan Ewen
Flink does not share objects between threads of different slots/tasks. An operator sometimes uses multiple threads (main thread, checkpoint thread, timer thread), but these are locked against each other, there is never concurrent access. One thing to be aware of is that one thread can execute mul

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread Stephan Ewen
Hi Seth! Wow, that is an awesome approach. We have actually seen these issues as well and we are looking to eventually implement our own S3 file system (and circumvent Hadoop's S3 connector that Flink currently relies on): https://issues.apache.org/jira/browse/FLINK-5706 Do you think your patch

Re: Sliding Windows Processing problem with Kafka Queue Event Time and BoundedOutOfOrdernessGenerator

2017-02-27 Thread Nico Kruber
Hi Sujit, actually, according to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/ windows.html#allowed-lateness the sliding window should fire each time for each element arriving late. Did you set the following for your window operator? .window() .allowedLateness() The exp

Thread safety

2017-02-27 Thread Mohit Anchlia
Trying to understand what parts of flink have thread safety built in them. Key question is, are the objects created in flink shared between threads (slots)? For eg: if I create a sink function and open a file is that shared between threads?

RE: Running streaming job on every node of cluster

2017-02-27 Thread Evgeny Kincharov
Thanks for your answer. The problem is that both slots are seized in the one node. Of course if this node has enough free slots. Another nodes idle. I want to utilize cluster resource little bit more. May be the other deployment modes allow it. BR, Evgeny. От: Nico Kruber

Re: Connecting workflows in batch

2017-02-27 Thread Mohit Anchlia
What's the best way to track the progress of the job? On Mon, Feb 27, 2017 at 7:56 AM, Aljoscha Krettek wrote: > Hi Mohit, > I'm afraid there is nothing like this in Flink yet. As you mentioned you > probably have to manually track the completion of one job and then trigger > execution of the ne

Re: Flink the right tool for the job ? Huge Data window lateness

2017-02-27 Thread Aljoscha Krettek
Hi, just to throw in my 2 cents: if your window operations don't require that all elements are kept as they are you can greatly reduce your state size by using a ReduceFunction on your window. With this, the state size would essentially become * * . Best, Aljoscha On Sun, 26 Feb 2017 at 14:16 T

Re: List State in RichWindowFunction leads to RocksDb memory leak

2017-02-27 Thread Aljoscha Krettek
I created this Jira issue for per-window state: https://issues.apache.org/jira/browse/FLINK-5929 Would that suit your needs? If yes, please comment on the issue. I think it would be a very nice addition that opens up a lot of possibilities. Regarding access to the job id in the RuntimeContext, I t

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread Seth Wiesman
Just wanted to throw in my 2cts. I’ve been running pipelines with similar state size using rocksdb which externalize to S3 and bucket to S3. I was getting stalls like this and ended up tracing the problem to S3 and the bucketing sink. The solution was two fold: 1) I forked hadoop-aws and

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
Hi Evgeny, I tried to reproduce your example with the following code, having another console listening with "nc -l 12345" env.setParallelism(2); env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3)) .map(new MapFunction() { @Override

Re: Compilation Error in WindowStream.fold()

2017-02-27 Thread Aljoscha Krettek
It seems the type of your initial accumulator, which is Map[EWayCoordinates,Set[VehicleID]], does not match the accumulator type on your FoldFunction, which is Map[EWayCoordinates,Set[Int]]. Could you change that? On Sat, 25 Feb 2017 at 04:09 nsengupta wrote: > Hello Aljoscha, > > Many thanks fo

Re: Connecting workflows in batch

2017-02-27 Thread Aljoscha Krettek
Hi Mohit, I'm afraid there is nothing like this in Flink yet. As you mentioned you probably have to manually track the completion of one job and then trigger execution of the next one. Best, Aljoscha On Fri, 24 Feb 2017 at 19:16 Mohit Anchlia wrote: > Is there a way to connect 2 workflows such

Sliding Windows Processing problem with Kafka Queue Event Time and BoundedOutOfOrdernessGenerator

2017-02-27 Thread Sujit Sakre
Hi, Hope you are well. We have encountered an issue in processing sliding windows. Here we have encountered the problem that if the last record is outside of the sliding window end time then it does not process the record till the next sliding window is completely occupied and gets triggered. Pl

Re: Set custom file as ClassPath in flink yarn command

2017-02-27 Thread lining jing
flink run -m yarn-cluster -yn 10 -yjm 1024 -ytm 2048 -c mainClass jarPath

Re: ElasticsearchSink Exception

2017-02-27 Thread Tzu-Li (Gordon) Tai
Hi! Like wha Flavio suggested, at a first glance this looks like a problem with building the uber jar. I haven’t bumped into the problem while testing out the connector on cluster submitted test jobs before, but I can try to test this quickly to make sure. Could you tell me what your installed

Set custom file as ClassPath in flink yarn command

2017-02-27 Thread raikarsunil
Hi Folks, Is there any way where we can set classpath value in the below command similar to java -cp? flink run -m yarn-cluster -yst -yn 2 -ynm APPNAME -yd JARFILE ArgsForJAR java -cp JARFILE;CustomFileLocation MainClass Args Thanks, Sunil -- View this message in context: http://apache-flin

Re: Fw: Flink Kinesis Connector

2017-02-27 Thread Matt
Hi, >Am I missing something obvious? So it was that! Thanks very much for the help, sure I'll be able to figure that out. Matt From: Tzu-Li (Gordon) Tai Sent: 27 February 2017 12:17 To: user@flink.apache.org Subject: Re: Fw: Flink Kinesis Connector Hi Matt!

Running streaming job on every node of cluster

2017-02-27 Thread Evgeny Kincharov
Hi, I have the simplest streaming job, and I want to distribute my job on every node of my Flink cluster. Job is simple: source (SocketTextStream) -> map -> sink (AsyncHtttpSink). When I increase parallelism of my job when deploying or directly in code, no effect because source is can't work

Re: Fw: Flink Kinesis Connector

2017-02-27 Thread Tzu-Li (Gordon) Tai
Hi Matt! As mentioned in the docs, due to the ASL license, we do not deploy the artifact to the Maven central repository on Flink releases. You will need to build the Kinesis connector by yourself (the instructions to do so are also in the Flink Kinesis connector docs :)), and install it to your

Fw: Flink Kinesis Connector

2017-02-27 Thread Matt
Hi, I'm working through trying to connect flink up to a kinesis stream, off of this: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kinesis.html Apache Flink 1.2.0 Documentation: Amazon AWS Kinesis ...