Re: Issue with back pressure and AsyncFunction

2017-11-10 Thread Ufuk Celebi
Hey Ken, thanks for your message. Both your comments are correct (see inline). On Fri, Nov 10, 2017 at 10:31 PM, Ken Krugler wrote: > 1. A downstream function in the iteration was (significantly) increasing the > number of tuples - it would get one in, and sometimes emit 100+. > > The output wou

Issue with back pressure and AsyncFunction

2017-11-10 Thread Ken Krugler
Hi all, I was debugging a curious problem with a streaming job that contained an iteration and several AsynFunctions. The entire job would stall out, with no progress being made. But when I checked back pressure, only one function showed it as being high - everything else was OK. And when I d

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-10 Thread Vergilio, Thalita
Hi Til, Thank you very much for that. And thanks for your help. I have finally managed to get the multi-cloud setup on Docker Swarm working by tweaking the Flink image slightly to set these configuration options to known values. I have also used the Weave Net Docker plugin to create my cross-c

Re: Streaming : a way to "key by partition id" without redispatching data

2017-11-10 Thread Derek VerLee
I was about to ask this question myself.  I find myself re-keying by the same keys repeatedly.  I think in principle you could always just roll more work into one window operation with a more complex series of maps/folds/windowfunctions or processfunction.  Howeve

RE: Streaming : a way to "key by partition id" without redispatching data

2017-11-10 Thread Gwenhael Pasquiers
I think I finally found a way to "simulate" a Timer thanks to the the processWatermark function of the AbstractStreamOperator. Sorry for the monologue. From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com] Sent: vendredi 10 novembre 2017 16:02 To: 'user@flink.apache.org' Subject: RE

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
I have a couple of concerns. 1. Your logs seems to be incomplete. There are for example missing at the beginning configuration output (see attached example log). Also output file seems strange to me (like duplicated log file). Please submit full logs. 2. If your heap size is 1.5GB, how is it po

Re: Queryable State Python

2017-11-10 Thread Kostas Kloudas
Hi Martin, I will try to reply to your questions inline: > On Nov 10, 2017, at 1:59 PM, Martin Eden wrote: > > Hi, > > Our team is looking at replacing Redis with Flink's own queryable state > mechanism. However our clients are using python. > > 1. Is there a python integration with the Flin

RE: Streaming : a way to "key by partition id" without redispatching data

2017-11-10 Thread Gwenhael Pasquiers
Hello, Finally, even after creating my operator, I still get the error : "Timers can only be used on keyed operators". Isn't there any way around this ? A way to "key" my stream without shuffling the data ? From: Gwenhael Pasquiers Sent: vendredi 10 novembre 2017 11:42 To: Gwenhael Pasquiers ;

Re: Flink memory leak

2017-11-10 Thread ÇETİNKAYA EBRU ÇETİNKAYA EBRU
On 2017-11-10 17:50, Piotr Nowojski wrote: I do not see anything abnormal in the logs before this error :( What are your JVM settings and which java version are you running? What happens if you limit the heap size so that the swap is never used? Piotrek On 10 Nov 2017, at 14:57, ÇETİNKAYA EBRU

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
I do not see anything abnormal in the logs before this error :( What are your JVM settings and which java version are you running? What happens if you limit the heap size so that the swap is never used? Piotrek > On 10 Nov 2017, at 14:57, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-1

Queryable State Python

2017-11-10 Thread Martin Eden
Hi, Our team is looking at replacing Redis with Flink's own queryable state mechanism. However our clients are using python. 1. Is there a python integration with the Flink queryable state mechanism? Cannot seem to be able to find one. 2. If not, is it on the roadmap? 3. Our current solution is

Re: Correlation between data streams/operators and threads

2017-11-10 Thread Piotr Nowojski
1. It’s a little bit more complicated then that. Each operator chain/task will be executed in separate thread (parallelism Multiplies that). You can check in web ui how was your job split into tasks. 3. Yes that’s true, this is an issue. To preserve the individual watermarks/latencies (assumin

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-10 Thread Till Rohrmann
Hi Thalita, yes you can use the mentioned configuration parameters to set the ports for the TaskManager and the BlobServer. However, you must make sure that there is at most one TM running on a host, otherwise you run into port collisions. For taskmanager.rpc.port and blob.server.port you can defi

readFile, DataStream

2017-11-10 Thread Juan Miguel Cejuela
Hi there, I’m trying to watch a directory for new incoming files (with StreamExecutionEnvironment#readFile) with a subsecond latency (interval watch of ~100ms, and using the flag FileProcessingMode.PROCESS_CONTINUOUSLY ). If many files come in within (under) the interval watching time, flink does

Re: Testing / Configuring event windows with Table API and SQL

2017-11-10 Thread Fabian Hueske
Hi Colin, Flink's SQL runner does not support handling of late data yet. At the moment, late events are simply dropped. We plan to add support for late data in a future release. The "withIdleStateRetentionTime" parameter only applies to non-windowed aggregation functions and controls when they ca

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-10 Thread Vergilio, Thalita
Hi All, I just wanted to let you know that I have finally managed to get the multi-cloud setup working!! I honestly can't believe my eyes. I used a Docker plugin called Weave to create the Swarm network, a public external IP address for each node and opened a range of ports, and I can now get

RE: Streaming : a way to "key by partition id" without redispatching data

2017-11-10 Thread Gwenhael Pasquiers
Maybe you don't need to bother with that question. I'm currently discovering AbstractStreamOperator, OneInputStreamOperator and Triggerable. That should do it :-) From: Gwenhael Pasquiers [mailto:gwenhael.pasqui...@ericsson.com] Sent: jeudi 9 novembre 2017 18:00 To: 'user@flink.apache.org' Sub

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
jobmanager1.log and taskmanager2.log are the same. Can you also submit files containing std output? Piotrek > On 10 Nov 2017, at 09:35, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-11-10 11:04, Piotr Nowojski wrote: >> Hi, >> Thanks for the logs, however I do not see before mentioned ex

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
Hi, Thanks for the logs, however I do not see before mentioned exceptions in it. It ends with java.lang.InterruptedException Is it the correct log file? Also, could you attach the std output file of the failing TaskManager? Piotrek > On 10 Nov 2017, at 08:42, ÇETİNKAYA EBRU ÇETİNKAYA EBRU >