Re: Duplicates in self join

2018-10-08 Thread Dominik Wosiński
Hey, IMHO, the simplest way in your case would be to use the Evictor to evict duplicate values after the window is generated. Have a look at it here: https://ci.apache.org/projects/flink/flink-docs-release-1.6/api/java/org/apache/flink/streaming/api/windowing/evictors/Evictor.html Best Regards, Domi
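The deduplication step such an Evictor would perform can be sketched as follows. This is a plain-Python illustration of the logic, not Flink code (a real `Evictor` implements `evictBefore`/`evictAfter` over iterables of `TimestampedValue`):

```python
# Sketch (not Flink code): the deduplication an Evictor's evictAfter()
# could apply to a window's elements before they are emitted.
# Assumes elements are hashable for simplicity.

def evict_duplicates(window_elements):
    """Keep only the first occurrence of each value, preserving order."""
    seen = set()
    kept = []
    for element in window_elements:
        if element not in seen:
            seen.add(element)
            kept.append(element)
    return kept
```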

Re: [DISCUSS] Dropping flink-storm?

2018-10-08 Thread Chesnay Schepler
I've created https://issues.apache.org/jira/browse/FLINK-10509 for removing flink-storm. On 28.09.2018 15:22, Till Rohrmann wrote: Hi everyone, I would like to discuss how to proceed with Flink's storm compatibility layer flink-storm. While working on removing Flink's legacy mode, I noticed t

Re: Flink Python streaming

2018-10-08 Thread Chesnay Schepler
Hello, to use libraries you have to supply them when submitting the job as described below. Additional directories/files will be placed in the same directory as your script on each TM. See https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/python.html#executing-plans Note th

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler
I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm wondering about the benefit, as the API break still has to happen at some point. @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? If this is the only blocker I suggest to ma

Re: Memory Allocate/Deallocate related Thread Deadlock encountered when running a large job > 10k tasks

2018-10-08 Thread Zhijiang(wangzhijiang999)
This deadlock does actually exist for special scenarios. Until the bug is fixed, we can work around the issue by not deploying the map and sink tasks in the same task manager. Krishna, do you share the slot for these two tasks? If so, you can disable slot sharing for this job. Or

what's the meaning of latency indicator reported by flink metrics through prometheus?

2018-10-08 Thread varuy322
Hi there, I have integrated Kafka with Flink 1.5, and use Prometheus and Grafana to display Flink 1.5 metrics. Now I get three indicators about latency, as below: 1)flink_taskmanager_job_latency_source_id_source_subtask_index_operator_id_operator_subtask_index_latency(short for laten

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-10-08 Thread PedroMrChaves
Hello, Find attached the jobmanager.log. I've omitted the log lines from other runs, only left the job manager info and the run with the error. jobmanager.log Thanks again for your help. Regards

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-10-08 Thread Stefan Richter
Hi Pedro, unfortunately the interesting parts are all removed from the log, we already know about the exception itself. In particular, what I would like to see is what checkpoints have been triggered and completed before the exception happens. Best, Stefan > Am 08.10.2018 um 10:23 schrieb Pedr

Re: [DISCUSS] Dropping flink-storm?

2018-10-08 Thread Till Rohrmann
Thanks for opening the issue Chesnay. I think the overall consensus is to drop flink-storm and only keep the Bolt and Spout wrappers. Thanks for your feedback! Cheers, Till On Mon, Oct 8, 2018 at 9:37 AM Chesnay Schepler wrote: > I've created https://issues.apache.org/jira/browse/FLINK-10509 fo

Re: what's the meaning of latency indicator reported by flink metrics through prometheus?

2018-10-08 Thread Chesnay Schepler
1) correct 2) is the number of measurements; due to the random distribution of latency markers this value can be surprisingly low depending on the latency marker interval 3) I don't know, but it isn't exposed by Flink. On 08.10.2018 10:17, varuy322 wrote: Hi there, I have integrated kafka wi
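Point 2) can be made concrete with a small simulation. The sketch below (plain Python, not Flink code) shows why the per-subtask measurement count can be surprisingly low: markers are emitted once per interval and routed to one downstream subtask at a time, so each subtask only sees a fraction of them. The exact routing depends on the Flink version; random routing is an assumption for illustration:

```python
import random

def simulate_latency_markers(window_seconds, marker_interval_ms, num_subtasks, seed=42):
    """Simulate how many latency markers each downstream subtask sees.

    One marker is emitted per marker_interval_ms and routed to a single
    randomly chosen subtask (illustrative assumption). Returns the total
    number of markers and the per-subtask counts.
    """
    rng = random.Random(seed)
    total_markers = int(window_seconds * 1000 / marker_interval_ms)
    counts = [0] * num_subtasks
    for _ in range(total_markers):
        counts[rng.randrange(num_subtasks)] += 1
    return total_markers, counts
```

For example, with a 2-second marker interval, a 60-second window yields only 30 markers in total, split across all subtasks.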

Re: [DISCUSS] Dropping flink-storm?

2018-10-08 Thread Chesnay Schepler
I don't believe that to be the consensus. For starters it is contradictory; we can't /drop/ flink-storm yet still /keep/ some parts. From my understanding we drop flink-storm completely, and put a note in the docs that the bolt/spout wrappers of previous versions will continue to work. On

Re: [DISCUSS] Dropping flink-storm?

2018-10-08 Thread Till Rohrmann
Good point. The initial idea of this thread was to remove the storm compatibility layer completely. During the discussion I realized that it might be useful for our users to not completely remove it in one go. Instead for those who still want to use some Bolt and Spout code in Flink, it could be n

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 > On 8. Oct 2018, at 09:56, Chesnay Schepler wrote: > > I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm

Job manager logs for previous YARN attempts

2018-10-08 Thread Pawel Bartoszek
Hi, I am looking into why YARN starts a new application attempt on Flink 1.5.2. The challenge is getting the logs for the first attempt. After checking YARN I discovered that in both the first attempt and the second one the application master (job manager) gets assigned the same container id (is this

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler
And the remaining parts would only be about breaking the API? On 08.10.2018 12:24, Aljoscha Krettek wrote: I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 On 8. Oct 2018, at 09:56,

Re: error in using kafka in flink

2018-10-08 Thread Kostas Kloudas
Hi Marzieh, This is because of a mismatch between your Kafka version and the one your job assumes (0.8). You should use an older Kafka version (0.8) for the job to run out-of-the-box or update your job to use FlinkKafkaProducer011. Cheers, Kostas > On Oct 6, 2018, at 2:13 PM, marzieh ghasem

Re: flink memory management / temp-io dir question

2018-10-08 Thread Kostas Kloudas
Hi Anand, I think that Till is the best person to answer your question. Cheers, Kostas > On Oct 5, 2018, at 3:44 PM, anand.gopin...@ubs.com wrote: > > Hi , > I had a question with respect flink memory management / overspill to /tmp. > > In the docs > (https://ci.apache.org/projects/flink/fl

Re: flink memory management / temp-io dir question

2018-10-08 Thread Kostas Kloudas
Sorry, I forgot to cc’ Till. > On Oct 8, 2018, at 2:17 PM, Kostas Kloudas > wrote: > > Hi Anand, > > I think that Till is the best person to answer your question. > > Cheers, > Kostas > >> On Oct 5, 2018, at 3:44 PM, anand.gopin...@ubs.com >> wrote: >> >> Hi

Re: error in using kafka in flink

2018-10-08 Thread marzieh ghasemi
Hello, Thank you very much. I imported "FlinkKafkaProducer09" and changed "FlinkKafkaProducer08" to it, and the problem was solved. On Mon, Oct 8, 2018 at 3:39 PM Kostas Kloudas wrote: > Hi Marzieh, > > This is because of a mismatch between your Kafka version > and the one your job assumes (0.8). >

Re: flink memory management / temp-io dir question

2018-10-08 Thread Till Rohrmann
Hi Anand, spilling using the io directories is only relevant for Flink's batch processing. This happens, for example, if you enable blocking data exchange where the produced data cannot be kept in memory. Moreover, it is used by many of Flink's out-of-core data structures to enable exactly this fea
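The spilling behavior described above can be illustrated with a toy external merge sort: when the in-memory buffer fills up, sorted runs are written to temporary files (the role Flink's temp-io directories play for batch jobs) and merged at the end. This is a generic sketch of the technique, not Flink's implementation:

```python
import heapq
import os
import tempfile

def external_sort(records, max_in_memory, tmp_dir=None):
    """Toy external merge sort: buffer records in memory, spill sorted runs
    to temp files once the buffer exceeds max_in_memory, then k-way merge
    all runs. tmp_dir stands in for a configured temp-io directory."""
    runs = []
    buf = []

    def spill():
        # Write the sorted buffer to a temp file and remember it as a run.
        f = tempfile.NamedTemporaryFile("w+", dir=tmp_dir, delete=False)
        for r in sorted(buf):
            f.write(f"{r}\n")
        f.seek(0)
        runs.append(f)
        buf.clear()

    for r in records:
        buf.append(r)
        if len(buf) >= max_in_memory:
            spill()
    if buf:
        spill()

    # Merge the sorted runs back into one sorted sequence.
    iters = [(int(line) for line in f) for f in runs]
    merged = list(heapq.merge(*iters))
    for f in runs:
        f.close()
        os.unlink(f.name)
    return merged
```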

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release. > On 8. Oct 2018, at 13:00, Chesnay Schepler wrote: > > And the remaining parts would only be about breaking the API? > > On 08.10.2018 12:24,

Re: Using several Kerberos keytabs in standalone cluster

2018-10-08 Thread Aljoscha Krettek
Hi Olga, I think right now this is not possible because we piggyback on the YARN shipment functionality for shipping the Keytab along with the TaskManagers. I think changing this would require somewhat bigger changes because loading the Keytab happens when the TaskManagers are brought up. Best,

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Chesnay Schepler
The infrastructure would only be required if we opt for releasing 2.11 and 2.12 builds simultaneously, correct? On 08.10.2018 15:04, Aljoscha Krettek wrote: Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a

Re: Duplicates in self join

2018-10-08 Thread Hequn Cheng
Hi Eric, Can you change the sliding window to a tumbling window? The data of different sliding windows likely overlap. Best, Hequn On Mon, Oct 8, 2018 at 3:35 PM Dominik Wosiński wrote: > Hey, > IMHO, the simplest way in your case would be to use the Evictor to evict > duplicate values after the
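The overlap Hequn refers to can be seen by computing which windows a single timestamp is assigned to. The sketch below mirrors the window assignment of Flink's `SlidingEventTimeWindows` (zero offset assumed) in plain Python; with slide < size, one element lands in several windows, so a self join over those windows produces duplicates, while tumbling windows (slide == size) assign each element to exactly one window:

```python
def sliding_windows(ts, size, slide):
    """Return the [start, end) windows a timestamp ts falls into,
    following the sliding-window assignment rule (offset 0 assumed)."""
    last_start = ts - (ts % slide)
    windows = []
    start = last_start
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows
```

For example, with size 10 and slide 5, the timestamp 7 belongs to two windows, (0, 10) and (5, 15); with slide 10 (tumbling), it belongs only to (0, 10).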

Re: Duplicates in self join

2018-10-08 Thread Eric L Goodman
If I change it to a Tumbling window some of the results will be lost since the pattern I'm matching has a temporal extent, so if the pattern starts in one tumbling window and ends in the next, it won't be reported. Based on the temporal length of the query, you can set the sliding window and the w

Re: Duplicates in self join

2018-10-08 Thread Fabian Hueske
Did you check the new interval join that was added with Flink 1.6.0 [1]? It might be better suited because, each record has its own boundaries based on its timestamp and the join window interval. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joi
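The per-record boundaries of the interval join can be sketched as follows. This is a plain-Python illustration of the semantics over lists, not Flink's streaming implementation: each left record joins the right records whose timestamp falls within a relative interval around its own timestamp:

```python
def interval_join(left, right, lower, upper):
    """Join (ts, value) records: a left record at timestamp lts matches
    right records whose timestamp rts satisfies
    lts + lower <= rts <= lts + upper.
    Naive nested-loop sketch of the interval-join semantics."""
    out = []
    for lts, lval in left:
        for rts, rval in right:
            if lts + lower <= rts <= lts + upper:
                out.append((lval, rval))
    return out
```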

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-08 Thread Aljoscha Krettek
Yes, but I think we would pretty much have to do that. I don't think we can stop doing 2.11 releases. > On 8. Oct 2018, at 15:37, Chesnay Schepler wrote: > > The infrastructure would only be required if we opt for releasing 2.11 and > 2.12 builds simultaneously, correct? > > On 08.10.2018 15:

Watermark through Rest Api

2018-10-08 Thread Gregory Fee
Hello! I am interested in getting the current low watermark for tasks in my Flink jobs. I know how to see them in the UI; I'm interested in getting them programmatically, hopefully via the REST API. The documentation says that they are exposed as metrics but I don't see watermark info in the 'metrics'

Re: Watermark through Rest Api

2018-10-08 Thread Piotr Nowojski
Hi, Watermarks are tracked per Task/Operator level: https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io Tracking watermarks on the job level would be problematic, since it would r
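Why watermarks are tracked per task/operator rather than per job can be seen from the propagation rule: an operator's current (low) watermark is the minimum of the latest watermarks received on each of its input channels, so the value only makes sense at operator granularity. A small sketch of that rule (illustrative, not Flink's actual classes):

```python
class WatermarkTracker:
    """Tracks the latest watermark per input channel; the operator-level
    'low watermark' is the minimum across all channels."""

    def __init__(self, num_channels):
        self.channels = [float("-inf")] * num_channels

    def on_watermark(self, channel, ts):
        # Watermarks never move backwards on a channel.
        self.channels[channel] = max(self.channels[channel], ts)
        return self.current()

    def current(self):
        return min(self.channels)
```

Until every channel has delivered a watermark, the operator watermark stays at negative infinity; afterwards it advances with the slowest channel.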