event time and late events - documentation

2018-07-16 Thread Sofer, Tovi
Hi group, Can someone please elaborate on the comment at the end of section "Debugging Windows & Event Time"? Didn't understand it meaning. https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_event_time.html "Handling Event Time Stragglers Approach 1: Watermark stays late (

FW: high availability with automated disaster recovery using zookeeper

2018-07-16 Thread Sofer, Tovi
Thank you Scott, Looks like a very elegant solution. How did you manage high availability in single data center? Thanks, Tovi From: Scott Kidder Sent: יום ו 13 יולי 2018 01:13 To: Sofer, Tovi [ICG-IT] Cc: user@flink.apache.org Subject: Re: high availability with automated disaster recovery

RE: high availability with automated disaster recovery using zookeeper

2018-07-10 Thread Sofer, Tovi
/projects/flink/flink-docs-release-1.5/ops/deployment/mesos.html [Not sure this is accurate, since it seems to contradict the image in link below https://mesosphere.com/blog/apache-flink-on-dcos-and-apache-mesos ] From: Sofer, Tovi [ICG-IT] Sent: יום ג 10 יולי 2018 20:04 To: 'Till Rohrmann'

RE: high availability with automated disaster recovery using zookeeper

2018-07-10 Thread Sofer, Tovi
-disaster-recovery/ ) Is this supported by Flink cluster on Mesos ? Thanks again Tovi From: Till Rohrmann Sent: יום ג 10 יולי 2018 10:11 To: Sofer, Tovi [ICG-IT] Cc: user Subject: Re: high availability with automated disaster recovery using zookeeper Hi Tovi, that is an interesting use case

high availability with automated disaster recovery using zookeeper

2018-07-09 Thread Sofer, Tovi
Hi all, We are now examining how to achieve high availability for Flink, and to support also automatic recovery in disaster scenario- when all DC goes down. We have DC1 which we usually want work to be done, and DC2 - which is more remote and we want work to go there only when DC1 is down. We e

mongodb connector

2018-05-22 Thread Sofer, Tovi
Hi group, Is there a ready to use mongo DB sink? I found online this closed issue of bahir-flink, but didn't find some version of this code to use\download... https://issues.apache.org/jira/browse/FLINK-6573 https://issues.apache.org/jira/browse/BAHIR-133 Thanks, Tovi

RE: kafka as recovery only source

2018-02-07 Thread Sofer, Tovi
Hi Fabian, Thank you for the suggestion. We will consider it. Would be glad to hear other ideas how to handle such requirement. Thanks again, Tovi From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: יום ד 07 פברואר 2018 11:47 To: Sofer, Tovi [ICG-IT] Cc: user@flink.apache.org; Tzu-Li (Gordon

kafka as recovery only source

2018-02-06 Thread Sofer, Tovi
Hi group, I wanted to get your suggestion on how to implement two requirements we have: * One is to read from external message queue (JMS) at very fast latency * Second is to support zero data loss, so that in case of restart and recovery, messages not checkpointed (and not part

RE: Sync and Async checkpoint time

2018-01-31 Thread Sofer, Tovi
To: Sofer, Tovi [ICG-IT] Cc: user@flink.apache.org Subject: Re: Sync and Async checkpoint time Hi, this looks like the timer service is the culprit for this problem. Timers are currently not stored in the state backend, but in a separate on-heap data structure that does not support copy-on

Sync and Async checkpoint time

2018-01-30 Thread Sofer, Tovi
Hi group, In our project we are using asynchronous FSStateBackend, and we are trying to move to distributed storage - currently S3. When using this storage we are experiencing issues of high backpressure and high latency, in comparison of local storage. We are trying to understand the reason, s

RE: Two operators consuming from same stream

2018-01-04 Thread Sofer, Tovi
ect(pricesKeyedStream).flatMap(identityMapper); ds.flatMap(mapperA); ds.flatMap(mapperB); Regards, Timo Am 1/1/18 um 2:50 PM schrieb Sofer, Tovi : Hi group, We have the following graph below, on which we added metrics for latency calculation. We have two streams which are consumed by two opera

Two operators consuming from same stream

2018-01-01 Thread Sofer, Tovi
Hi group, We have the following graph below, on which we added metrics for latency calculation. We have two streams which are consumed by two operators: * ordersStream and pricesStream - they are both consumed by two operators: CoMapperA and CoMapperB, each using connect. Initially we

RE: slot group indication per operator

2017-12-11 Thread Sofer, Tovi
behavior. @Chesnay: Do you know if we can get this information? If not through the Web UI, maybe via REST? Do we have access to the full ExecutionGraph somewhere? Otherwise it might make sense to open an issue for this. Regards, Timo Am 12/5/17 um 4:25 PM schrieb Sofer, Tovi : Hi all, I am

RE: Testing CoFlatMap correctness

2017-12-10 Thread Sofer, Tovi
such scenario so that results are predictable, and that elements from main stream arrive after elements from control stream, or other way around. Thanks again, Tovi From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] Sent: יום ה 07 דצמבר 2017 19:11 To: Sofer, Tovi [ICG-IT] Cc: user

Testing CoFlatMap correctness

2017-12-07 Thread Sofer, Tovi
Hi group, What is the best practice for testing CoFlatMap operator correctness? We have two source functions, each emits data to stream, and a connect between them, and I want to make sure that when streamA element arrive before stream element, a certain behavior happens. How can I test this cas

slot group indication per operator

2017-12-05 Thread Sofer, Tovi
Hi all, I am trying to use the slot group feature, by having 'default' group and additional 'market' group. The purpose is to divide the resources equally between two sources and their following operators. I've set the slotGroup on the source of the market data. Can I assume that all following op

RE: Negative values using latency marker

2017-11-05 Thread Sofer, Tovi
Hi Nico, Actually the run below is on my local machine, and both Kafka and flink run on it. Thanks and regards, Tovi -Original Message- From: Nico Kruber [mailto:n...@data-artisans.com] Sent: יום ו 03 נובמבר 2017 15:22 To: user@flink.apache.org Cc: Sofer, Tovi [ICG-IT] Subject: Re

Negative values using latency marker

2017-11-02 Thread Sofer, Tovi
Hi group, Can someone maybe elaborate how can latency gauge shown by latency marker be negative? 2017-11-02 18:54:56,842 INFO com.citi.artemis.flink.reporters.ArtemisReporter - [Flink-MetricRegistry-1] 127.0.0.1.taskmanager.f40ca57104c8642218e5b8979a380ee4.Flink Streaming Job.Sink: FinalSink

RE: state size effects latency

2017-10-30 Thread Sofer, Tovi
From: Biplob Biswas [mailto:revolutioni...@gmail.com] Sent: יום ב 30 אוקטובר 2017 11:02 To: Sofer, Tovi [ICG-IT] Cc: Narendra Joshi ; user Subject: Re: state size effects latency Hi Tovi, This might seem a really naive question (and its neither a solution or answer to your question ) but I am

RE: state size effects latency

2017-10-30 Thread Sofer, Tovi
Thank you Joshi. We are using currently FsStateBackend since in version 1.3 it supports async snapshots, and no RocksDB. Does anyone else has feedback on this issues? From: Narendra Joshi [mailto:narendr...@gmail.com] Sent: יום א 29 אוקטובר 2017 12:13 To: Sofer, Tovi [ICG-IT] Cc: user Subject

state size effects latency

2017-10-29 Thread Sofer, Tovi
Hi all, In our application we have a requirement to very low latency, preferably less than 5ms. We were able to achieve this so far, but when we start increasing the state size, we see distinctive decrease in latency. We have added MinPauseBetweenCheckpoints, and are using async snapshots. *

RE: kafka consumer parallelism

2017-10-03 Thread Sofer, Tovi
Hi Robert, I had similar issue. For me the problem was that the topic was auto created with one partition. You can alter it to have 5 partitions using kafka-topics command. Example: kafka-topics --alter --partitions 5 --topic fix --zookeeper localhost:2181 Regards, Tovi -Original Message-

RE: Flink kafka consumer that read from two partitions in local mode

2017-09-26 Thread Sofer, Tovi
-up question – is it possible to create the topic with two partitions while creating the FlinkKafKaProducer? Since by default it seems to create it with one partition. Thanks and regards, Tovi From: Sofer, Tovi [ICG-IT] Sent: יום ב 25 ספטמבר 2017 17:18 To: 'Tzu-Li (Gordon) Tai'; Fabian

RE: Flink kafka consumer that read from two partitions in local mode

2017-09-25 Thread Sofer, Tovi
Producer (2/2) to produce into default topic fix Thanks, Tovi From: Tzu-Li (Gordon) Tai [mailto:tzuli...@apache.org] Sent: יום ב 25 ספטמבר 2017 15:06 To: Sofer, Tovi [ICG-IT]; Fabian Hueske Cc: user Subject: RE: Flink kafka consumer that read from two partitions in local mode Hi Tovi, Your

RE: Flink kafka consumer that read from two partitions in local mode

2017-09-24 Thread Sofer, Tovi
Thank you Fabian. Fabian, Gordon, am I missing something in consumer setup? Should I configure consumer in some way to subscribe to two partitions? Thanks and regards, Tovi From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: יום ג 19 ספטמבר 2017 22:58 To: Sofer, Tovi [ICG-IT] Cc: user; Tzu-Li

Flink kafka consumer that read from two partitions in local mode

2017-09-19 Thread Sofer, Tovi
Hi, I am trying to setup FlinkKafkaConsumer which reads from two partitions in local mode, using setParallelism=2. The producer writes to two partition (as it is shown in metrics report). But the consumer seems to read always from one partition only. Am I missing something in partition configura