Ask for reason for choice of S3 plugins

2020-03-26 Thread B.Zhou
Hi, In this document https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/s3.html#hadooppresto-s3-file-systems-plugins, it mentioned that * Presto is the recommended file system for checkpointing to S3. Is there a reason for that? Is there some bottleneck for s3 hadoop plu

Re: Emit message at start and end of event time session window

2020-03-26 Thread Manas Kale
Hi Till, Thank you for the explanation, I understand the behaviour now. On Thu, Mar 26, 2020 at 9:23 PM Till Rohrmann wrote: > A quick update concerning your observations. The reason why you are seeing > the unordered output is because in the gist we used > a AssignerWithPeriodicWatermarks whic

Re: Issue with Could not resolve ResourceManager address akka.tcp://flink

2020-03-26 Thread Zhu Zhu
Hi Vitaliy, >> *Cannot serve slot request, no ResourceManager connected* This is not a problem, just that the JM needs RM to be connected to send slot requests. >> *Could not resolve ResourceManager address akka.tcp://flink@prod-bigd-dn11:43757/user/resourcemanager* This should be the root cause.

Re: How to move event time forward using externally generated watermark message

2020-03-26 Thread Manas Kale
Thanks for the help, Arvid! On Tue, Mar 24, 2020 at 1:30 AM Arvid Heise wrote: > Hi Manas, > > both are valid options. > > I'd probably add a processing time timeout event in a process function, > which will only trigger after no event has been received after 1 minute. In > this way, you don't n

Re: usae of ClusterSpecificationBuilder.taskManagerMemoryMB

2020-03-26 Thread Vitaliy Semochkin
Got it, thank you very much for the reply. So far we can not avoid using ClusterSpecification because clusterDescriptor.deployJobCluster(clusterSpecification, jobGraph... ) depends on it. Best Regards, Vitaliy On Tue, Mar 24, 2020 at 5:24 AM Xintong Song wrote: > Hi Vitality, > > After FLIP-49

Issue with Could not resolve ResourceManager address akka.tcp://flink

2020-03-26 Thread Vitaliy Semochkin
Hi, I'm facing an issue similar to https://issues.apache.org/jira/browse/FLINK-14074 Job starts and then yarn logs report "*Could not resolve ResourceManager address akka.tcp://flink*" A fragment from yarn logs looks like this: LazyFromSourcesSchedulingStrategy] 16:54:21,279 INFO org.apache.fli

Re: Emit message at start and end of event time session window

2020-03-26 Thread Till Rohrmann
A quick update concerning your observations. The reason why you are seeing the unordered output is because in the gist we used a AssignerWithPeriodicWatermarks which generates watermarks periodically. Due to this aspect, it can happen that Flink already process all elements up to "20" before it see

Re: Emit message at start and end of event time session window

2020-03-26 Thread Till Rohrmann
Hmm, I might have given you a bad advice. I think the problem becomes harder because with Flink's window and trigger API we need to keep state consistent between the Trigger and the Window function. Maybe it would be easier to not rely on the windowing mechanism and instead to use Flink's process f

Re: savepoint - checkpoint - directory

2020-03-26 Thread Yun Tang
Hi Fanbin To resume from checkpoint, you should provide at least the directory named as /path/chk-x or /path/chk-x/_metadata. The sub-dir named as “shared” is used to store incremental checkpoint content. You could refer to [1] for more information. BTW, stop with savepoint could help reduce

Re: Emit message at start and end of event time session window

2020-03-26 Thread Manas Kale
Hi Till, I see, thanks for the clarification. Assuming all other setting are the same, if I generate events as follows : Element.from("1", 1000L), Element.from("2", 2000L), Element.from("3", 3000L), Element.from("10", 1L) ,Element.

Re: Emit message at start and end of event time session window

2020-03-26 Thread Till Rohrmann
Hi Manas, the problem is that the print() statement is being executed with a different parallelism than 1. Due to this fact, the messages coming from the window function will be sent in round-robin fashion to the print operators. If you remove the setParallelism(1) from the window function, then t

Re: How to consume kafka from the last offset?

2020-03-26 Thread Jim Chen
Hi, I am so sorry. It's not auto.offset.reset. Correctly, it is *enable.auto.commit=false* Best Wishs! Dominik Wosiński 于2020年3月26日周四 下午4:20写道: > Hey, > Are You completely sure you mean *auto.offset.reset ?? *False is not > valid setting for that AFAIK. > > Best, > Dom. > > czw., 26 mar 2020 o

Re: How to consume kafka from the last offset?

2020-03-26 Thread Dominik Wosiński
Hey, Are You completely sure you mean *auto.offset.reset ?? *False is not valid setting for that AFAIK. Best, Dom. czw., 26 mar 2020 o 08:38 Jim Chen napisał(a): > Thanks! > > I made a mistake. I forget to set the auto.offset.reset=false. It's my > fault. > > Dominik Wosiński 于2020年3月25日周三 下午

Re: When i use the Tumbling Windows, find lost some record

2020-03-26 Thread Dawid Wysakowicz
Hi, Can you share more details what do you mean that you loose some records? Can you share what data are you ingesting what are the expected results and what are the actual results you are getting. Without that it's impossible to help you. So far your code looks rather correct. Best, Dawid On 2

When i use the Tumbling Windows, find lost some record

2020-03-26 Thread Jim Chen
Hi, All When i use the Tumbling Windows, find lost some record. My code as follow *env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);* *env.addSource(FlinkKafkaConsumer011..)* *.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(Time.minutes(3)) {

Re: How to consume kafka from the last offset?

2020-03-26 Thread Jim Chen
Thanks! I made a mistake. I forget to set the auto.offset.reset=false. It's my fault. Dominik Wosiński 于2020年3月25日周三 下午6:49写道: > Hi Jim, > Well, *auto.offset.reset *is only used when there is no offset saved for > this *group.id * in Kafka. So, if You want to read the > data fr