Re: Recovery problem 1 of 2 in Flink 1.6.3

2019-01-14 Thread John Stone
rigger, CoalesceAndDecorateWindowedGenericRecordProcessWindowFunction) (1/16) (9320e1ac8143dce9ef827d2bea2d274e). From: John Stone Date: Thursday, January 10, 2019 at 3:31 PM To: "user@flink.apache.org" Subject: Recovery problem 1 of 2 in Flink 1.6.3 This is the first of two recovery problems I'm seeing running Flink 1.6.3 in Kubernetes.

Recovery problem 2 of 2 in Flink 1.6.3

2019-01-10 Thread John Stone
s.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030) ... 24 more 2019-01-10 20:03:40,192 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job streamProcessorJob (83d7cb3e6d08318ef2c27878d0fe1bbd) switched from state RUNNING to FAILING. Many thanks, John Stone

Recovery problem 1 of 2 in Flink 1.6.3

2019-01-10 Thread John Stone
ruction Kafka Consumer -> Filter -> Filter -> Map (1/1) of job c44a91b76ea99ead6fdf9b13a98c15bb is not in state RUNNING but SCHEDULED instead. Aborting checkpoint. Many thanks, John Stone

Re: TaskManagers cannot contact JobManager in Kubernetes when JobManager HA is enabled

2018-11-01 Thread John Stone
I've managed to resolve the issue. With HA enabled, you will see this message in the logs: 2018-11-01 13:38:52,467 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink@flink-jobmanager:40641 Without HA enabled, you will see this messag

TaskManagers cannot contact JobManager in Kubernetes when JobManager HA is enabled

2018-10-31 Thread John Stone
I have successfully managed to deploy a Flink cluster in Kubernetes without JobManager high availability. Everything works great. The moment I enable high availability, TaskManagers fail to contact the JobManager. My configurations and logs are below. Can someone point me in the correct dir

Job fails to restore from checkpoint in Kubernetes with FileNotFoundException

2018-10-29 Thread John Stone
I am testing Flink in a Kubernetes cluster and am finding that a job gets caught in a recovery loop. Logs show that the issue is that a checkpoint cannot be found although checkpoints are being taken per the Flink web UI. Any advice on how to resolve this is most appreciated. Note on below: I

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-18 Thread John Stone
Fabian, I believe so, yes. Many thanks, John

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-18 Thread John Stone
Thank you all for your assistance. I believe I've found the root cause of the behavior I am seeing. If I just use "SELECT * FROM MyEventTable" (Fabian's question), I find that events received in the first 3 seconds are ignored as opposed to the original 5. What I'm seeing seems to suggest that

Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread John Stone
Hello, I'm checking if this is intentional or a bug in Apache Flink SQL (Flink 1.6.0). I am using processing time with a RocksDB backend. I have not checked if this issue is also occurring in the Table API. I have not checked if this issue also exists for event time (although I suspect it does)