Flink job repeated restart failure

2021-03-24 Thread VINAYA KUMAR BENDI
Dear all, One of the Flink jobs gave below exception and failed. Several attempts to restart the job resulted in the same exception and the job failed each time. The job started successfully only after changing the file name. Flink Version: 1.11.2 Exception 2021-03-24 20:13:09,288 INFO org.ap

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-24 Thread Alexey Trenikhun
Hi Yun, Finally I was able to try to rescale with block blobs configured - rescaled from 6 to 8 w/o problem. So loos like indeed there is problem with page blob. Thank you for help, Alexey From: Alexey Trenikhun Sent: Thursday, March 18, 2021 11:31 PM To: Yun Tan

Re: Re: About Memory Spilling to Disk in Flink

2021-03-24 Thread Guowei Ma
Hi, Roc Thanks for your detailed explanation. I could not find any "stream" operator that uses `ExternalSorterBuilder` by "find usage" of the IDEA. Best, Guowei On Wed, Mar 24, 2021 at 3:27 PM Roc Marshal wrote: > Hi, Guowei Ma. > As far as I know, flink writes some in-memory data to disk w

Re: [EXTERNAL] Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Bohinski, Kevin
Hi Shuiqiang, Thanks for letting me know. Feel free to send any beginner level contributions for this effort my way 😊 . Best, kevin From: Shuiqiang Chen Date: Wednesday, March 24, 2021 at 10:31 PM To: "Bohinski, Kevin" Cc: user Subject: [EXTERNAL] Re: PyFlink DataStream Example Kafka/Kinesis

Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Shuiqiang Chen
Sure, no problem. You can refer to the implementation of Kafka connector, they are very much alike. Xinbin Huang 于2021年3月25日周四 上午10:55写道: > Hi Shuiqiang, > > Thanks for the quick response on creating the ticket for Kinesis > Connector. Do you mind giving me the chance to try to implement the >

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-24 Thread Haihang Jing
Hi,Congxian ,thanks for your replay. job run on Flink1.9 (checkpoint interval 3min) job run on Flink1.12 (checkpoint interval 10min)

Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Shuiqiang Chen
I have just created the jira https://issues.apache.org/jira/browse/FLINK-21966 and will finish it soon. Best, Shuiqiang Xinbin Huang 于2021年3月25日周四 上午10:43写道: > Hi Shuiqiang, > > I am interested in the same feature. Do we have a ticket to track this > right now? > > Best > Bin > > On Wed, Mar 24

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-24 Thread Haihang Jing
Hi,Congxian ,thanks for your replay. job run on Flink1.9 (checkpoint interval 3min) job run on Flink1.12 (checkpoint interval 10min)

Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Shuiqiang Chen
Hi Kevin, Kinesis connector is not supported yet in Python DataStream API. We will add it in the future. Best, Shuiqiang Bohinski, Kevin 于2021年3月25日周四 上午5:03写道: > Is there a kinesis example? > > > > *From: *"Bohinski, Kevin" > *Date: *Wednesday, March 24, 2021 at 4:40 PM > *To: *"Bohinski, Ke

State size increasing exponentially in Flink v1.9

2021-03-24 Thread Almeida, Julius
Hey, Hope you all are doing well! I am using flink v1.9 with RocksDBStateBackend, but over time the state size is increasing exponentially. I am using MapState in my project & seeing memory spike, after looking at heap dump I see duplicates in it. I also have logic added to remove expired even

Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Bohinski, Kevin
Is there a kinesis example? From: "Bohinski, Kevin" Date: Wednesday, March 24, 2021 at 4:40 PM To: "Bohinski, Kevin" Subject: Re: PyFlink DataStream Example Kafka/Kinesis? Nevermind, found this for anyone else looking: https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-py

Re: PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Bohinski, Kevin
Nevermind, found this for anyone else looking: https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-python-test/python/datastream/data_stream_job.py From: "Bohinski, Kevin" Date: Wednesday, March 24, 2021 at 4:38 PM To: user Subject: PyFlink DataStream Example Kafka/Kinesis

PyFlink DataStream Example Kafka/Kinesis?

2021-03-24 Thread Bohinski, Kevin
Hi, Is there an example kafka/kinesis source or sink for the PyFlink DataStream API? Best, kevin

Re: Evenly distribute task slots across task-manager

2021-03-24 Thread Vignesh Ramesh
Hi Matthias, Thanks for your reply. In my case, yes the upstream operator for the operator which is not distributed evenly among task managers is a flink Kafka connector with a rebalance(shuffling). Regards, Vignesh On Tue, 23 Mar, 2021, 6:48 pm Matthias Pohl, wrote: > There was a similar disc

Re: DataDog and Flink

2021-03-24 Thread Vishal Santoshi
yep, not a single EP that does all the dump but something like this works ( dirty but who cares :)) .. The vertex metrics are the most numerous any way ```curl -s http:///jobs/[job_id] | jq -r '.vertices' | jq '.[].id' | xargs -I {} curl http://xx/jobs/[job_id]/vertices/{}/metrics | jq

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-24 Thread Alexey Trenikhun
Hi Yang, HA data was cleared, but it would be re-created when Kubernetes will restart failed (due to code 2) job. So upgrade will happen on life job. I guess upgrade procedure, should recheck or monitor kubernetes job to ensure that it is completed Thanks, Alexey _

Re: OOM issues with Python Objects

2021-03-24 Thread Kevin Lam
Hi Dian, I have unit tests for which both sets of code (Row subclass vs. custom Python class) passes. The OOM occurs when reading a large amount of data from a kafka topic. At the moment I don't have a simple example to reproduce the issue, I'll let you know. On Tue, Mar 23, 2021 at 2:17 AM Dian

Native kubernetes execution and History server

2021-03-24 Thread Lukáš Drbal
Hi, I would like to use native kubernetes execution [1] for one batch job and let scheduling on kubernetes. Flink version: 1.12.2. Kubernetes job: apiVersion: batch/v1beta1 kind: CronJob metadata: name: scheduled-job spec: schedule: "*/1 * * * *" jobTemplate: spec: template:

Re: DataDog and Flink

2021-03-24 Thread Vishal Santoshi
Yes, I will do that. Regarding the metrics dump through REST, it does provide for the TM specific but refuses to do it for all jobs and vertices/operators etc .Moreover I am not sure I have access to the vertices ( vertex_id ) readily from the UI. curl http://[jm]/taskmanagers/[tm_id] curl http:

Re: flink sql jmh failure

2021-03-24 Thread jie mei
Hi, Yik San I use a library wroten by myself and trying to verify the performance. Yik San Chan 于2021年3月24日周三 下午9:07写道: > Hi Jie, > > I am curious what library do you use to get the ClickHouseTableBuilder > > On Wed, Mar 24, 2021 at 8:41 PM jie mei wrote: > >> Hi, Community >> >> I run a jmh

Re: flink sql jmh failure

2021-03-24 Thread Yik San Chan
Hi Jie, I am curious what library do you use to get the ClickHouseTableBuilder On Wed, Mar 24, 2021 at 8:41 PM jie mei wrote: > Hi, Community > > I run a jmh benchmark task get blew error, which use flink sql consuming > data from data-gen connector(10_000_000) and write data to clickhouse. ble

Re: With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-24 Thread Congxian Qiu
Hi From the description, the time used to complete the checkpoint in 1.12 is longer. could you share more detail about the time consumption when running job on 1.9 and 1.12? Best, Congxian Haihang Jing 于2021年3月23日周二 下午7:22写道: > 【Appearance】For jobs with the same configuration (checkpoint in

Re: Pyflink tutorial output

2021-03-24 Thread Robert Cullen
Ah, there they are. Thanks! On Tue, Mar 23, 2021 at 10:26 PM Dian Fu wrote: > How did you check the output when submitting to the kubernetes session > cluster? I ask this because the output should be written to the local > directory “/tmp/output” on the TaskManagers where the jobs are running o

Re: Fault Tolerance with RocksDBStateBackend

2021-03-24 Thread Maminspapin
Ok, thank you, Guowei Ma -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-24 Thread vishalovercome
Let me make the example more concrete. Say O1 gets as input a data stream T1 which it splits into two using some function and produces DataStreams of type T2 and T3, each of which are partitioned by the same key function TK. Now after O2 processes a stream, it could sometimes send the stream to O3

Fault Tolerance with RocksDBStateBackend

2021-03-24 Thread Maminspapin
Hi everyone, I want to build a flink cluster with 3 machines. What if I choose RocksDBStateBackend with next settings: #== # Fault tolerance and checkpointing #=

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-24 Thread David Anderson
For an example of a similar join implemented as a RichCoFlatMap, see [1]. For more background, the Flink docs have a tutorial [2] on how to work with connected streams. [1] https://github.com/apache/flink-training/tree/master/rides-and-fares [2] https://ci.apache.org/projects/flink/flink-docs-stab

Re: DataDog and Flink

2021-03-24 Thread Arvid Heise
Hi Vishal, REST API is the most direct way to get through all metrics as Matthias pointed out. Additionally, you could also add a JMX reporter and log to the machines to check. But in general, I think you are on the right track. You need to reduce the metrics that are sent to DD by configuring th

Re: DataDog and Flink

2021-03-24 Thread Matthias Pohl
Hi Vishal, what about the TM metrics' REST endpoint [1]. Is this something you could use to get all the metrics for a specific TaskManager? Or are you looking for something else? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/rest_api.html#taskmanagers-metrics

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-24 Thread Matthias Pohl
1. yes - the same key would affect the same state variable 2. you need a join to have the same operator process both streams Matthias On Wed, Mar 24, 2021 at 7:29 AM vishalovercome wrote: > Let me make the example more concrete. Say O1 gets as input a data stream > T1 > which it splits into two

Re: Flink Streaming Counter

2021-03-24 Thread Matthias Pohl
Hi Vijayendra, what about the example from the docs you already referred to [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html#counter On Tue, Mar 23, 2021 at 6:48 PM Vijayendra Yadav wrote: > Hi Pohl, > > Thanks for getting back to me so quickly. I

Re:Re: About Memory Spilling to Disk in Flink

2021-03-24 Thread Roc Marshal
Hi, Guowei Ma. As far as I know, flink writes some in-memory data to disk when memory is running low. I noticed that flink uses ExternalSorterBuilder for batch operations in the org.apache.flink.runtime.operator.sort package, but I'm curious to confirm if this technique is also used in strea

Re: About Memory Spilling to Disk in Flink

2021-03-24 Thread Guowei Ma
Hi, Roc Could you explain more about your question? Best, Guowei On Wed, Mar 24, 2021 at 2:47 PM Roc Marshal wrote: > Hi, > > Can someone tell me where flink uses memory spilling to write to disk? > Thank you. > > Best, Roc. > > > >