k8s job cluster using StatefulSet

2020-08-10 Thread Alexey Trenikhun
Hello, Flink documentation suggests to use Deployments to deploy JM and TM for kubernetes job cluster. Is any known potential issues with using StatefulSets instead, seems StatefullSet provides uniqueness for JM during upgrade/rollback, while with Deployments could be multiple JM pods (e.g.1 ter

JM & TM readiness probe

2020-08-11 Thread Alexey Trenikhun
Hello, How can I define rediness fork8s job cluster deployments? I think for job manager, I can use REST API and check job status, but what about task manager? Is anyway to ask task manager Pod is it ready or not? Thanks, Alexey

Re: k8s job cluster using StatefulSet

2020-08-14 Thread Alexey Trenikhun
Thank you Arvid and Yang! From: Yang Wang Sent: Thursday, August 13, 2020 8:09:13 PM To: Arvid Heise Cc: Alexey Trenikhun ; user Subject: Re: k8s job cluster using StatefulSet Hi Alexey, Actually, StatefulSets could also be used to start the JobManager and

Flink Job cluster in HA mode - recovery vs upgrade

2020-08-19 Thread Alexey Trenikhun
Hello, Let's say I run Flink Job cluster with persistent storage and Zookeeper HA on k8s with single JobManager and use externalized checkpoints. When JM crashes, k8s will restart JM pod, and JM will read JobId and JobGraph from ZK and restore from latest checkpoint. Now let's say I want to up

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-21 Thread Alexey Trenikhun
: Piotr Nowojski Sent: Thursday, August 20, 2020 7:04 AM To: Chesnay Schepler Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade Thank you for the clarification Chesney and sorry for the incorrect previous answer. Piotrek czw., 20 sie

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-21 Thread Alexey Trenikhun
2:16 PM To: Alexey Trenikhun ; Piotr Nowojski Cc: Flink User Mail List Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade The HaServices are only being given the JobGraph, to this is not possible. Actually I have to correct myself. For a job cluster the state in HA should be irrelevant w

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-22 Thread Alexey Trenikhun
? Sounds like currently HA with kubernetes is not achievable unless some controller is used to manage JobManager. Am I right? From: Chesnay Schepler Sent: Saturday, August 22, 2020 12:58 AM To: Alexey Trenikhun ; Piotr Nowojski Cc: Flink User Mail List Subject: Re

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-23 Thread Alexey Trenikhun
from sp1 rather than from latest externalized checkpoint. Is my understanding correct? From: Chesnay Schepler Sent: Sunday, August 23, 2020 1:46:45 AM To: Alexey Trenikhun ; Piotr Nowojski Cc: Flink User Mail List Subject: Re: Flink Job cluster in HA mode

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-23 Thread Alexey Trenikhun
: Chesnay Schepler Sent: Sunday, August 23, 2020 7:25:11 AM To: Alexey Trenikhun ; Piotr Nowojski Cc: Flink User Mail List Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade If HA is enabled the the cluster will continue from the latest externalized checkpoint. Without HA it still start

Re: Setting job/task manager memory management in kubernetes

2020-08-26 Thread Alexey Trenikhun
Hello, What version of Flink do you use? If you use 1.10+ please check [1] (different properties names) [1] - https://ci.apache.org/projects/flink/flink-docs-stable/ops/memory/mem_setup.html Thanks, Alexey From: Sakshi Bansal Sent: Monday, August 24, 2020 3:30

FileSystemHaServices and BlobStore

2020-08-28 Thread Alexey Trenikhun
Hello, I'm thinking about implementing FileSystemHaServices - single leader, but persistent RunningJobRegistry, CheckpointIDCounter, CompletedCheckpointStore and JobGraphStore. I'm not sure do you need FileSystemBlobStore or VoidBlobStore is enough. Can't figure out, should BlobStore survive Job

Re: FileSystemHaServices and BlobStore

2020-08-28 Thread Alexey Trenikhun
, Alexey [1]. https://issues.apache.org/jira/browse/FLINK-17598 From: Khachatryan Roman Sent: Friday, August 28, 2020 9:24 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: FileSystemHaServices and BlobStore Hello Alexey, I think you need

Re: FileSystemHaServices and BlobStore

2020-08-29 Thread Alexey Trenikhun
Did test with streaming job and FileSystemHaService using VoidBlobStore (no HA Blob), looks like job was able to recover from both JM restart and TM restart. Any idea in what use cases HA Blob is needed? Thanks, Alexey From: Alexey Trenikhun Sent: Friday

Re: FileSystemHaServices and BlobStore

2020-09-01 Thread Alexey Trenikhun
from failure even VoidBlobStore. I also use StatefulSet instead of Deployment Thanks, Alexey From: Yang Wang Sent: Tuesday, September 1, 2020 1:58 AM To: dev Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: FileSystemHaServices and BlobStore Hi Alexey, Gl

Re: FileSystemHaServices and BlobStore

2020-09-03 Thread Alexey Trenikhun
Hi Yang, Yes, I’ve persisted CompletedCheckpointStore, CheckpointIDCounter and RunningJobsRegistry Thanks, Alexey From: Yang Wang Sent: Wednesday, September 2, 2020 8:21:20 PM To: Alexey Trenikhun Cc: dev ; Flink User Mail List Subject: Re

Unit Test for KeyedProcessFunction with out-of-core state

2020-09-03 Thread Alexey Trenikhun
Hello, I want to unit test KeyedProcessFunction which uses with out-of-core state (like rocksdb). Does Flink has mock for rocksdb, which can be used in unit tests ? Thanks, Alexey

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-04 Thread Alexey Trenikhun
, September 4, 2020 12:35:48 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Unit Test for KeyedProcessFunction with out-of-core state Hi Alexey, Is there a specific reason why you want to test against RocksDB? Otherwise, in Flink tests we use a `KeyedOneInputStreamOperatorTestHarness

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-12 Thread Alexey Trenikhun
-1 We use union state to generate sequences, each operator generates offset0 + number-of-tasks - task-index + task-specific-counter * number-of-tasks (e.g. for 2 instances of operator -one instance produce even number, another odd). Last generated sequence number is stored union list state, on

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-14 Thread Alexey Trenikhun
Thank you for ideas. Do you suggest to use new backend with unit test or integration test? Thanks, Alexey From: Arvid Heise Sent: Monday, September 14, 2020 4:26:47 AM To: Dawid Wysakowicz Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Unit Test for

Re: FileSystemHaServices and BlobStore

2020-09-17 Thread Alexey Trenikhun
contributing to FileSystemHAService, maybe it will be useful to someone else until FLIP-144 will be available Thanks, Alexey From: Yang Wang Sent: Wednesday, September 16, 2020 8:25 PM To: Alexey Trenikhun Cc: dev; Flink User Mail List Subject: Re: FileSystemHaServices and

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-17 Thread Alexey Trenikhun
eway/local/LocalExecutorITCase.java#L132 [2] https://github.com/apache/flink/blob/5acbfedf754fa4d063931ea30432716374c2f8cf/flink-tests/src/test/java/org/apache/flink/test/state/StatefulOperatorChainedTaskTest.java#L143 On Tue, Sep 15, 2020 at 4:18 AM Alexey Trenikhun mailto:yen...@msn.com>>

Re: Ignoring invalid values in KafkaSerializationSchema

2020-09-24 Thread Alexey Trenikhun
You could forward tombstone records to “graveyard” topic, this way they will not confuse anyone reading from regular topic From: Yuval Itzchakov Sent: Thursday, September 24, 2020 11:50:28 AM To: Arvid Heise Cc: Matthias Pohl ; user Subject: Re: Ignoring inval

union stream vs multiple operators

2020-11-04 Thread Alexey Trenikhun
Hello, I have two Kafka topics ("A" and "B") which provide similar structure wise data but with different load pattern, for example hundreds records per second in first topic while 10 records per second in second topic. Events processed using same algorithm and output in common sink, currently

Re: union stream vs multiple operators

2020-11-06 Thread Alexey Trenikhun
Ok, thank you. From: Chesnay Schepler Sent: Thursday, November 5, 2020 3:15:28 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: union stream vs multiple operators I don't think the first option has any benefit. On 11/5/2020 1:19 AM, A

prometheus variable value is "normalized"

2020-11-25 Thread Alexey Trenikhun
Hello, I'm trying to expose version string as Prometheus user variable: runtimeContext.getMetricGroup() .addGroup("version", "0.0.3-beta.399+gf5b79ac") .gauge("buildInfo", () -> 0); the problem that for some reason label value 0.0.3-beta.399+gf5b79ac is converted into 0_0_3_beta_

Re: prometheus variable value is "normalized"

2020-11-26 Thread Alexey Trenikhun
Thank you! From: Chesnay Schepler Sent: Wednesday, November 25, 2020 2:59:54 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: prometheus variable value is "normalized" Essentially, we started with this behavior, and kept it to not brea

numRecordsOutPerSecond metric and side outputs

2020-12-21 Thread Alexey Trenikhun
Hello, Does numRecordsOutPerSecond metric takes into account number of records send to side output or it provides rate only for main output? Thanks, Alexey

Re: FlinkKafkaProducer Fails with "Topic not present in metadata"

2020-12-22 Thread Alexey Trenikhun
How to enable trace log for Kafka connector? I’m experience periodically same error, but in my case I don’t specify partition Alexey From: Joseph Lorenzini Sent: Thursday, December 10, 2020 7:36:54 AM To: Becket Qin Cc: user@flink.apache.org Subject: Re: Flink

state reset(lost) on TM recovery

2021-01-10 Thread Alexey Trenikhun
Hello, I'm using Flink 1.11.3, state backend is rocksdn. I have streaming job which reads from Kafka, transforms data and output into Kafka, one of processing nodes is KeyedCoProcessFunction with ValueState: 1. generated some input data, I see in log that state.update() is called and subseq

Re: state reset(lost) on TM recovery

2021-01-11 Thread Alexey Trenikhun
alizedSize]; dataInputView.read(data); return parser.parseFrom(CodedInputStream.newInstance(data)); } From: Chesnay Schepler Sent: Monday, January 11, 2021 4:36 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: state reset(lost) on TM recovery Just do do

Re: state reset(lost) on TM recovery

2021-01-13 Thread Alexey Trenikhun
serialize key, so it is not effected by weirdness of protobuf hashCode, but what about filesystem backend? [1] - https://groups.google.com/g/protobuf/c/MCk1moyWgIk From: Chesnay Schepler Sent: Tuesday, January 12, 2021 2:20 AM To: Alexey Trenikhun ; Flink User

Re: state reset(lost) on TM recovery

2021-01-13 Thread Alexey Trenikhun
Ok, thanks. From: Chesnay Schepler Sent: Wednesday, January 13, 2021 11:46:15 AM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: state reset(lost) on TM recovery The FsStateBackend makes heavy use of hashcodes, so it must be stable. On 1/13/2021 7:13

Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-15 Thread Alexey Trenikhun
Hello, I was trying to deploy Flink 1.12.0 Application cluster on k8s, I have following job manager arguments: standalone-job --job-classname com.x.App --job-id @/opt/flink/conf/fsp.conf However, when I print args from App.main(): [@/opt/flink/conf/ssp.conf, -D,

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-18 Thread Alexey Trenikhun
JOB args? Thanks, Alexey From: Matthias Pohl Sent: Sunday, January 17, 2021 11:53:29 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments Hi Alexey, thanks for

flink-python_2.12-1.12.0.jar

2021-01-18 Thread Alexey Trenikhun
Hello, Is flink-python_2.12-1.12.0.jar in docker image needed to run Java based stream processor? Will Flink work if we will remove this jar from docker image and run Java job? Thanks, Alexey

Re: flink-python_2.12-1.12.0.jar

2021-01-18 Thread Alexey Trenikhun
Thank you! From: Chesnay Schepler Sent: Monday, January 18, 2021 3:14 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: flink-python_2.12-1.12.0.jar The flink-python jar is only required for running python jobs. If you don't use such jobs yo

Flink 1.11.1 - job manager exists with exit code 0

2020-07-24 Thread Alexey Trenikhun
Hello, I've Flink 1.11.1 session cluster running via docker compose, I upload job jar, when I submit job jobmanager exits without any errors in log: ... {"@timestamp":"2020-07-25T04:32:54.007Z","@version":"1","message":"Starting execution of job katana-fsp (64ff3943fdc5024c5beef1612518c627) und

Re: Flink 1.11.1 - job manager exists with exit code 0

2020-07-28 Thread Alexey Trenikhun
and sorry for unnecessary noise Alexey From: Robert Metzger Sent: Tuesday, July 28, 2020 10:38:42 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Flink 1.11.1 - job manager exists with exit code 0 Hey Alexey, What is the exit code of the JobManager

Publish heartbeat messages in all Kafka partitions

2021-01-28 Thread Alexey Trenikhun
Hello, We need to publish heartbeat messages in all topic partitions. Is possible to produce single message and then somehow broadcast it to all partitions from FlinkKafkaProducer? Or only way that message source knows number of existing partitions and sends 1 message to 1 partition? Thanks, Al

stop job with Savepoint

2021-02-20 Thread Alexey Trenikhun
Hello, I'm running per job Flink cluster, JM is deployed as Kubernetes Job with restartPolicy: Never, highavailability is KubernetesHaServicesFactory. Job runs fine for some time, configmaps are created etc. Now in order to upgrade Flink job, I'm trying to stop job with savepoint (flink stop $J

Re: stop job with Savepoint

2021-02-20 Thread Alexey Trenikhun
Adding "list" to verbs helps, do I need to add anything else ? ____ From: Alexey Trenikhun Sent: Saturday, February 20, 2021 2:10 PM To: Flink User Mail List Subject: stop job with Savepoint Hello, I'm running per job Flink cluster, JM is deployed as

Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-02-26 Thread Alexey Trenikhun
Hello, We have Flink job running in Kubernetes with Kuberenetes HA enabled (JM is deployed as Job, single TM as StatefulSet). We taken savepoint with cancel=true. Now when we are trying to start job using --fromSavepoint A, where is A path we got from taking savepoint (ClusterEntrypoint reports

Re: Producer Configuration

2021-02-26 Thread Alexey Trenikhun
I believe bootstrap.servers is mandatory Kafka property, but it looks like you didn’t set it From: Claude M Sent: Friday, February 26, 2021 12:02:10 PM To: user Subject: Producer Configuration Hello, I created a simple Producer and when the job ran, it was get

Job downgrade

2021-02-27 Thread Alexey Trenikhun
Hello, Let's have version 1 of my job uses keyed state with name "a" and type A, which some Avro generated class. Then I upgrade to version 2, which in addition uses keyed state "b" and type B (another concrete Avro generated class), I take savepoint with version 2 and decided to downgrade to ve

Re: Producer Configuration

2021-02-27 Thread Alexey Trenikhun
Can you produce messages using Kafka console producer connect using same properties ? From: Claude M Sent: Saturday, February 27, 2021 8:05 AM To: Alexey Trenikhun Cc: user Subject: Re: Producer Configuration Thanks for your reply, yes it was specified

Re: Producer Configuration

2021-02-27 Thread Alexey Trenikhun
, 2021 4:00:23 PM To: Alexey Trenikhun Cc: user Subject: Re: Producer Configuration Yes, the flink job also works in producing messages. It's just that after a short period of time, it fails w/ a timeout. That is why I'm trying to set a longer timeout period but it doesn't seem like

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-01 Thread Alexey Trenikhun
From: Yang Wang Sent: Sunday, February 28, 2021 10:04 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint Hi Alexey, It seems that the KubernetesHAService works well since all the checkpoints have been cleaned u

Re: Job downgrade

2021-03-03 Thread Alexey Trenikhun
expected “rollback” to version 1+. From: Piotr Nowojski Sent: Wednesday, March 3, 2021 11:47:45 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Job downgrade Hi, I'm not sure what's the reason behind this. Probably classes are somehow a

Re: Job downgrade

2021-03-04 Thread Alexey Trenikhun
Hi Gordon, I was using RocksDB backend Alexey From: Tzu-Li (Gordon) Tai Sent: Thursday, March 4, 2021 12:58:01 AM To: Alexey Trenikhun Cc: Piotr Nowojski ; Flink User Mail List Subject: Re: Job downgrade Hi Alexey, Are you using the heap backend? If that&#

Re: Trigger and completed Checkpointing do not appeared

2021-03-08 Thread Alexey Trenikhun
The picture in first e-mail shows that job was completed in 93ms From: Abdullah bin Omar Sent: Monday, March 8, 2021 3:53 PM To: user@flink.apache.org Subject: Re: Trigger and completed Checkpointing do not appeared Hi, Please read the previous email (and also

Re: Re: How to check checkpointing mode

2021-03-09 Thread Alexey Trenikhun
u for your help Alexey From: Yun Gao Sent: Monday, March 8, 2021 7:14 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: Re: How to check checkpointing mode Hi Alexey, Sorry I also do not see problems in the attached code. Could you add a breakpoint

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-09 Thread Alexey Trenikhun
Hi Yang, The problem is re-occurred, full JM log is attached Thanks, Alexey From: Yang Wang Sent: Sunday, February 28, 2021 10:04 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing

User metrics outside tasks

2021-03-10 Thread Alexey Trenikhun
Hello, Is it possible to register user metric outside task/operator (not in RichMapFunction#open) Thanks, Alexey

EOFException on attempt to scale up job with RocksDB state backend

2021-03-10 Thread Alexey Trenikhun
Hello, I was trying to scale job up, took save point, changed parallelism setting from 6 to 8 and started job from savepoint: switched from RUNNING to FAILED on 10.204.2.98:6122-2946e1 @ gsp-tm-0.gsp-headless.gsp.svc.cluster.local (dataPort=41409). java.lang.Exception: Exception while creating S

Re: User metrics outside tasks

2021-03-11 Thread Alexey Trenikhun
-OOM counter From: Arvid Heise Sent: Thursday, March 11, 2021 5:22:30 AM To: Alexey Trenikhun Cc: Flink User Mail List ; Chesnay Schepler Subject: Re: User metrics outside tasks Hi Alexey, could you describe what you want to achieve? Most metrics are bound to a

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-11 Thread Alexey Trenikhun
deleting Kubernetes Job and HA configmaps enough, or something in persisted storage should be deleted as well? Thanks, Alexey From: Yang Wang Sent: Thursday, March 11, 2021 2:59 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Alexey Trenikhun
From: Yang Wang Sent: Sunday, March 14, 2021 7:50:21 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint If the HA related ConfigMaps still exists, then I am afraid the data located on the distributed storage

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Alexey Trenikhun
Savepoint was taken with 1.12.1, I've tried to scale up using same version and 1.12.2 From: Tzu-Li (Gordon) Tai Sent: Monday, March 15, 2021 12:06 AM To: user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state backend Hi,

Re: Checkpoint fail due to timeout

2021-03-15 Thread Alexey Trenikhun
k or per TM? I see multiple threads in SynchronizedStreamTaskActionExecutor.runThrowing blocked on different Objects. Thanks, Alexey From: Roman Khachatryan Sent: Monday, March 15, 2021 2:16 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Checkpoint fail due to timeout Hello Alexey,

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Alexey Trenikhun
No, I believe original exception was from 1.12.1 to 1.12.1 Thanks, Alexey From: Yun Tang Sent: Monday, March 15, 2021 8:07:07 PM To: Alexey Trenikhun ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-16 Thread Alexey Trenikhun
Also restore from same savepoint without change in parallelism works fine. From: Alexey Trenikhun Sent: Monday, March 15, 2021 9:51 PM To: Yun Tang ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB

Re: Checkpoint fail due to timeout

2021-03-16 Thread Alexey Trenikhun
From: ChangZhuo Chen (陳昌倬) Sent: Tuesday, March 16, 2021 6:59 AM To: Alexey Trenikhun Cc: ro...@apache.org; Flink User Mail List Subject: Re: Checkpoint fail due to timeout On Tue, Mar 16, 2021 at 02:32:54AM +0000, Alexey Trenikhun wrote: > Hi Roman, > I took thread dump: > "Source:

Re: Checkpoint fail due to timeout

2021-03-17 Thread Alexey Trenikhun
("hdfs:///checkpoints-data/")); Difference to Savepoints ci.apache.org From: ChangZhuo Chen (陳昌倬) Sent: Wednesday, March 17, 2021 12:29 AM To: Alexey Trenikhun Cc: ro...@apache.org; Flink User Mail List Subject: Re: Checkpoint fail due to timeout On Wed, Ma

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Alexey Trenikhun
, not sure does it have effect on savepoint or not) Thanks, Alexey From: Yun Tang Sent: Wednesday, March 17, 2021 12:33 AM To: Alexey Trenikhun ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state b

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Alexey Trenikhun
ed in [1] or without compaction? Thanks, Alexey From: Yun Tang Sent: Wednesday, March 17, 2021 9:31 PM To: Alexey Trenikhun ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state backend Hi Alexey,

inputFloatingBuffersUsage=0?

2021-03-18 Thread Alexey Trenikhun
Hello, While trying to investigate back pressure using hints from [1], I've noticed that flink_taskmanager_job_task_buffers_inPoolUsage and flink_taskmanager_job_task_buffers_inputFloatingBuffersUsage are always 0, which looks suspicious, are these metrics still populated ? Thanks, Alexey [1]

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-18 Thread Alexey Trenikhun
March 18, 2021 5:08 AM To: Alexey Trenikhun ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB state backend Hi Alexey, Flink would only write once for checkpointed files. Could you try to write checkpointed files as block blob format

Re: inputFloatingBuffersUsage=0?

2021-03-19 Thread Alexey Trenikhun
, 2021 5:01 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: inputFloatingBuffersUsage=0? Hi, clarification about the 2nd part. Required memory is one single exclusive buffer per channel, so if you are running low on memory, floating buffers are one of the first to go, hence you could

Histogram

2021-03-19 Thread Alexey Trenikhun
Hello, Is any way to expose from Flink Histogram metric, not Summary ? Thanks, Alexey

Re: Checkpoint fail due to timeout

2021-03-22 Thread Alexey Trenikhun
checkpoint still times out after 3hr. From: Arvid Heise Sent: Monday, March 22, 2021 6:58:20 AM To: ChangZhuo Chen (陳昌倬) Cc: Alexey Trenikhun ; ro...@apache.org ; Flink User Mail List Subject: Re: Checkpoint fail due to timeout Hi Alexey, rescaling from

Re: Checkpoint fail due to timeout

2021-03-22 Thread Alexey Trenikhun
: ChangZhuo Chen (陳昌倬) Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Checkpoint fail due to timeout Thanks for sharing the thread dump. It shows that the source thread is indeed back-pressured (checkpoint lock is held by a thread which is trying to emit but unable to acquire any free buffers

Re: Checkpoint fail due to timeout

2021-03-22 Thread Alexey Trenikhun
hread.run(SourceStreamTask.java:263) Thanks, Alexey From: Roman Khachatryan Sent: Monday, March 22, 2021 1:36 AM To: ChangZhuo Chen (陳昌倬) Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Checkpoint fail due to timeout Thanks for sharing the thread dump. It

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-24 Thread Alexey Trenikhun
From: Yang Wang Sent: Tuesday, March 23, 2021 11:17:18 PM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint Hi Alexey, >From your attached logs, I do not think the new start JobManager will recover &g

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-24 Thread Alexey Trenikhun
Hi Yun, Finally I was able to try to rescale with block blobs configured - rescaled from 6 to 8 w/o problem. So loos like indeed there is problem with page blob. Thank you for help, Alexey From: Alexey Trenikhun Sent: Thursday, March 18, 2021 11:31 PM To: Yun

Re: Checkpoint fail due to timeout

2021-03-30 Thread Alexey Trenikhun
uring next performance run. Thanks, Alexey From: Roman Khachatryan Sent: Tuesday, March 23, 2021 12:17 AM To: Alexey Trenikhun Cc: ChangZhuo Chen (陳昌倬) ; Flink User Mail List Subject: Re: Checkpoint fail due to timeout Unfortunately, the lock can't be chang

Re: Checkpoint fail due to timeout

2021-03-30 Thread Alexey Trenikhun
wojski Sent: Tuesday, March 23, 2021 5:31 AM To: Alexey Trenikhun Cc: Arvid Heise ; ChangZhuo Chen (陳昌倬) ; ro...@apache.org ; Flink User Mail List Subject: Re: Checkpoint fail due to timeout Hi Alexey, You should definitely investigate why the job is stuck. 1. First of all, is it completely s

idleTimeMsPerSecond exceeds 1000

2021-04-20 Thread Alexey Trenikhun
Hello, When Flink job mostly idle, idleTimeMsPerSecond for given task_name and subtask_index sometimes exceeds 1000, I saw values up to 1350, but usually not higher than 1020. Is it due to accuracy of nanoTime/currentTimeMillis or there is bug in calculations ? Thanks, Alexey

Re: idleTimeMsPerSecond exceeds 1000

2021-04-21 Thread Alexey Trenikhun
Great. Thank ypu From: Chesnay Schepler Sent: Wednesday, April 21, 2021 1:02:59 AM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: idleTimeMsPerSecond exceeds 1000 This ticket seems related; the issue was fixed in 1.13: https://issues.apache.org/jira

Unexpected key-group in restore

2021-05-11 Thread Alexey Trenikhun
Hello, Suddenly our job (Flink 1.12.2) started to failing to restore from savepoint due to unexpected key group, it looks like stack is for memory state backend, while in fact actually in flink-conf.yaml state.backend: rocksdb {"ts":"2021-05-11T15:44:09.004Z","message":"Loading configuration prop

reactive mode and back pressure

2021-05-14 Thread Alexey Trenikhun
Hello, Is new reactive mode can operate under back pressure? Old manual rescaling via taking savepoint didn't work with system under back pressure, since it was practically impossible to take savepoint, so wondering is reactive mode expected to be better in this regards ? Thanks, Alexey

The heartbeat of JobManager timed out

2021-05-15 Thread Alexey Trenikhun
Hello, I periodically see in JM log (Flink 12.2): {"ts":"2021-05-15T21:10:36.325Z","message":"The heartbeat of JobManager with id be8225ebae1d6422b7f268c801044b05 timed out.","logger_name":"org.apache.flink.runtime.resourcemanager.StandaloneResourceManager","thread_name":"flink-akka.actor.defau

Re: reactive mode and back pressure

2021-05-16 Thread Alexey Trenikhun
modes. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#unaligned-checkpoints On Fri, May 14, 2021 at 11:29 PM Alexey Trenikhun mailto:yen...@msn.com>> wrote: Hello, Is new reactive mode can operate under back pr

Re: Helm chart for Flink

2021-05-17 Thread Alexey Trenikhun
I think it should be possible to use Helm pre-upgrade hook to take savepoint and stop currently running job and then Helm will upgrade image tags. The problem is that if you hit timeout while taking savepoint, it is not clear how to recover from this situation Alexey ___

KafkaSource

2021-05-17 Thread Alexey Trenikhun
Hello, Is new KafkaSource/KafkaSourceBuilder ready to be used ? If so, is KafkaSource state compatible with legacy FlinkKafkaConsumer, for example if I replace FlinkKafkaConsumer by KafkaSource, will offsets continue from what we had in FlinkKafkaConsumer ? Thanks, Alexey

Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

2021-05-18 Thread Alexey Trenikhun
If flink-conf.yaml is readonly, flink will complain but work fine? From: Chesnay Schepler Sent: Wednesday, May 12, 2021 5:38 AM To: Alex Drobinsky Cc: user@flink.apache.org Subject: Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots

Re: KafkaSource metrics

2021-05-25 Thread Alexey Trenikhun
Looks like when KafkaSource is used instead of FlinkKafkaConsumer, metrics listed below are not available. Bug? Work in progress? Thanks, Alexey From: Ardhani Narasimha Sent: Monday, May 24, 2021 9:08 AM To: 陳樺威 Cc: user Subject: Re: KafkaSource metrics Use b

Re: KafkaSource metrics

2021-05-26 Thread Alexey Trenikhun
Found https://issues.apache.org/jira/browse/FLINK-22766 From: Alexey Trenikhun Sent: Tuesday, May 25, 2021 3:25 PM To: Ardhani Narasimha ; 陳樺威 ; Flink User Mail List Subject: Re: KafkaSource metrics Looks like when KafkaSource is used instead of

How to check is Kafka partition "idle" in emitRecordsWithTimestamps

2021-05-28 Thread Alexey Trenikhun
Hello, I'm thinking about implementing custom Kafka connector which provides event alignment (similar to FLINK-10921, which seems abandoned). What is the way to determine is partition is idle from override of AbstractFetcher.emitRecordsWithTimestamps()? Does KafkaTopicPartitionState has this in

Re: How to check is Kafka partition "idle" in emitRecordsWithTimestamps

2021-06-01 Thread Alexey Trenikhun
eldName Comment rendererType atlassian-wiki-renderer issueKey FLINK-18450 Preview comment issues.apache.org Thanks, Alexey From: Till Rohrmann Sent: Tuesday, June 1, 2021 6:24 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: How to check is Kafka part

Blocking KeyedCoProcessFunction.processElement1

2020-01-22 Thread Alexey Trenikhun
Hello, If KeyedCoProcessFunction.processElement1 blocks for significant amount of time, will it prevent checkpoint ? Thanks, Alexey

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-27 Thread Alexey Trenikhun
: Sunday, January 26, 2020 8:42:37 AM To: Alexey Trenikhun ; user@flink.apache.org Subject: Re: Blocking KeyedCoProcessFunction.processElement1 Hi Alexey Actually, I don't understand why you thing KeyedCoProcessFunction#processElement1 would block for significant amount of time, it just pr

async io parallelism

2020-02-21 Thread Alexey Trenikhun
Hello, Let's say, my elements are simple key-value pairs, elements are coming from Kafka, where they were partitioned by "key", then I do processing using KeyedProcessFunction (keyed by same "key"), then I enrich elements using ordered RichAsyncFunction, then output to another KeyedProcessFuncti

Re: async io parallelism

2020-02-24 Thread Alexey Trenikhun
Arvid, thank you. So there is single instance of FIFO per async IO operator regardless of parallelism of the async IO operator? Thanks, Alexey From: Arvid Heise Sent: Saturday, February 22, 2020 1:23:01 PM To: Alexey Trenikhun Cc: user@flink.apache.org Subject

Managed Keyed state update

2018-08-13 Thread Alexey Trenikhun
Let’s say I have Managed Keyed state - MapState> x, I initialize for state for “k0” - x.put(“k0”, new Tuple2<>(“a”, “b”)); Later I retried state Tuple2 v = x.get(“k0”); and change value: v.f0=“U”;, does it make state ‘dirty’? In other words, do I need to call x.put(“k0”, v) again or change will

Re: Managed Keyed state update

2018-08-13 Thread Alexey Trenikhun
Clear. Thank you Get Outlook for iOS<https://aka.ms/o0ukef> From: Renjie Liu Sent: Monday, August 13, 2018 4:33 PM To: Alexey Trenikhun Cc: user@flink.apache.org Subject: Re: Managed Keyed state update Hi, Alexey: It depends on the state backend you use.

ListState - elements order

2018-09-05 Thread Alexey Trenikhun
Hello, Does keyed managed ListState preserve elements order, for example if I call listState.add(e1); listState.add(e2); listState.add(e3); , does ListState guarantee that listState.get() will return elements in order they were added (e1, e2, e3) Alexey

Re: ListState - elements order

2018-09-14 Thread Alexey Trenikhun
se of merging windows we don't know the order of elements in a ListState after a merge. Best, Aljoscha On 6. Sep 2018, at 08:19, vino yang mailto:yanghua1...@gmail.com>> wrote: Hi Alexey, The answer is Yes, which preserves the semantics of the List's order of elements. Thank,

Event timers - metrics

2018-10-01 Thread Alexey Trenikhun
Hello, Are built-in timer metrics? For example number of registered timers, number of triggered timers etc Thanks, Alexey

Re: Event timers - metrics

2018-10-02 Thread Alexey Trenikhun
Thank you for looking Alexey Get Outlook for iOS<https://aka.ms/o0ukef> From: Andrey Zagrebin Sent: Tuesday, October 2, 2018 8:27 AM To: Alexey Trenikhun Cc: user@flink.apache.org Subject: Re: Event timers - metrics Hi Alexey, Looking into the source, I

Re: flink build error

2018-11-14 Thread Alexey Trenikhun
Hello, It sounds like surefire problem with latest Java: https://issues.apache.org/jira/browse/SUREFIRE-1588 Alexey From: Radu Tudoran Sent: Wednesday, November 14, 2018 6:47 AM To: user Subject: flink build error Hi, I am trying to build flink 1.6 but cannot b

  1   2   >