from:"Evgeniy Lyutikov"

Re: SSL Kafka PyFlink

2024-05-17 Thread Evgeniy Lyutikov via user

Hi Phil You need specify keystore with CA location [1] [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/#security От: gongzhongqiang Отправлено: 17 мая 2024 г. 10:44:18 Кому: Phil Stavridis Копия: user@flink.apache.org

Why calling ListBucket for each file in a checkpoint

2024-01-18 Thread Evgeniy Lyutikov

Hi all! I'm trying to understand the logic of saving checkpoint files and from the exchange dump with ceph I see the following requests HEAD /checkpoints/example-job//shared/9701fae2-0de3-4d6c-b08b-0a92fb7285c9 HTTP/1.1 HTTP/1.1 404 Not Found HEAD /checkpoints/e

FlinkSQL environment values in ddl

2023-11-28 Thread Evgeniy Lyutikov

We started using several flinksql jobs in kubernetes cluster and would like to understand how to safely pass passwords and other sensitive data in the description of DDLs of tables. Is there any way to use pointers to environment variables? "This message contai

Re: Checkpoints are not triggering when S3 is unavailable

2023-11-07 Thread Evgeniy Lyutikov

Hi, thanks for the reply We use the kafka entry as a sink. It’s not clear why the flink stops triggering new checkpoints that would time out as expected. От: Hangxiang Yu Отправлено: 6 ноября 2023 г. 11:02:08 Кому: Evgeniy Lyutikov; user@flink.apache.org Тема

Re: flink-kubernetes-operator cannot handle SPECCHANGE for 100+ FlinkDeployments concurrently

2023-11-03 Thread Evgeniy Lyutikov

Hello! I constantly get a similar error when operator (working in single instance) receiving deployment statuses Details described in this message https://lists.apache.org/thread/0odcc9pvlpz1x9y2nop9dlmcnp9v1696 I tried changing versions and allocated resources, as well as the number of reconci

Checkpoints are not triggering when S3 is unavailable

2023-10-30 Thread Evgeniy Lyutikov

Hi team! I came across strange behavior in Flink 1.17.1. If during the build of a checkpoint the s3 storage becomes unavailable, then the current checkpoint expired by timeout and new ones are not triggered. The triggering for new checkpoints is resumed only after s3 is restored and this can be

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-10-19 Thread Evgeniy Lyutikov

Hi. I patched my copy of the 1.6.0 operator with edits from https://github.com/apache/flink-kubernetes-operator/pull/673 This solved the problem От: Tony Chen Отправлено: 19 октября 2023 г. 4:18:36 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org; Gyula

Re: FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

2023-09-19 Thread Evgeniy Lyutikov

Thanks for the answer, I'll try. Are there examples or tutorials somewhere on how to use FlinkML in real-life scenarios, such as streaming Kafka through a model? От: Xin Jiang Отправлено: 19 сентября 2023 г. 8:07:11 Кому: Evgeniy Lyutikov Копия:

FlinkML 'DenseVector' object has no attribute 'get_fields_by_names'

2023-09-18 Thread Evgeniy Lyutikov

Hello community! I'm trying to use FlinkML to train a model on data from a PostgreSQL table and I get an error when I try to view the output table after model AttributeError: 'DenseVector' object has no attribute 'get_fields_by_names' My code: # Create train source table t_env.execute_sql( "

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-09-11 Thread Evgeniy Lyutikov

Why we need to use latest CRD version with older operator version? От: Gyula Fóra Отправлено: 12 сентября 2023 г. 0:36:26 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: Flink kubernets operator delete HA metadata after resuming from suspend Do

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-09-11 Thread Evgeniy Lyutikov

Is it safe to rollback the operator version with replace to old CRDs? От: Evgeniy Lyutikov Отправлено: 11 сентября 2023 г. 23:50:26 Кому: Gyula Fóra Копия: user@flink.apache.org Тема: Re: Flink kubernets operator delete HA metadata after resuming from suspend

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-09-11 Thread Evgeniy Lyutikov

Hi! No, no one could restart jobmanager, I monitored the pods in real time, they all deleted when suspended as expected. От: Gyula Fóra Отправлено: 11 сентября 2023 г. 20:34:52 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: Flink kubernets

Flink kubernets operator delete HA metadata after resuming from suspend

2023-09-11 Thread Evgeniy Lyutikov

Hi all! After updating the operator to version 1.6.0, suspended and resuming flink jobs stopped working. When job resumes, the high availability metadata is removed. Suspend job: 2023-09-11 06:01:41,548 o.a.f.k.o.l.AuditUtils [INFO ][rec-job/rec-job] >>> Event | Info| SPECCHANGED

Re: Kubernetes operator listing jobs TimeoutException

2023-09-06 Thread Evgeniy Lyutikov

/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) От: Evgeniy Lyutikov Отправлено: 8 июня 2023 г. 13:43:18 Кому: Shammon FY Копия: user@flink.apache.org Тема: Re: Kubernetes operator listing jobs

Re: Kubernetes operator listing jobs TimeoutException

2023-06-07 Thread Evgeniy Lyutikov

393756c0-8266-41a2-8d82-eb9ec46e90a3] От: Shammon FY Отправлено: 8 июня 2023 г. 12:55:38 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: Kubernetes operator listing jobs TimeoutException Hi Evgeniy, From the following exception message:

Kubernetes operator listing jobs TimeoutException

2023-06-07 Thread Evgeniy Lyutikov

Hello. We use Kubernetes operator 1.4.0, operator serves about 50 jobs, but sometimes there are errors in the logs that are reflected in the metrics (FlinkDeployment.JmDeploymentStatus.READY.Count). What is the reason for such errors? 2023-06-07 15:28:27,601 o.a.f.k.o.c.FlinkDeploymentControll

Kubernetes skip sidecar failure

2023-03-21 Thread Evgeniy Lyutikov

Hello everybody! We're using Flink 1.14 and kubernetes operator 1.2.0, the pod template configures the use of the haproxy sidecar container for load balancing on a persistence checkpoint in S3 storage. Sometimes this haproxy sidecar exits and flink completely restarts the taskmamager module and

Re: [EXT] Re: Kubernetes operator set container resources and limits

2023-03-13 Thread Evgeniy Lyutikov

c Отправлено: 13 марта 2023 г. 19:31:07 Кому: Andrew Otto; Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: [EXT] Re: Kubernetes operator set container resources and limits Hi, You can set the following properties in the flinkConfiguration inside your .yaml file: * kubernet

Re: Kubernetes operator set container resources and limits

2023-03-13 Thread Evgeniy Lyutikov

I want to set different values for resources and limits. От: Andrew Otto Отправлено: 13 марта 2023 г. 17:50 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: Kubernetes operator set container resources and limits Hi, > return to the same val

Kubernetes operator set container resources and limits

2023-03-13 Thread Evgeniy Lyutikov

Hi all Is there any way to specify different values for resources and limits for a jobmanager container? The problem is that sometimes kubernetes kills the jobmanager container because it exceeds the memory consumption. Last State: Terminated Reason: OOMKilled Exit Code:137

Re: PyFlink job in kubernetes operator

2023-01-25 Thread Evgeniy Lyutikov

авлено: 25 января 2023 г. 21:03:40 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: PyFlink job in kubernetes operator Did you check the Python example? https://github.com/apache/flink-kubernetes-operator/tree/main/examples/flink-python-example<https://eur04.safelinks.protection.ou

PyFlink job in kubernetes operator

2023-01-25 Thread Evgeniy Lyutikov

Hello Is there a way to run PyFlink jobs in k8s with flink kubernetes operator? And if not, is it planned to add such functionality? "This message contains confidential information/commercial secret. If you are not the intended addressee of this message you may

Re: Parse checkpoint _metadata file

2022-12-22 Thread Evgeniy Lyutikov

date and size). About 90% of links lead to non-existent objects. От: Hangxiang Yu Отправлено: 22 декабря 2022 г. 14:21:46 Кому: Evgeniy Lyutikov; user@flink.apache.org Тема: Re: Parse checkpoint _metadata file Hi, > Is there some way to deserialize the checkpoin

Parse checkpoint _metadata file

2022-12-21 Thread Evgeniy Lyutikov

Hello All Is there some way to deserialize the checkpint _metadata file? I want to understand what the checkpoint saves and how the occupied space is distributed. If i try to process the file with regular expressions, then approximately 90% of S3 paths of objects are actually missing in the buc

Re: Several job in kubernetes restarts because Scheduler is being stopped.

2022-12-02 Thread Evgeniy Lyutikov

г. 14:27:25 Кому: Evgeniy Lyutikov Тема: Re: Several job in kubernetes restarts because Scheduler is being stopped. Hi Evgeniy, stopping the scheduler is not the cause of this restart. It's more of a symptom when suspending the job: The JobMaster is being stopped and as a consequence,

Several job in kubernetes restarts because Scheduler is being stopped.

2022-11-25 Thread Evgeniy Lyutikov

Hello! In our k8s application cluster (served by flink-operator) several jobs restart at the same time with the same error. What is the reason for this restart and how can it be prevented? 2022-11-25T07:50:47.253459360Z INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl

Re: How can I deploy a flink cluster with 4 TaskManagers?

2022-11-25 Thread Evgeniy Lyutikov

Hello Taskmanager count = job.parallelism / taskmanager.numberOfTaskSlots От: Mark Lee Отправлено: 25 ноября 2022 г. 18:30:31 Кому: user@flink.apache.org Тема: How can I deploy a flink cluster with 4 TaskManagers? Hi all, How can I deploy a flink cluster with 1

Re: Safe way to clear old checkpoint data

2022-11-25 Thread Evgeniy Lyutikov

_metadata? От: Martijn Visser Отправлено: 25 ноября 2022 г. 16:15:45 Кому: Evgeniy Lyutikov Копия: user Тема: Re: Safe way to clear old checkpoint data Hi, I would recommend upgrading to Flink 1.15, given the changes that were made in 1.15 make ownership more understandable.

Safe way to clear old checkpoint data

2022-11-25 Thread Evgeniy Lyutikov

Hello We use Flink 1.14.4 in kubernetes operator (version 1.2.0), all chepoint data store in s3 bucket. If parse _metadata file of checkpoint it contains links to objects in the shared directory and their number is much less than the total number of objects in the directory. For example, the n

Re: allowNonRestoredState doesn't seem to be working

2022-10-13 Thread Evgeniy Lyutikov

The problem is that changing the FlinkDeployment specification (new jar version, changing pod resources, etc.) for JobManager is just a restart. 2022-09-16 09:30:52,526 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Restoring job from Ch

Kubernetes operator assign Job ID

2022-10-12 Thread Evgeniy Lyutikov

Hi everyone After updating kuberneter operator to version 1.2.0 noticed that it started generating jobid for all deployments. 2022-10-13 06:18:30,724 o.a.f.k.o.c.FlinkDeploymentController [INFO ][infojob/infojob] Starting reconciliation 2022-10-13 06:18:30,725 o.a.f.k.o.l.AuditUtils [IN

Sometimes checkpoints to s3 fail

2022-10-06 Thread Evgeniy Lyutikov

Hello all. I can’t understand the floating problem, sometimes checkpoints stop passing, sometimes they start to complete every other time. Flink 1.14.4 in kubernetes application mode. 2022-10-06 09:08:04,731 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering che

Re: JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov

use application mode cluster). От: Gyula Fóra Отправлено: 20 сентября 2022 г. 19:49:37 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: JobManager restarts on job failure The best thing for you to do would be to upgrade to Flink 1.15 and the latest ope

JobManager restarts on job failure

2022-09-20 Thread Evgeniy Lyutikov

Hi, We using flink 1.14.4 with flink kubernetes operator. Sometimes when updating a job, it fails on startup and flink removes all HA metadata and exits the jobmanager. 2022-09-14 14:54:44,534 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Restoring job 00

Jobmanager sometimes exits at startup in kubernetes

2022-08-18 Thread Evgeniy Lyutikov

Hi all! I'm using Flink 1.14.4 along with kubernetes operator version 1.1.0, sometimes kubernetes operator restarts the cluster after changing the flinkdeployment object (with saving savepoint ), the new jobmanager which created exits right after start. 2022-08-18T06:47:52.627825838Z DEBUG o

Re: Savepoint problen on KubernetesOperator HA cluster

2022-08-11 Thread Evgeniy Lyutikov

a Fóra Отправлено: 11 августа 2022 г. 16:25:47 Кому: Evgeniy Lyutikov Копия: user@flink.apache.org Тема: Re: Savepoint problen on KubernetesOperator HA cluster In general the Flink JobManager HA /client mechanism ensures that the rest requests end up at the current leader. In your case it'

Savepoint problen on KubernetesOperator HA cluster

2022-08-11 Thread Evgeniy Lyutikov

Hi, I'm using flink 1.14.4 with flink kubernetes operator 1.0.1 with ha configuration on 3 jobmanager. When trying to change the job configuration, it restarts with trigger savepoint and an error occurs each time: 2022-08-10 12:04:21,142 mo.a.f.k.o.c.FlinkDeploymentController [INFO ][job-nam

37 matches

Mail list logo