Clarification on Flink Pipeline and Cluster Termination with Operator v1.9

2024-11-14 Thread Sigalit Eliazov
Hi, I am currently deploying a Flink pipeline using Flink Kubernetes Operator v1.9 alongside Flink version 1.19.1 as part of a comprehensive use case. When the use case is undeployed, I need to ensure that the Flink pipeline is properly canceled and the Flink cluster is taken down. My approach

flink pipeline canary upgrade question

2024-08-12 Thread Sigalit Eliazov
Hi, We are exploring options to support canary upgrades for our Flink pipelines and have considered the following approach. 1. *Stateless Pipelines*: For stateless pipelines, we define two clusters, job-v1 and job-v2, using the same consumer group to avoid event duplication. To contr

Re: Flink pipeline throughput

2024-04-01 Thread Asimansu Bera
Hello Kartik, For your case, if events ingested/Second is 300/60=5 and payload size is 2kb , per second, ingestion size 5*2k=10kb. Network buffer size is 32kb by default. You can also decrease the value to 16k. https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#tas

Re: Flink pipeline throughput

2024-04-01 Thread Kartik Kushwaha
Thank you. I will check and get back on both the sugesstions made by Asimansu and Xuyang. I am using Flink 1.17.0 Regards, Kartik On Mon, Apr 1, 2024, 5:13 AM Asimansu Bera wrote: > Hello Karthik, > > You may check the execution-buffer-timeout-interval parameter. This value > is an important o

Re: Flink pipeline throughput

2024-03-31 Thread Asimansu Bera
Hello Karthik, You may check the execution-buffer-timeout-interval parameter. This value is an important one for your case. I had a similar issue experienced in the past. https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-buffer-timeout-interval For your

Flink pipeline throughput

2024-03-30 Thread Kartik Kushwaha
Hello, I have a Streaming event processing job that looks like this. *Source - ProcessFn(3 in total) - Sink* I am seeing a delay of 50ms to 250ms between each operators (maybe buffering or serde delays) leading to a slow end- to-end processing. What could be the reason for such high latency? So

Re: flink pipeline handles very small amount of messages in a second (only 500)

2022-04-07 Thread yu'an huang
Hi Sigalit, In your settings, I guess each job will only have one slot (parallelism). So is it too many input for your jobs with parallelism only one? One easy way to confirm is that you increase your slots and job parallelism twice and then see whether the QPS is increased. Hope this would

Re: flink pipeline handles very small amount of messages in a second (only 500)

2022-04-07 Thread Yun Tang
Sent: Thursday, April 7, 2022 19:12 To: user Subject: flink pipeline handles very small amount of messages in a second (only 500) hi all I would appreciate some help to understand the pipeline behaviour... We deployed a standalone flink cluster. The pipelines are deployed via the jm rest api. We

flink pipeline handles very small amount of messages in a second (only 500)

2022-04-07 Thread Sigalit Eliazov
hi all I would appreciate some help to understand the pipeline behaviour... We deployed a standalone flink cluster. The pipelines are deployed via the jm rest api. We have 5 task managers with 1 slot each. In total i am deploying 5 pipelines which mainly read from kafka, a simple object conversi

Observability around Flink Pipeline/stateful functions

2021-07-22 Thread Deepak Sharma
@d...@spark.apache.org @user I am looking for an example around the observability framework for Apache Flink pipelines. This could be message tracing across multiple flink pipelines or query on the past state of a message that was processed by any flink pipeline. If anyone has done similar work

Re: [QUERY] Multiple elastic search sinks for a single flink pipeline

2020-10-14 Thread Chesnay Schepler
Are the number of sinks fixed? If so, then you can just take the output of your map function and apply multiple filters, writing the output of each filter into a sync. You could also use a process function with side-outputs, and apply a source to each output. On 10/14/2020 6:05 PM, Vignesh Ram

[QUERY] Multiple elastic search sinks for a single flink pipeline

2020-10-14 Thread Vignesh Ramesh
My requirement is to send the data to a different ES sink (based on the data). Ex: If the data contains a particular info send it to sink1 else send it to sink2 etc(basically send it dynamically to any one sink based on the data). I also want to set parallelism separately for ES sink1, ES sink2, Es

Re: Flink pipeline;

2020-05-06 Thread Leonard Xu
Hi Aissa Looks like your requirements is to enrich a real stream data(from kafka) with dimension data(your case will like: {sensor_id, equipment_id, workshop_id, factory_id} ), you can achieve your purpose by Flink DataStream API or just use FLINK SQL. I think use pure SQL will be esaier if you

Re: Flink pipeline;

2020-05-06 Thread hemant singh
You will have to enrich the data coming in for eg- { "equipment-id" : "1-234", "sensor-id" : "1-vcy", . } . Since you will most likely have a keyedstream based on equipment-id+sensor-id or equipment-id, you can have a control stream with data about equipment to workshop/factory mapping somethin

Flink pipeline;

2020-05-05 Thread Aissa Elaffani
Hello Guys, I am new to the real-time streaming field, and I am trying to build a BIG DATA architecture for processing real-time streaming. I have some sensors that generate data in json format, they are sent to Apache kafka cluster then i want to consume them with Apache flinkin ordre to do some a

Re: Using Stateful Functions within a Flink pipeline

2020-04-30 Thread Annemarie Burger
Hi Igal, Thanks for your responses. Regarding "having a pre-processing job that does whatever transformations necessary with the DataStream and outputs to Kafka / Kinesis, and then having a separate StateFun deployment that consumes from that transformed Kafka / Kinesis topic." I was wondering how

Re: Using Stateful Functions within a Flink pipeline

2020-04-26 Thread Igal Shilman
Hi Max, Sorry for not being clearer earlier, by splitting the pipeline I mean: having a pre-processing job that does whatever transformations necessary with the DataStream and outputs to Kafka / Kinesis, and then having a separate StateFun deployment that consumes from that transformed Kafka / Kin

Re: Using Stateful Functions within a Flink pipeline

2020-04-26 Thread m@xi
Dear Igal, Can you elaborate more on your proposed solution of splitting the pipeline? If possible, providing some skeleton pseudocode would be awesome! Thanks in advance. Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-23 Thread Arvid Heise
gt; Best regards > Theo > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html > > -- > *Von: *"Elkhan Dadashov" > *An: *"user" > *Gesendet: *Donnerstag, 16. April 2020 10:37:55 >

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
ctions job. > > Good luck, > Igal. > > On Wed, Apr 22, 2020 at 4:26 PM Annemarie Burger < > annemarie.bur...@campus.tu-berlin.de> wrote: > >> I was wondering if it is possible to use a Stateful Function within a >> Flink >> pipeline. I know they work with

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Igal Shilman
: > I was wondering if it is possible to use a Stateful Function within a Flink > pipeline. I know they work with different API's, so I was wondering if it > is > possible to have a DataStream as ingress for a Stateful Function. > > Some context: I'm working on a streamin

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
t 10:26 AM Annemarie Burger < annemarie.bur...@campus.tu-berlin.de> wrote: > I was wondering if it is possible to use a Stateful Function within a Flink > pipeline. I know they work with different API's, so I was wondering if it > is > possible to have a DataStream as ingress

Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Annemarie Burger
I was wondering if it is possible to use a Stateful Function within a Flink pipeline. I know they work with different API's, so I was wondering if it is possible to have a DataStream as ingress for a Stateful Function. Some context: I'm working on a streaming graph analytics system, a

Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-16 Thread Theo Diefenthal
16. April 2020 10:37:55 Betreff: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks? Hi Flink users, I have a basic Flnk pipeline, doing flatmap. inside flatmap, I get the input, path it to the client library to compute some result. That library ex

How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-16 Thread Elkhan Dadashov
Hi Flink users, I have a basic Flnk pipeline, doing flatmap. inside flatmap, I get the input, path it to the client library to compute some result. That library execution takes around 30 seconds to 2 minutes (depending on the input ) for producing the output from the given input ( it is time-ser

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-07-03 Thread Vishal Santoshi
I guess using a session cluster rather then a job cluster will decouple the job from the container and may be the only option as of today? On Sat, Jun 29, 2019, 9:34 AM Vishal Santoshi wrote: > So there a re 2 scenerios > > 1. If JM goes down ( exits ) and k8s re launches the Job Cluster ( the J

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
So there a re 2 scenerios 1. If JM goes down ( exits ) and k8s re launches the Job Cluster ( the JM ), it is using the exact command, the JM was brought up in the first place. 2. If the pipe is restarted for any other reason by the JM ( the JM has not exited but *Job Kafka-to-HDFS (

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
Another point the JM had terminated. The policy on K8s for Job Cluster is spec: restartPolicy: OnFailure *2019-06-29 00:33:14,308 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster entrypoint process StandaloneJobClusterEntryPoint with exit code 1443.* On Sat,

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
I have not tried on bare metal. We have no option but k8s. And this is a job cluster. On Sat, Jun 29, 2019 at 9:01 AM Timothy Victor wrote: > Hi Vishal, can this be reproduced on a bare metal instance as well? Also > is this a job or a session cluster? > > Thanks > > Tim > > On Sat, Jun 29, 2

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Timothy Victor
Hi Vishal, can this be reproduced on a bare metal instance as well? Also is this a job or a session cluster? Thanks Tim On Sat, Jun 29, 2019, 7:50 AM Vishal Santoshi wrote: > OK this happened again and it is bizarre ( and is definitely not what I > think should happen ) > > > > > The job fai

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
OK this happened again and it is bizarre ( and is definitely not what I think should happen ) The job failed and I see these logs ( In essence it is keeping the last 5 externalized checkpoints ) but deleting the zk checkpoints directory *06.28.2019 20:33:13.7382019-06-29 00:33:13,73

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-05 Thread Vishal Santoshi
Ok, I will do that. On Wed, Jun 5, 2019, 8:25 AM Chesnay Schepler wrote: > Can you provide us the jobmanager logs? > > After the first restart the JM should have started deleting older > checkpoints as new ones were created. > After the second restart the JM should have recovered all 10 checkpoi

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-05 Thread Chesnay Schepler
Can you provide us the jobmanager logs? After the first restart the JM should have started deleting older checkpoints as new ones were created. After the second restart the JM should have recovered all 10 checkpoints, start from the latest, and start pruning old ones as new ones were created.

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-05 Thread Vishal Santoshi
Any one? On Tue, Jun 4, 2019, 2:41 PM Vishal Santoshi wrote: > The above is flink 1.8 > > On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi > wrote: > >> I had a sequence of events that created this issue. >> >> * I started a job and I had the state.checkpoints.num-retained: 5 >> >> * As expected

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-04 Thread Vishal Santoshi
The above is flink 1.8 On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi wrote: > I had a sequence of events that created this issue. > > * I started a job and I had the state.checkpoints.num-retained: 5 > > * As expected I have 5 latest checkpoints retained in my hdfs backend. > > > * JM dies ( K

Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-04 Thread Vishal Santoshi
I had a sequence of events that created this issue. * I started a job and I had the state.checkpoints.num-retained: 5 * As expected I have 5 latest checkpoints retained in my hdfs backend. * JM dies ( K8s limit etc ) without cleaning the hdfs directory. The k8s job restores from the latest che

Re: Flink Pipeline - CICD

2019-04-08 Thread Lifei Chen
There is a go cli for automating deploying and udpating flink jobs, you can integrate Jenkins pipeline with it, maybe it helps. https://github.com/ing-bank/flink-deployer Navneeth Krishnan 于2019年4月9日周二 上午10:34写道: > Hi All, > > We have some streaming jobs in production and today we manually de

Flink Pipeline - CICD

2019-04-08 Thread Navneeth Krishnan
Hi All, We have some streaming jobs in production and today we manually deploy the flink jobs in each region/environment. Before we start automating it I just wanted to check if anyone has already created a CICD script for Jenkins or other CICD tools to deploy the latest JAR on to running flink cl