yaml and is more friendly to K8s users.
>
> The flink-k8s-operator could integrate with standalone mode[1], but also
> native K8s mode[2].
>
> [1]. https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
> [2]. https://github.com/wangyang0918/flink-native-k8s-operator
>
In HA mode the configMap will be retained after deletion of the deployment:
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
(refer to "High availability data clean up")
On Fri, Oct 22, 2021 at 8:13 AM Yangze Guo wrote:
> For application mode, when the jo
Understood that we have the Kubernetes HA configuration where we specify s3://
or hdfs:/// persistent storage, as mentioned here:
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
Regards
Bhaskar
On Fri, Oct 22, 2021 at 10:47 AM Vijay Bhaskar
wrote:
>
All,
I have used Flink up to last year, with Flink 1.9. At that time we built our
own cluster using ZooKeeper and monitoring jobs. Now I am revisiting
different applications and found that the community has come up with this native
Kubernetes deployment:
https://ci.apache.org/projects/flink/flink-docs-mast
Can we register any method in a custom source or custom sink to catch
SIGINT? (I know that we need to follow the procedure to stop a Flink job by
saving the state.)
Regards
Bhaskar
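As far as I know there is no SIGINT callback in the user-facing source/sink interfaces; on an orderly shutdown Flink invokes `SourceFunction.cancel()` and `RichFunction.close()`, and state capture should go through stop-with-savepoint. For process-level cleanup a plain JVM shutdown hook is an option; a minimal dependency-free sketch (class and messages are illustrative):

```java
public class ShutdownHookSketch {
    public static void main(String[] args) {
        // A JVM shutdown hook fires on SIGINT/SIGTERM (not on SIGKILL).
        Thread hook = new Thread(() -> {
            // Flush buffers / close client connections here; do NOT rely on
            // this for state consistency -- use stop-with-savepoint for that.
            System.out.println("shutdown signal observed");
        });
        Runtime.getRuntime().addShutdownHook(hook);

        // removeShutdownHook returns true iff the hook was registered,
        // which lets us verify the registration without exiting the JVM.
        boolean wasRegistered = Runtime.getRuntime().removeShutdownHook(hook);
        System.out.println("hook registered: " + wasRegistered); // prints "hook registered: true"
    }
}
```

Note the hook only gives a best-effort window for cleanup; Kubernetes pod termination eventually escalates to SIGKILL, which no hook can observe.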
Since the state size is small, you can try the FsStateBackend rather than
RocksDB; you can check once. The rule of thumb is: if the FsStateBackend performs
worse, RocksDB is the better choice.
Regards
Bhaskar
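For reference, the backend choice is a flink-conf.yaml setting; a sketch with placeholder paths (keys follow the Flink 1.x docs, the values are assumptions for illustration):

```yaml
# Heap-based backend: fast for small state; state must fit in TaskManager memory.
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder path

# Alternative for very large state:
# state.backend: rocksdb
# state.backend.incremental: true
```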
On Tue, Oct 12, 2021 at 1:47 PM Yun Tang wrote:
> Hi Lei,
>
> RocksDB state-backend's checkpoint is composited b
Yes, absolutely. Unless we need a very large state on the order of GBs, RocksDB is
not required. RocksDB is good only because the filesystem backend is very bad at
large state. In other words, the filesystem backend performs much better than RocksDB
up to GBs of state; after that it degrades compared to RocksDB. It's not
that
Hi
While doing scale testing we observed that the FsStateBackend is
outperforming RocksDB.
When using RocksDB, off-heap memory keeps growing over a period of time,
and after a day the pod got terminated with OOM.
Whereas with the same data pattern the FsStateBackend has been running for days without
any memory spike an
t; }
> > isSkippedElement = false;
> >
> > evictingWindowState.setCurrentNamespace(window);
> > evictingWindowState.add(element);
> >
> > t
h zero.
> >>
> >> The application I am replacing has a latency of 36-48 hours, so if I
> >> had
> >> to fully stop processing to take every snapshot synchronously, it
> >> might
> >> be seen as totally acceptable, especially for initial bo
by much.
>
> On first glance, the code change to allow RocksDBStateBackend into a
> synchronous snapshots mode looks pretty easy. Nevertheless, I was
> hoping to do the initial launch of my application without needing to
> modify the framework.
>
> Regards,
>
>
> Jeff He
For me this seems to be an IO bottleneck at your task manager.
I have a couple of queries:
1. What's your checkpoint interval?
2. How frequently are you updating the state into RocksDB?
3. How many task managers are you using?
4. How much data each task manager handles while taking the checkpoint?
Hi
I am using Flink 1.9 and facing the issue below.
Suppose I have deployed a job and there are not enough slots; then the
job is stuck waiting for slots. But the Flink job status shows
it as "RUNNING" when actually it is not.
For me this looks like a bug. It impacts our production whi
special job id.
> If you stick to use this, all runs of this jobgraph would have the same
> job id.
>
>
> Best
> Yun Tang
> --
> *From:* Vijay Bhaskar
> *Sent:* Monday, June 8, 2020 12:42
> *To:* Yun Tang
> *Cc:* Kathula, Sandeep ; us
Hi Yun
If we start using the special job ID and redeploy the job, then after
deployment, will it get assigned the special job ID or a new job
ID?
Regards
Bhaskar
On Mon, Jun 8, 2020 at 9:33 AM Yun Tang wrote:
> Hi Sandeep
>
> In general, Flink assign unique job-id to each job and use
Created JIRA for it: https://issues.apache.org/jira/browse/FLINK-17966
Regards
Bhaskar
On Wed, May 27, 2020 at 1:28 PM Vijay Bhaskar
wrote:
> Thanks Yun. In that case it would be good to give the reference of that
> documentation in the Flink Rest API:
> https://ci.apache.org/proje
/blob/master/docs/monitoring/checkpoint_monitoring.zh.md
>
> Best
> Yun Tang
> --
> *From:* Vijay Bhaskar
> *Sent:* Tuesday, May 26, 2020 15:18
> *To:* Yun Tang
> *Cc:* user
> *Subject:* Re: Inconsistent Checkpoint API response
>
> Thank
Hi
We are using the fixed-delay restart strategy.
I have a fundamental question:
why is the restart counter not reset to zero after the streaming job restart is
successful?
Let's say my maximum number of restarts is 5.
My streaming job tried 2 times and the 3rd attempt is successful; why is the counter
still 2 and not zero
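As far as I understand (and as this thread suggests), the fixed-delay strategy's attempt counter is cumulative for the life of the job and is not zeroed by a successful restore. A toy model of that behavior; names and structure are illustrative, not Flink internals:

```java
// Simulates a fixed-delay restart budget whose counter is NOT reset
// after a successful recovery: each restart permanently consumes one attempt.
public class RestartCounterSketch {
    static final int MAX_ATTEMPTS = 5;
    static int attemptsUsed = 0;

    // Returns true if a restart is still permitted, consuming one attempt.
    static boolean tryRestart() {
        if (attemptsUsed >= MAX_ATTEMPTS) {
            return false;
        }
        attemptsUsed++;
        return true;
    }

    public static void main(String[] args) {
        tryRestart(); // 1st restart: job fails again
        tryRestart(); // 2nd restart: job fails again
        tryRestart(); // 3rd restart: job recovers successfully...
        // ...but the counter keeps its value, leaving only 2 attempts in the budget.
        System.out.println("attempts used: " + attemptsUsed); // prints "attempts used: 3"
    }
}
```

Under this model, a job that keeps hitting transient failures eventually exhausts its budget even if it recovers in between; the failure-rate restart strategy (which counts failures inside a sliding time window) is the usual answer when a windowed reset is wanted.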
checkpoint, they are actually not the same thing.
> The only scenario where they're the same in number is when Flink just
> restored successfully before a new checkpoint completes.
>
> Best
> Yun Tang
>
>
> --
> *From:* Vijay Bhaska
b.com/apache/flink/blob/8f992e8e868b846cf7fe8de23923358fc6b50721/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1250
>
> Best
> Yun Tang
> --
> *From:* Vijay Bhaskar
> *Sent:* Monday, May 25, 2020 17:01
> *To:
t; savepoint/checkpoint is restored. [1]
>
> [1]
> https://github.com/apache/flink/blob/50253c6b89e3c92cac23edda6556770a63643c90/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1285
>
> Best
> Yun Tang
>
> --
Hi
I am using Flink retained checkpoints along with the
jobs/:jobid/checkpoints API for retrieving the latest retained checkpoint.
Following is the response of the Flink checkpoints API:
My job's maximum restart attempts is 5.
check point API response in "latest" key, check point file name of both
"rest
For point (1) above:
It is up to the user to choose a proper source and sink to get exactly-once
semantics, as per the documentation:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/guarantees.html
If we choose the supported source and sink combinations, duplicates will be
avoi
Please find answers inline
Our understanding is that to stop a job with a savepoint, all the task managers
will persist their state during the savepoint. If a task manager receives a
shutdown signal while the savepoint is being taken, does it complete the
savepoint before shutdown?
[Ans] Why task manager is shutd
Two things you can do:
stopping the Flink job is going to generate a savepoint.
You need to save the savepoint directory path in some persistent store
(because you are restarting the cluster; otherwise the checkpoint monitoring
API should give you the savepoint file details).
After spinning up the cluster, read the path
> -- Original Message --
> *From:* "vino yang";
> *Sent:* Thursday, November 28, 2019, 7:17 PM
> *To:* "曾祥才";
> *Cc:* "Vijay Bhaskar";"User-Flink"<
> user@flink.apache.org>;
> *主题:* Re: JobGraphs not cleaned up in HA mode
>
> Hi,
>
>
--
> *发件人:* "Vijay Bhaskar";
> *发送时间:* 2019年11月28日(星期四) 下午3:05
> *收件人:* "曾祥才";
> *主题:* Re: JobGraphs not cleaned up in HA mode
>
> Again it could not find the state store file: "Caused by:
> java.io.FileNotFoundException: /flink/ha/submittedJob
is a external persistent store (a nas directory mounts
> to the job manager)
>
>
>
>
> -- Original Message ----------
> *From:* "Vijay Bhaskar";
> *Sent:* Thursday, November 28, 2019, 2:29 PM
> *To:* "曾祥才";
> *Cc:* "user";
> *Subject:* Re: JobGraphs n
The following are the mandatory conditions to run in HA:
a) You should have a persistent common external store for the job manager and
task managers to write the state to.
b) You should have a persistent external store for ZooKeeper to store the
JobGraph.
ZooKeeper is referring to the path:
/flink/checkpoints/su
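Conditions (a) and (b) map onto the standard flink-conf.yaml HA settings; a sketch with placeholder hosts and paths:

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181  # placeholder hosts
high-availability.storageDir: hdfs:///flink/ha/       # persistent store for JobGraphs and metadata
high-availability.cluster-id: /my-flink-cluster       # optional namespace inside ZooKeeper
```

ZooKeeper itself only holds pointers; the actual JobGraphs and checkpoint metadata live under `high-availability.storageDir`, which is why it must be a store every JobManager can reach.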
d by the community.
>
> Cheers,
> Till
>
> On Fri, Oct 11, 2019 at 11:39 AM Vijay Bhaskar
> wrote:
>
>> Apart from these, we have another environment, and there checkpointing worked
>> fine in HA mode with a complete cluster restart. But one of the jobs we are
>> se
using flink 1.6.2 in
production. Is this an issue already known before and fixed recently
Regards
Bhaskar
On Fri, Oct 11, 2019 at 2:08 PM Vijay Bhaskar
wrote:
> We saw the below logs in production some time ago; after that we stopped
> HA. Do you think HA is enabled properly fr
u want to change your job topology, just follow
> the general rule to restore from savepoint/checkpoint, do not rely on HA to
> do job migration things.
>
> Best
> Yun Tang
> --
> *From:* Hao Sun
> *Sent:* Friday, October 11, 2019 8:33
> *To:* Yun
ter-id, it could
> recover jobs and checkpoint from zookeeper. I think it has been supported
> for a long time. Maybe you could have a
> try with flink-1.8 or 1.9.
>
> Best,
> Yang
>
>
> Vijay Bhaskar wrote on Thursday, October 10, 2019 at 2:26 PM:
>
>> Thanks Yang and Sean. I ha
gt;> promising enough option that we're going to run in HA for a month or two
>> and monitor results before we put in any extra work to customize the
>> savepoint start-up behavior.
>>
>> On Fri, Sep 27, 2019 at 2:24 AM Vijay Bhaskar
>> wrote:
>>
>>
I don't think HA will help to recover from a cluster crash; for that we
should take periodic savepoints, right? Please correct me in case I am wrong.
Regards
Bhaskar
On Fri, Sep 27, 2019 at 11:48 AM Vijay Bhaskar
wrote:
> Suppose my cluster got crashed and need to bring up the entire cluste
Suppose my cluster crashed and I need to bring the entire cluster
back up. Does HA still help to run the cluster from the latest savepoint?
Regards
Bhaskar
On Thu, Sep 26, 2019 at 7:44 PM Sean Hester
wrote:
> thanks to everyone for all the replies.
>
> i think the original concern here with "ju
One way you can do this is to have a separate cluster job-manager
program in Kubernetes which actually manages jobs, so that you can
decouple the job control. While restarting the job, make sure to follow the
steps below:
a) First the job manager takes a savepoint by killing the job and notes d
One more suggestion is to run the same job in a regular 2-node cluster and
see whether you get the same exception, so that you can narrow down
the issue easily.
Regards
Bhaskar
On Mon, Sep 23, 2019 at 7:50 AM Zili Chen wrote:
> Hi Debasish,
>
> As mentioned by Dian, it is an internal e
Can you check whether it is able to read the supplied input file properly or
not?
Regards
Bhaskar
On Wed, Sep 18, 2019 at 1:07 PM RAMALINGESWARA RAO THOTTEMPUDI <
tr...@iitkgp.ac.in> wrote:
> Hi Sir,
>
> I am trying to run the Flink programs, particularly PageRank.
>
> I have used the following com
= overrideConfigFromArgs.withFallback(defaultConfig)
// This is going to override your default configuration params
On Fri, Sep 13, 2019 at 11:38 AM Vijay Bhaskar
wrote:
> Hi
> You can use this way:
> Use typesafe configuration, which provides excellent configuration
> methodologies.
> You
Hi
You can use this approach:
use Typesafe Config, which provides excellent configuration
methodologies.
You supply a default configuration, which is read by your application through
Typesafe's reference.conf file. If you want to override any of the
defaults, you can supply them as command-line arguments
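The withFallback pattern described above can be sketched without the Typesafe dependency using java.util.Properties default chaining; with Typesafe Config the equivalent is `overrideConfig.withFallback(defaultConfig)` resolved against reference.conf. The keys below are made up for illustration:

```java
import java.util.Properties;

public class ConfigFallbackSketch {
    public static void main(String[] args) {
        // Defaults, playing the role of reference.conf.
        Properties defaults = new Properties();
        defaults.setProperty("checkpoint.interval.ms", "30000");
        defaults.setProperty("parallelism", "4");

        // Overrides (e.g. parsed from command-line args) chain on top:
        // lookups fall through to the defaults when no override exists.
        Properties effective = new Properties(defaults);
        effective.setProperty("parallelism", "8");

        System.out.println(effective.getProperty("parallelism"));            // prints "8" (overridden)
        System.out.println(effective.getProperty("checkpoint.interval.ms")); // prints "30000" (fallback)
    }
}
```

The design point is the same in both libraries: resolution walks the chain from most specific (CLI args) to least specific (defaults), so callers never need to know which layer a value came from.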
(for serialization) during checkpoints.
>
> Fabian
>
> Am Mi., 11. Sept. 2019 um 09:38 Uhr schrieb Ravi Bhushan Ratnakar <
> ravibhushanratna...@gmail.com>:
>
>> What is the upper limit of checkpoint size of Flink System?
>>
>> Regards,
>> Ravi
>>
>&g
sible
> like every 5 second.
>
> Thanks,
> Ravi
>
>
> On Tue 10 Sep, 2019, 11:37 Vijay Bhaskar,
> wrote:
>
>> For me task count seems to be huge in number with the mentioned resource
>> count. To rule out the possibility of issue with state backend can you
>>
For me the task count seems to be huge given the mentioned resource
count. To rule out the possibility of an issue with the state backend, can you
start writing the sink data to a sink that simply ignores it? Then try
whether you can run it for a longer duration without any issue. You can
start decreasing the
he.flink.runtime.entrypoint.ClusterEntrypoint -
> Terminating cluster entrypoint process StandaloneJobClusterEntryPoint with
> exit code 1444.*
>
>
> That I think is an issue. A cancelled job is a complete job and thus the
> exit code should be 0 for k8s to mark it complete.
>
>
&g
;
>
> "hdfs:///tmp/xyz14/savepoint-00-1d4f71345e22",
> "--job-classname", .
>
>
>
>- Restart
>
>kubectl create -f manifests/bf2-PRODUCTION/job-cluster-job-deployment.yaml
>
>
th cancel is a single atomic step,
> rather than a savepoint *followed* by a cancellation (else why would
> that be an option).
> Thanks again.
>
>
> On Tue, Mar 12, 2019 at 4:50 AM Vijay Bhaskar
> wrote:
>
>> Hi Vishal,
>>
>> yarn-cancel doesn't m
000 from the resource manager.
>
> 2019-03-12 08:10:44,477 INFO
> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService -
> Stopping ZooKeeperLeaderElectionService
> ZooKeeperLeaderElectionService{leaderPath='/leader//job_manager_
Hi Vishal
Savepoint-with-cancellation internally uses the /cancel REST API, which is
not a stable API; it always exits with 404. The best way to issue this is:
a) First issue the savepoint REST API
b) Then issue the /yarn-cancel REST API (as described in
http://mail-archives.apache.org/mod_mbox/flink-user/201804
el for logging to see if there is
> anything suspicious?
>
> Cheers,
> Gordon
>
>
> On 13 December 2018 at 7:35:33 PM, Vijay Bhaskar (bhaskar.eba...@gmail.com)
> wrote:
>
> Hi Gordon,
> We are using flink cluster 1.6.1, elastic search connector version:
> flink-connec
13 December 2018 at 5:59:34 PM, Chesnay Schepler (ches...@apache.org)
> wrote:
>
> Specifically which connector are you using, and which Flink version?
>
> On 12.12.2018 13:31, Vijay Bhaskar wrote:
> > Hi
> > We are using flink elastic sink which streams at the rate of 1000
Hi
We are using the Flink Elasticsearch sink, which streams at a rate of 1000
events/sec, as described in
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html
.
We are observing a leak of Elasticsearch connections. After a few minutes
all the open connections are exceedi
Can you please check the following document and verify whether you have
enough network bandwidth to support a 30-second checkpoint interval's worth
of streaming data?
https://data-artisans.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
Regards
Bhaskar
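The sizing guideline in that post reduces to simple arithmetic: at an ingest rate of R MB/s per task manager and a checkpoint interval of I seconds, each checkpoint has to ship roughly R * I MB of state delta on top of the normal traffic. A sketch with assumed numbers (the 10 MB/s ingest rate is purely illustrative):

```java
public class CheckpointSizingSketch {
    public static void main(String[] args) {
        double streamRateMBps = 10.0;   // assumed ingest rate per task manager
        double intervalSeconds = 30.0;  // checkpoint interval from the question

        // State delta accumulated between two checkpoints.
        double stateDeltaMB = streamRateMBps * intervalSeconds;

        // Extra bandwidth needed to ship that delta before the next checkpoint starts.
        double extraBandwidthMBps = stateDeltaMB / intervalSeconds;

        System.out.println("state per checkpoint: " + stateDeltaMB + " MB");      // 300.0 MB
        System.out.println("extra bandwidth: " + extraBandwidthMBps + " MB/s");   // 10.0 MB/s
    }
}
```

In other words, shipping a full interval's delta within the interval roughly doubles the network load per task manager, which is why an undersized network shows up first as checkpoint timeouts.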
On Wed, Sep 19, 2018 at
s://training.data-artisans.com/
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html
>
> On Sep 17, 2018, at 8:50 AM, Vijay Bhaskar
> wrote:
>
> Thanks Hequn. But i want to give random TTL for each partitioned key. How
> can
Thanks Hequn. But I want to give a random TTL for each partitioned key. How
can I achieve it?
Regards
Bhaskar
On Mon, Sep 17, 2018 at 7:30 AM Hequn Cheng wrote:
> Hi bhaskar,
>
> You need to change nothing if you want to handle multiple keys. Flink will do
> it for you. The ValueState is a keyed state.
How would it be to use ValueState with the values as a string separated by a
delimiter? That way ordering will never be a problem. The only overhead is to
split on the delimiter, read the elements, and convert them into primitive types
where necessary. It is just a workaround; in case it doesn't suit your
requirements p
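A sketch of that workaround: packing a few primitives into one delimiter-separated String (as would sit in a `ValueState<String>`) and unpacking them again. The '|' delimiter and the field layout are arbitrary choices for illustration:

```java
import java.util.Arrays;

public class DelimitedStateSketch {
    // Pack several primitive fields into a single delimited String.
    static String pack(long timestamp, int count, double sum) {
        return timestamp + "|" + count + "|" + sum;
    }

    // Split on the delimiter and parse the fields back, in the same fixed order.
    static Object[] unpack(String stored) {
        String[] parts = stored.split("\\|");   // '|' must be escaped: it is a regex metachar
        return new Object[] {
            Long.parseLong(parts[0]),
            Integer.parseInt(parts[1]),
            Double.parseDouble(parts[2])
        };
    }

    public static void main(String[] args) {
        String stored = pack(1537200000000L, 42, 3.14);
        System.out.println(stored);                       // prints "1537200000000|42|3.14"
        System.out.println(Arrays.toString(unpack(stored)));
    }
}
```

The obvious caveat is that the delimiter must never occur inside a field value; for anything beyond primitives, a typed state (or a proper serializer) is the safer choice.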