Hi experts,
I am running a Flink application in local execution mode for testing.
I have configured RocksDB as the state backend, and I would like to use RocksDB
tools such as ldb or sst_dump to examine how exactly the state is stored.
However, I encountered the error below; can you please advise me h
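One way to make the on-disk state easy to find in local mode is to pin RocksDB's working directory before running the job; a minimal sketch (the paths and the sst_dump invocation below are illustrative assumptions, not from the original thread):

```java
// Sketch: point the RocksDB state backend at fixed local paths so the
// SST files are easy to locate afterwards. Requires flink-statebackend-rocksdb
// on the classpath; the constructor declares IOException.
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
RocksDBStateBackend backend = new RocksDBStateBackend("file:///tmp/flink-checkpoints");
backend.setDbStoragePath("/tmp/flink-rocksdb"); // working dir of the embedded RocksDB instances
env.setStateBackend(backend);

// After the job has produced state, the *.sst files under /tmp/flink-rocksdb/...
// can be inspected with, e.g.:
//   sst_dump --file=/tmp/flink-rocksdb/.../000008.sst --command=scan --output_hex
// The tool version should match the (F)RocksDB version Flink embeds, and ldb
// needs the DB to not be locked by a running job.
```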
Hi experts,
I am trying to experiment with using Hive to store metadata alongside
Flink SQL. I am running Hive locally inside a Docker container, and running
the Flink SQL program through the IDE.
Flink version 1.12.0
the sample code looks like:
StreamExecutionEnvironment bsEnv =
StreamExecutionEnviro
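For reference, the Flink 1.12 pattern for registering a HiveCatalog looks roughly like this (the catalog name, default database, and hive-conf directory are placeholder assumptions):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "myhive";                 // catalog name (assumption)
String defaultDatabase = "default";     // assumption
String hiveConfDir = "/opt/hive-conf";  // dir containing hive-site.xml (assumption)

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog(name, hive);
tableEnv.useCatalog(name);
```

With Hive in Docker, the hive-site.xml in hiveConfDir must point at a metastore URI that is reachable from the IDE's host, not the container-internal hostname.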
Hi experts,
I am running Flink 1.10 and the job is stateless. I am trying to
understand how the local buffer pool works:
1. Let's say taskA and taskB both run in the same TM JVM; each task will
have its own local buffer pool, taskA will write to pool-A, and taskB
will read from pool-A and write
the Kafka source
> won't read new messages.
>
> On Tue, Dec 15, 2020 at 3:07 AM Eleanore Jin
> wrote:
>
>> Hi Guowei and Arvid,
>>
>> Thanks for the suggestion. I wonder if it makes sense and possible that
>> the operator will produce a side output messa
>
>> 1. AFAIK I think only the job could "pause" itself. For example the
>> "query" external system could pause when the external system is down.
>> 2. Maybe you could try the "iterate" and send the failed message back to
>> retry if y
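The side-output suggestion quoted above can be sketched like this (the tag name and the isValid() check are placeholders):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

final OutputTag<String> failedTag = new OutputTag<String>("failed-validation") {};

SingleOutputStreamOperator<String> validated =
    input.process(new ProcessFunction<String, String>() {
      @Override
      public void processElement(String value, Context ctx, Collector<String> out) {
        if (isValid(value)) {           // isValid(): placeholder validation check
          out.collect(value);
        } else {
          ctx.output(failedTag, value); // route failures to the side output
        }
      }
    });

DataStream<String> failed = validated.getSideOutput(failedTag);
// `failed` can then be written to a DLQ topic or fed back for retry.
```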
Hi experts,
Here is my use case: a stateless Flink streaming job for message
validation.
1. read from a Kafka topic
2. perform validation of the message, which requires querying an external system
2a. the metadata from the external system will be cached in memory
for 15 minutes
2b. there i
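A minimal sketch of the 15-minute in-memory metadata cache from step 2a (the class and method names are illustrative, not from the actual job):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simple TTL cache: entries expire a fixed time after being written,
// so stale metadata forces a fresh query to the external system.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    // Returns the cached value, or null when absent or older than the TTL.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAt) {
            entries.remove(key);
            return null; // expired: caller should re-query the external system
        }
        return e.value;
    }
}
```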
Maybe they contain some
> more information.
>
> Cheers,
> Till
>
> On Sat, Oct 24, 2020 at 12:24 AM Eleanore Jin
> wrote:
>
>> I also tried enabling native memory tracking; via jcmd, here is the memory
>> breakdown: https://ibb.co/ssrZB4F
>>
>> since job
-XX:MaxMetaspaceSize for job manager?
And any recommendations?
Thanks a lot!
Eleanore
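For the MaxMetaspaceSize question, one way to pass the flag to the job manager JVM is the dedicated JVM-options config key; a sketch (the 256m value is an arbitrary assumption):

```yaml
# flink-conf.yaml
env.java.opts.jobmanager: "-XX:MaxMetaspaceSize=256m"
```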
On Fri, Oct 23, 2020 at 9:28 AM Eleanore Jin wrote:
> Hi Till,
>
> please see the screenshot of heap dump: https://ibb.co/92Hzrpr
>
> Thanks!
> Eleanore
>
> On Fri, Oct 23, 2020 at 9:25 AM Eleanore Ji
Hi Till,
please see the screenshot of heap dump: https://ibb.co/92Hzrpr
Thanks!
Eleanore
On Fri, Oct 23, 2020 at 9:25 AM Eleanore Jin wrote:
> Hi Till,
> Thanks a lot for the prompt response, please see below information.
>
> 1. how much memory assign to JM pod?
> 6g for c
; In order to better understand the problem, the debug logs of your JM could
> be helpful. Also a heap dump might be able to point us towards the
> component which is eating up so much memory.
>
> Cheers,
> Till
>
> On Thu, Oct 22, 2020 at 4:56 AM Eleanore Jin
> wrote:
>
>
ion of a new job.
>
> In order to tell you more about the Task submission rejection by the
> TaskExecutor, I would need to take a look at the logs of the JM and the
> rejecting TaskExecutor. Moreover, which Flink version are you using?
>
> Cheers,
> Till
>
> On Wed, Oct 21
Hi all,
I have a Flink job running version 1.10.2; it simply reads from a Kafka
topic with 96 partitions and outputs to another Kafka topic.
It is running in k8s, with 1 JM (not in HA mode) and 12 task managers, each
with 4 slots.
The checkpoint persists the snapshot to Azure Blob Storage; checkpoints
Hi experts,
I am running a Flink job cluster; the application jar is packaged together
with Flink in a Docker image. The Flink job cluster is running in
Kubernetes, and the restart strategy is below:
restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 20
restart-stra
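For context, a complete failure-rate block has three tuning keys; a sketch (the interval and delay values below are assumptions, since the original message is truncated):

```yaml
restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 20
restart-strategy.failure-rate.failure-rate-interval: 5 min
restart-strategy.failure-rate.delay: 10 s
```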
27/2020 7:02 PM, Eleanore Jin wrote:
>
> I have noticed this: if I have Thread.sleep(1500); after the patch call
> returned 202, then the directory gets cleaned up, in the meanwhile, it
> shows the job-manager pod is in completed state before getting terminated:
> see screenshot: htt
the job? Is there a way to check if
the cancel has completed, so that stopping the TM and JM can be done afterwards?
Thanks a lot!
Eleanore
On Sun, Sep 27, 2020 at 9:37 AM Eleanore Jin wrote:
> Hi Congxian,
> I am making a REST call to get the checkpoint config: curl -X GET \
>
> http://lo
NCELLATION` is set, then the
> checkpoint will be kept when canceling a job.
>
> PS the image did not show
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints
> Best,
> Congxian
>
>
> Eleanore Jin wrote on Sep 27, 2020
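The `RETAIN_ON_CANCELLATION` setting referenced in the quoted reply is enabled like this:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getCheckpointConfig().enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// With this cleanup mode the checkpoint directory survives a job cancel,
// so it can be inspected or used to restore the job later.
```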
Hi experts,
I am running Flink 1.10.2 on Kubernetes as a per-job cluster. Checkpointing is
enabled, with interval 3s, minimum pause 1s, timeout 10s. I'm
using FsStateBackend, and snapshots are persisted to Azure Blob Storage
(Microsoft's cloud storage service).
The checkpointed state is just the source Kafka topic o
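The checkpoint settings described above (interval 3s, minimum pause 1s, timeout 10s, FsStateBackend on Azure blob) map to roughly this setup; the storage URI is a placeholder assumption:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3_000);               // checkpoint interval: 3s
CheckpointConfig cfg = env.getCheckpointConfig();
cfg.setMinPauseBetweenCheckpoints(1_000);     // minimum pause: 1s
cfg.setCheckpointTimeout(10_000);             // timeout: 10s
// Placeholder container/account; needs the flink-azure-fs-hadoop filesystem plugin.
env.setStateBackend(
    new FsStateBackend("wasbs://container@account.blob.core.windows.net/checkpoints"));
```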
eporter
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#availability
>
> Best,
> Yang
>
> Eleanore Jin wrote on Wed, Aug 5, 2020 at 11:52 PM:
>
>> Hi Yang and Till,
>>
>> Thanks a lot for the help! I have the similar question as Till
set the restartPolicy and
>>>> backoffLimit,
>>>> this is not a clean and correct way to go. We should terminate the
>>>> jobmanager process with zero exit code in such situation.
>>>>
>>>> @Till Rohrmann I just have one concern. Is it a
t .spec.template.spec.restartPolicy = "Never" and
>>> spec.backoffLimit = 0.
>>> Refer here[1] for more information.
>>>
>>> Then, when the jobmanager failed because of any reason, the K8s job will
>>> be marked failed. And K8s will not restart the
clusterframework/ApplicationStatus.java#L32
>
> On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin
> wrote:
>
>> Hi Experts,
>>
>> I have a flink cluster (per job mode) running on kubernetes. The job is
>> configured with restart strategy
>>
>> restart-strate
Hi Experts,
I have a Flink cluster (per-job mode) running on Kubernetes. The job is
configured with this restart strategy:
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
So after 3 retries, the job will be marked as FAILED, and hence the pods
are not running. However
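The same fixed-delay strategy can also be set programmatically, equivalent to the config lines above:

```java
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// 3 attempts, 10s between attempts; after that the job transitions to FAILED.
env.setRestartStrategy(
    RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
```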
+1 we have a similar use case for message schema validation.
Eleanore
On Wed, Jul 22, 2020 at 4:12 AM Tom Fennelly
wrote:
> Hi.
>
> I've been searching blogs etc trying to see if there are
> established patterns/mechanisms for reprocessing of failed messages via
> something like a DLQ. I've rea
Hi Ivan,
Beam coders are wrapped in Flink's TypeSerializers, so I don't think it
will result in double serialization.
Thanks!
Eleanore
On Tue, May 19, 2020 at 4:04 AM Ivan San Jose
wrote:
> Perfect, thank you so much Arvid, I'd expect more people using Beam on
> top of Flink, but it seems is n
ork. Maybe have a look at the
> prometheus exporter [1] and go from there. The entrypoint for a Flink
> exporter would then probably be
>
> FlinkStatsCollector.createAndRegister(getMetricGroup());
>
> Best,
> Aljoscha
>
> On 06.05.20 17:11, Eleanore Jin wrote:
> > Hi Aljo
8284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
> [4]
> https:
Hi Community,
Currently we are running Flink in 'hub' data centers, where data is ingested
into the platform via Kafka; the Flink job reads from Kafka, does the
transformations, and publishes to another Kafka topic.
I would also like to see if the same logic (read input message -> do
transformatio
eporter:
>
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter
>
> Best,
> Aljoscha
>
> On 06.05.20 03:02, Eleanore Jin wrote:
> > Hi all,
> >
> > I just wonder is it possible to use
Hi all,
I just wonder: is it possible to use the Flink metrics endpoint to allow
Prometheus to scrape user-defined metrics?
Context:
In addition to Flink metrics, we also collect some application-level
metrics using OpenCensus. And we run the OpenCensus agent as a sidecar in the
Kubernetes pod to collect metri
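User-defined metrics registered through the operator's metric group are exposed by whatever reporter is configured (e.g. the PrometheusReporter), alongside Flink's own metrics; a minimal sketch (the metric and class names are assumptions):

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {
  private transient Counter processed;

  @Override
  public void open(Configuration parameters) {
    // Registered under the task's metric group; picked up by the configured reporter.
    processed = getRuntimeContext().getMetricGroup().counter("processedMessages");
  }

  @Override
  public String map(String value) {
    processed.inc();
    return value;
  }
}
```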
Hi All,
Currently I am running a Flink job cluster (v1.8.2) on Kubernetes with 4
pods, each pod with parallelism 4.
The Flink job reads from a source topic with 96 partitions and does a
per-element filter; the filter value comes from a broadcast topic, and it
always uses the latest message as the
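The "always use the latest broadcast message" pattern is typically implemented with broadcast state; a sketch (stream names and the matches() check are placeholders):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

final MapStateDescriptor<String, String> configDesc =
    new MapStateDescriptor<>("latestConfig", Types.STRING, Types.STRING);

BroadcastStream<String> configStream = configTopicStream.broadcast(configDesc);

mainStream.connect(configStream)
    .process(new BroadcastProcessFunction<String, String, String>() {
      @Override
      public void processElement(String value, ReadOnlyContext ctx,
                                 Collector<String> out) throws Exception {
        String latest = ctx.getBroadcastState(configDesc).get("latest");
        if (latest == null || matches(value, latest)) { // matches(): placeholder filter
          out.collect(value);
        }
      }

      @Override
      public void processBroadcastElement(String value, Context ctx,
                                          Collector<String> out) throws Exception {
        // Keep only the newest broadcast message.
        ctx.getBroadcastState(configDesc).put("latest", value);
      }
    });
```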
Hi All,
I think the Beam Community fixed this issue:
https://github.com/apache/beam/pull/11478
Thanks!
Eleanore
On Thu, Apr 23, 2020 at 4:24 AM Stephan Ewen wrote:
> If something requires Beam to register a new state each time, then this is
> tricky, because currently you cannot unregister sta
gt; On Mon, Apr 20, 2020 at 1:39 PM Eleanore Jin
> wrote:
>
>> Hi community,
>>
>> My colleagues tried to register for the Flink forward conference:
>> https://www.bigmarker.com/series/flink-forward-virtual-confer1/series_summit?__hssc=37051071.2.15874077
Hi community,
My colleagues tried to register for the Flink forward conference:
https://www.bigmarker.com/series/flink-forward-virtual-confer1/series_summit?__hssc=37051071.2.1587407738905&__hstc=37051071.db5b837955b42a71990541edc07d7beb.1587407738904.1587407738904.1587407738904.1&hsCtaTracking=33
Hi All,
The setup of my Flink application allows a user to start and stop it.
The Flink job is running in a job cluster (the application jar is available to
Flink upon startup). Stopping a running application means exiting the
program.
Restarting a stopped job means spinning up a new job cluster wit
for you, who
>>> is one of the community experts in this area.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Mar 11, 2020 at 10:37 AM Eleanore Jin
>>> wrote:
>>>
>>>
g., the existing TMs might have some file already localized,
> or some memory buffers already promoted to the JVM tenured area, while the
> new TMs have not.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Mar 11, 2020 at 9:25 AM Eleanore Jin
> wrote:
>
>> Hi E
Hi Experts,
I have my Flink application running on Kubernetes, initially with 1 Job
Manager and 2 Task Managers.
Then we have a custom operator that watches for the CRD; when the CRD
replicas change, it patches the Flink Job Manager deployment
parallelism and max parallelism according to th
;
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with Flink Runner.
>>
>> Both Source and Sink are Kafka topics, I have enabled Beam Exactly onc
Hi All,
I am using Apache Beam to construct the pipeline, and this pipeline is
running with the Flink runner.
Both source and sink are Kafka topics, and I have enabled Beam's exactly-once
semantics.
I believe how it works in Beam is:
the messages will be cached and not processed by the KafkaExactlyOnceSin
if (SOME_CONDITION) {
>     throw new RuntimeException("Lets test checkpointing");
>   }
>   return value;
> }
> });
>
>
>
> ~ Abhinav Bajaj
>
>
>
>
>
> *From: *Eleanor
Hi,
I have a Flink application with checkpointing enabled, and I am running it
locally using the MiniCluster.
I just wonder if there is a way to simulate a failure and verify that the
Flink job restarts from the checkpoint?
Thanks a lot!
Eleanore
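One common trick, matching the code quoted earlier in this thread, is a map function that throws once a checkpoint has completed; with checkpointing and a restart strategy configured, the MiniCluster job should resume from the last completed checkpoint (the trigger condition below is an assumption):

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(1_000);
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, Time.seconds(1)));

DataStream<String> resilient = input.map(value -> {
  if (value.contains("poison")) {   // assumed trigger for the simulated failure
    throw new RuntimeException("Lets test checkpointing");
  }
  return value;
});
// On the thrown exception the job restarts and restores operator state from
// the most recent completed checkpoint, which the test can then assert on.
```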