AsyncFunction vs Async Sink

2023-06-14 Thread Lu Niu
Hi, Flink dev and users If I want to async write to an external service, which API shall I use, AsyncFunction or Async Sink? My understanding after checking the code are: 1. Both APIs guarantee at least once write to external service. As both API internally stores in-flight requests in the

Re: AsyncFunction vs Async Sink

2023-06-14 Thread Lu Niu
-> AsyncFunction (Updates DynamoDB) -> Kinesis Sink > > We can be sure that the updates to DynamoDB for a particular record > happens before the record is written to the Kinesis Sink. > > > Hope the above clarifies your question! > > Regards, > Hong > > > On 14 Ju

When does backpressure matter

2023-06-22 Thread Lu Niu
For example, if a flink job reads from kafka do something and writes to kafka. Do we need to take any actions when the job kafka consumer lag is low or 0 but some tasks have constant backpressure? Do we need to increase the parallelism or do some network tuning so that backpressure is constant 0? I

status of FLINK-32476

2023-12-19 Thread Lu Niu
Hi, Flink dev What's the current status of FLINK-32476? https://issues.apache.org/jira/browse/FLINK-32476 . I see this feature is deprioritized. We are interested in this feature and willing to work with the community on this if no one is actively working on it. Best Lu

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-04 Thread Lu Niu
Hi, Is this still under active development? I notice https://issues.apache.org/jira/browse/FLINK-32476 is labeled as deprioritized. If this is the case, would it be acceptable for us to take on the task? Best Lu On Thu, Oct 19, 2023 at 4:26 PM Ken Krugler wrote: > Hi Dong, > > Sorry for not

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-05 Thread Lu Niu
ng > thread if there is no further comment on this FLIP. > > > > Xuannan, what do you think? > > > > Thanks, > > Dong > > > > > > On Fri, Jan 5, 2024 at 2:03 AM Lu Niu wrote: > >> > >> Hi, > >> > >> Is this still und

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-08 Thread Lu Niu
ting thread. WDYT? > > Best, > Xuannan > > On Sat, Jan 6, 2024 at 1:51 AM Lu Niu wrote: > > > > Thank you Dong and Xuannan! > > > > Yes. We can take on this task. Any help during bootstrapping would be > greatly appreciated! I realize there is already a

flink 1.9 Restore from a checkpoint taken in 1.11

2020-11-30 Thread Lu Niu
Hi, Flink dev Is it supported that a flink job in version 1.9 could restore from a checkpoint taken from the same job using 1.11? The context is we are migrating to version 1.11 and we need a backup plan for emergency fallback. We did a test and it throws error: ``` Caused by: org.apache.flink.run

About JobMananger metrics scope

2021-01-07 Thread Lu Niu
Hi, Flink Dev First of all, Happy New Year! I have a question about JM monitoring. According to https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html, metrics.scope.jm only have one variable, which seems to be not enough for YARN deployment mode: 1. The metric doesn't contain

About metric name truncation

2021-03-25 Thread Lu Niu
Hi, Flink dev https://issues.apache.org/jira/browse/FLINK-6898 truncates metric name to less than 80. We plan to relax this in our environment. Want to ask here whether it will cause any side effects? Thank you! Best Lu

Zigzag shape in TM JVM used memory

2021-04-04 Thread Lu Niu
Hi, Flink dev We observed that the TM JVM used memory metric shows zigzag shape among lots of our applications, although these applications are quite different in business logic. The upper bound is close to the max heap size. Is this expected in flink application? Or does flink internally aggressi

Re: Zigzag shape in TM JVM used memory

2021-04-05 Thread Lu Niu
Hi, we need to update our email system then :) . Here are the links: https://drive.google.com/file/d/1lZ5_P8_NqsN1JeLzutGj4DxkyWJN75mR/view?usp=sharing https://drive.google.com/file/d/1J6c6rQJwtDp1moAGlvQyLQXTqcuG4HjL/view?usp=sharing https://drive.google.com/file/d/1-R2KzsABC471AEjkF5qTm5O3V47

Automatic backpressure detection

2021-04-05 Thread Lu Niu
Hi, Flink dev Lately, we want to develop some tools to: 1. show backpressure operator without manual operation 2. Provide suggestions to mitigate back pressure after checking data skew, external service RPC etc. 3. Show back pressure history Could anyone share their experience with such tooling?

Re: Automatic backpressure detection

2021-04-06 Thread Lu Niu
K-14712 > [2] > > https://issues.apache.org/jira/browse/FLINK-14814?focusedCommentId=17256926&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17256926 > [3] > > http://mail-archives.apache.org/mod_mbox/flink-user/202104.mbox/%3c1d2412ce-d4d0-ed50-6181-1b610e16d...@ap

Re: Zigzag shape in TM JVM used memory

2021-04-07 Thread Lu Niu
mp;tbm=isch > > Piotrek > > > pon., 5 kwi 2021 o 22:54 Lu Niu napisał(a): > > > Hi, > > > > we need to update our email system then :) . Here are the links: > > > > > > > https://drive.google.com/file/d/1lZ5_P

Re: Automatic backpressure detection

2021-04-12 Thread Lu Niu
. > > About back porting the patches, if you want to create a custom Flink build > it should be do-able. There will be some conflicts for sure, so you will > need to understand Flink's code. > > Best, > Piotrek > > śr., 7 kwi 2021 o 02:32 Lu Niu napisał(a): > > &

Re: Automatic backpressure detection

2021-04-13 Thread Lu Niu
; pon., 12 kwi 2021 o 23:17 Lu Niu napisał(a): > > > Hi, Piotr > > > > Thanks for your detailed reply! It is mentioned here we cannot observe > > backpressure generated from AsyncOperator in Flink UI in 1.9.1. Is it > > fixed in the latest version? Thank you! > &

Add jobId and JobName variable to JobManager metrics in per-job deployment mode

2021-04-13 Thread Lu Niu
Hi, Flink dev Could you share your thoughts about https://issues.apache.org/jira/browse/FLINK-22164 ? context: We expose all flink metrics to an external system for monitoring and alerting. However, JobManager metrics only have one variable , which is not enough to target to one job when job is d

Flink TM Heartbeat Timeout

2021-06-10 Thread Lu Niu
Hi, Flink User Several of our applications get heartbeat timeout occasionally. there is no GC, no OOM: ``` - realtime conversion event filter (49/120) (16e3d3e7608eed0a30e0508eb52065fd) switched from RUNNING to FAILED on container_e05_1599158866703_129001_01_000111 @ xenon-pii-prod-001-20191210-da

Re: Job Recovery Time on TM Lost

2021-06-30 Thread Lu Niu
Thanks Gen! cc flink-dev to collect more inputs. Best Lu On Wed, Jun 30, 2021 at 12:55 AM Gen Luo wrote: > I'm also wondering here. > > In my opinion, it's because the JM can not confirm whether the TM is lost > or it's a temporary network trouble and will recover soon, since I can see > in the

Re: Job Recovery Time on TM Lost

2021-07-01 Thread Lu Niu
g the cancellation operation: Flink currently does not listen >>> to the dead letters of Akka. This means that the `akka.ask.timeout` is the >>> primary means to fail the future result of a rpc which could not be sent. >>> This is also an improvement we should add to Flink&#

Re: Job Recovery Time on TM Lost

2021-07-01 Thread Lu Niu
ntime.executiongraph.ExecutionGraph(time when all tasks switch from CREATED to RUNNING) ``` Best Lu On Thu, Jul 1, 2021 at 12:06 PM Lu Niu wrote: > Thanks TIll and Yang for help! Also Thanks Till for a quick fix! > > I did another test yesterday. In this test, I intentionall

Re: Job Recovery Time on TM Lost

2021-07-01 Thread Lu Niu
Another side question, Shall we add metric to cover the complete restarting time (phase 1 + phase 2)? Current metric jm.restartingTime only covers phase 1. Thanks! Best Lu On Thu, Jul 1, 2021 at 12:09 PM Lu Niu wrote: > Thanks TIll and Yang for help! Also Thanks Till for a quick fix! > &

Re: Job Recovery Time on TM Lost

2021-07-08 Thread Lu Niu
tbeat timeout) so that the user can configure it for her >> >>>>> environment. On the upside, if you mark the TaskExecutor dead on >> the first >> >>>>> connection loss (assuming you have a stable network environment), >> then it >> &

[jira] [Created] (FLINK-33804) Add Option to disable showing metrics in JobMananger UI

2023-12-12 Thread Lu Niu (Jira)
Lu Niu created FLINK-33804: -- Summary: Add Option to disable showing metrics in JobMananger UI Key: FLINK-33804 URL: https://issues.apache.org/jira/browse/FLINK-33804 Project: Flink Issue Type

[jira] [Created] (FLINK-33806) Async IO Allows Custom Action after Final Retry Failure

2023-12-12 Thread Lu Niu (Jira)
Lu Niu created FLINK-33806: -- Summary: Async IO Allows Custom Action after Final Retry Failure Key: FLINK-33806 URL: https://issues.apache.org/jira/browse/FLINK-33806 Project: Flink Issue Type

[jira] [Created] (FLINK-16640) Expose listStatus latency in flink filesystem

2020-03-17 Thread Lu Niu (Jira)
Lu Niu created FLINK-16640: -- Summary: Expose listStatus latency in flink filesystem Key: FLINK-16640 URL: https://issues.apache.org/jira/browse/FLINK-16640 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-16931) Large _metadata file lead to JobManager not responding when restart

2020-04-01 Thread Lu Niu (Jira)
Lu Niu created FLINK-16931: -- Summary: Large _metadata file lead to JobManager not responding when restart Key: FLINK-16931 URL: https://issues.apache.org/jira/browse/FLINK-16931 Project: Flink

[jira] [Created] (FLINK-17089) Checkpoint fail because RocksDBException: Error While opening a file for sequentially reading

2020-04-10 Thread Lu Niu (Jira)
Lu Niu created FLINK-17089: -- Summary: Checkpoint fail because RocksDBException: Error While opening a file for sequentially reading Key: FLINK-17089 URL: https://issues.apache.org/jira/browse/FLINK-17089

[jira] [Created] (FLINK-17364) Support StreamingFileSink in PrestoS3FileSystem

2020-04-23 Thread Lu Niu (Jira)
Lu Niu created FLINK-17364: -- Summary: Support StreamingFileSink in PrestoS3FileSystem Key: FLINK-17364 URL: https://issues.apache.org/jira/browse/FLINK-17364 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-19985) job went into zombie state after ZK session timeout

2020-11-04 Thread Lu Niu (Jira)
Lu Niu created FLINK-19985: -- Summary: job went into zombie state after ZK session timeout Key: FLINK-19985 URL: https://issues.apache.org/jira/browse/FLINK-19985 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-21263) Job hangs under backpressure

2021-02-03 Thread Lu Niu (Jira)
Lu Niu created FLINK-21263: -- Summary: Job hangs under backpressure Key: FLINK-21263 URL: https://issues.apache.org/jira/browse/FLINK-21263 Project: Flink Issue Type: Bug Affects Versions

[jira] [Created] (FLINK-22162) Make Max Operator name Length Configurable

2021-04-08 Thread Lu Niu (Jira)
Lu Niu created FLINK-22162: -- Summary: Make Max Operator name Length Configurable Key: FLINK-22162 URL: https://issues.apache.org/jira/browse/FLINK-22162 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-22164) Add jobId and JobName variable to JobManager metrics in per-job deployment mode

2021-04-08 Thread Lu Niu (Jira)
Lu Niu created FLINK-22164: -- Summary: Add jobId and JobName variable to JobManager metrics in per-job deployment mode Key: FLINK-22164 URL: https://issues.apache.org/jira/browse/FLINK-22164 Project: Flink

[jira] [Created] (FLINK-22326) Job contains Iterate Operator always fails on Checkpoint

2021-04-16 Thread Lu Niu (Jira)
Lu Niu created FLINK-22326: -- Summary: Job contains Iterate Operator always fails on Checkpoint Key: FLINK-22326 URL: https://issues.apache.org/jira/browse/FLINK-22326 Project: Flink Issue Type

[jira] [Created] (FLINK-36582) sql client hang when querying a parquet table

2024-10-22 Thread Lu Niu (Jira)
Lu Niu created FLINK-36582: -- Summary: sql client hang when querying a parquet table Key: FLINK-36582 URL: https://issues.apache.org/jira/browse/FLINK-36582 Project: Flink Issue Type: Improvement