Re: Kafka data sources, multiple interval joins and backfilling

2021-07-20 Thread JING ZHANG
Hi Dan, > I've tried playing around with parallelism and resources. It does help. Glad to hear your problem is solved 😀. > Does Flink have special logic with the built-in interval join code that impacts how Kafka data sources are read? No. If you are referring to the way I mentioned in the last email, I mean

Re: Kafka data sources, multiple interval joins and backfilling

2021-07-20 Thread Dan Hill
Thanks JING and Caizhi! Yea, I've tried playing around with parallelism and resources. It does help. We have our own join operator that acts like an interval join (with fuzzy matching). We wrote our own KeyedCoProcessFunction and modeled it closely after the internal interval join code. Does Flink have special logic with the built-in interval join code that impacts how Kafka data sources are read?
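For readers following along, a minimal sketch of what such a custom KeyedCoProcessFunction-based join can look like (LeftEvent, RightEvent, Joined and the bound are made-up placeholders, not Dan's actual operator; the onTimer cleanup is omitted):

    import java.util.Map;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    // Buffers left-side events by timestamp and emits a joined result whenever a
    // right-side event for the same key arrives within the allowed time bound.
    public class FuzzyIntervalJoin
            extends KeyedCoProcessFunction<String, LeftEvent, RightEvent, Joined> {

        private static final long BOUND_MS = 60_000L; // illustrative join bound

        private transient MapState<Long, LeftEvent> leftBuffer;

        @Override
        public void open(Configuration parameters) {
            leftBuffer = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("leftBuffer", Long.class, LeftEvent.class));
        }

        @Override
        public void processElement1(LeftEvent left, Context ctx, Collector<Joined> out)
                throws Exception {
            leftBuffer.put(ctx.timestamp(), left);
            // timer to clean up this entry once it can no longer be joined (onTimer omitted)
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + BOUND_MS);
        }

        @Override
        public void processElement2(RightEvent right, Context ctx, Collector<Joined> out)
                throws Exception {
            for (Map.Entry<Long, LeftEvent> entry : leftBuffer.entries()) {
                if (Math.abs(ctx.timestamp() - entry.getKey()) <= BOUND_MS) {
                    out.collect(new Joined(entry.getValue(), right));
                }
            }
        }
    }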

Re: Set job specific resources in one StreamTableEnvironment

2021-07-20 Thread Paul Lam
Hi Yun, Thanks a lot for your reply! Regarding the parallelism, I think `table.exec.resource.default-parallelism` that you mentioned is a good alternative, but it requires a set operation before running each query. And since it’s a `default` value, I suppose there should be an option with highe
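For reference, a sketch of the per-query set operation being discussed (tEnv, the table names and the parallelism value are placeholders):

    // has to be re-set before each query that needs a different parallelism
    tEnv.getConfig().getConfiguration()
        .setInteger("table.exec.resource.default-parallelism", 16);
    tEnv.executeSql("INSERT INTO sink_a SELECT id, val FROM source_a");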

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-20 Thread Yang Wang
Thanks @Till Rohrmann for starting this discussion. Firstly, I'm trying to understand the benefit of a shorter heartbeat timeout. IIUC, it will make the JobManager aware of TaskManager failures faster. However, it seems that only standalone clusters could benefit from this. For Yarn and native Kubernetes depl

Re: connect to schema registry kafka SSL with flink 1.11.2

2021-07-20 Thread Caizhi Weng
Hi! You'll need to set props.put("schema.registry.basic.auth.user.info", ":"); tkg_cangkul wrote on Wed, Jul 21, 2021 at 12:06 AM: > Hi, > > > I'm trying to connect to Kafka with a schema registry that uses SSL with > Flink 1.11.2 > > I've got this error message when I try to submit the job. > > Caused by: io
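Putting that together with the SSL settings, the registry-related consumer properties could look roughly like this (URLs, paths and credentials are placeholders, not a verified configuration):

    Properties props = new Properties();
    props.put("schema.registry.url", "https://my-schema-registry:8081");
    props.put("basic.auth.credentials.source", "USER_INFO");
    props.put("schema.registry.basic.auth.user.info", "<username>:<password>");
    // truststore for the HTTPS connection to the registry
    props.put("schema.registry.ssl.truststore.location", "/path/to/truststore.jks");
    props.put("schema.registry.ssl.truststore.password", "<truststore-password>");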

Re: Kafka data sources, multiple interval joins and backfilling

2021-07-20 Thread Caizhi Weng
Hi! Streaming joins will not throw away records in the state unless they exceed the TTL. Have you tried increasing the parallelism of the join operators (and maybe decreasing the parallelism of the large Kafka source)? Dan Hill wrote on Wed, Jul 21, 2021 at 4:19 AM: > Hi. My team's Flink job has cascading interval
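For illustration only (Event, schema, kafkaProps and joined are placeholders; joined stands for the SingleOutputStreamOperator returned by the join's process() call):

    // keep the large Kafka source at modest parallelism ...
    DataStream<Event> events = env
        .addSource(new FlinkKafkaConsumer<>("events", schema, kafkaProps))
        .setParallelism(4);

    // ... and give the join operator more parallel instances
    joined.setParallelism(32);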

Re: Can we share state between different keys in the same window?

2021-07-20 Thread Caizhi Weng
Hi! For this use case it seems that we'll either have to use a custom connector for that external database (if currently there is no such connector in Flink), or have to first read the data into some other source supported by Flink. Sweta Kalakuntla wrote on Wed, Jul 21, 2021 at 10:39 AM: > That is my unders

Re: Can we share state between different keys in the same window?

2021-07-20 Thread Sweta Kalakuntla
That is my understanding as well, but I wanted to confirm. My use case is that I need to get additional data from an external database (not a Source) for the data I am reading off of the Kafka Source, and aggregate data over a 10-minute window for a given key. But I did not want to query for every key in that 10-minute w
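A rough sketch of one way to do that, assuming enrichment happens per record with a small cache in front of the database (Event, Extra, Enriched and MyAggregator are hypothetical placeholders; the DB access is only hinted at):

    DataStream<Enriched> enriched = kafkaStream
        .map(new RichMapFunction<Event, Enriched>() {
            private transient Map<String, Extra> cache;

            @Override
            public void open(Configuration parameters) {
                cache = new HashMap<>();
                // open the external DB connection here (omitted)
            }

            @Override
            public Enriched map(Event e) {
                // only hit the database on a cache miss, not for every record
                Extra extra = cache.computeIfAbsent(e.getKey(), this::lookupInDb);
                return new Enriched(e, extra);
            }

            private Extra lookupInDb(String key) {
                return new Extra(key); // placeholder for the real DB query
            }
        });

    enriched
        .keyBy(Enriched::getKey)
        .window(TumblingEventTimeWindows.of(Time.minutes(10)))
        .reduce(new MyAggregator());

Note that the cache lives per parallel subtask, so each instance only sees the keys routed to it after the enrichment stage.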

Re: Failure running Flink locally with flink-s3-fs-hadoop + AWS SDK v2 as a dependency

2021-07-20 Thread Yaroslav Tkachenko
Hi, sorry for resurrecting the old thread, but I have precisely the same issue with a different filesystem. I've tried using plugins dir, setting FLINK_PLUGINS_DIR, etc. - nothing works locally. I added a breakpoint to the PluginConfig.getPluginsDir method and confirmed it's not even called.

Kafka data sources, multiple interval joins and backfilling

2021-07-20 Thread Dan Hill
Hi. My team's Flink job has cascading interval joins. The problem I'm outlining below is fine when streaming normally; it's an issue with backfills. We've been running a bunch of backfills to evaluate the job over older data. When running as backfills, I've noticed that sometimes one of d
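For context, a cascading setup like this typically looks roughly as follows (event types, key selectors, bounds and the ProcessJoinFunctions are made up):

    // first interval join: A with B
    DataStream<AB> ab = a.keyBy(A::getId)
        .intervalJoin(b.keyBy(B::getId))
        .between(Time.minutes(-10), Time.minutes(10))
        .process(new JoinAB());

    // second interval join cascades on the result of the first
    DataStream<ABC> abc = ab.keyBy(AB::getId)
        .intervalJoin(c.keyBy(C::getId))
        .between(Time.minutes(-10), Time.minutes(10))
        .process(new JoinABC());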

Re: Watermark UI after checkpoint failure

2021-07-20 Thread Dan Hill
It's after a checkpoint failure. I don't know if that includes a restore from a checkpoint. I'll take some screenshots when the jobs hit the failure again. All of my currently running jobs are healthy right now and haven't hit a checkpoint failure. On Sun, Jul 18, 2021 at 11:34 PM Dawid Wysakow

connect to schema registry kafka SSL with flink 1.11.2

2021-07-20 Thread tkg_cangkul
Hi, I'm trying to connect to Kafka with a schema registry that uses SSL with Flink 1.11.2. I've got this error message when I try to submit the job. Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Unauthorized; error code: 401 Can anyone help me to

Re: OOM Metaspace after multiple jobs

2021-07-20 Thread Robert Metzger
Hi Alexis, I hope I'm not stating the obvious, but have you checked this documentation page: https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/#unloading-of-dynamically-loaded-classes-in-user-code In particular the shutdown hooks we've introduced in F

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-20 Thread Robert Metzger
+1 to this change! When I was working on the reactive mode blog post [1] I also ran into this issue, leading to a poor "out of the box" experience when scaling down. For my experiments, I've chosen a timeout of 8 seconds, and the cluster has been running for 76 days (so far) on Kubernetes. I also
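For anyone who wants to try the same before the defaults change, a sketch of overriding the two options programmatically (8 s / 1 s mirror Robert's experiment, not a recommendation; the same keys can also go into flink-conf.yaml as heartbeat.timeout / heartbeat.interval):

    Configuration conf = new Configuration();
    conf.set(HeartbeatManagerOptions.HEARTBEAT_TIMEOUT, 8_000L);   // default at the time: 50 s
    conf.set(HeartbeatManagerOptions.HEARTBEAT_INTERVAL, 1_000L);  // default at the time: 10 s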

Re: Stateful Functions Status

2021-07-20 Thread Omid Bakhshandeh
Igal, Thanks for the answers. Is there any JS SDK available? Best, --Omid On Tue, Jul 20, 2021 at 10:23 AM Igal Shilman wrote: > Hi Omid, > > I'm glad to hear that you are evaluating StateFun in your company! let me > try to answer your questions: > > 1. In version 2.x, StateFun only supported

Re: Stateful Functions Status

2021-07-20 Thread Igal Shilman
Hi Omid, I'm glad to hear that you are evaluating StateFun in your company! Let me try to answer your questions: 1. In version 2.x, StateFun only supported messages of type com.google.protobuf.Any, and we had a tiny optimization that read type hints and unpacked the real message out of the Any m

Re: Some question of RocksDB state backend on ARM os

2021-07-20 Thread Robert Metzger
I guess this is unlikely, because nobody is working on the mentioned tickets. I mentioned your request in the ticket, to raise awareness again, but I would still consider it unlikely. On Tue, Jul 20, 2021 at 1:57 PM Wanghui (HiCampus) wrote: > Hi Robert: > > Thank you for your reply. >

Re: Topic assignment across Flink Kafka Consumer

2021-07-20 Thread Robert Metzger
Hi Prasanna, which Flink version and Kafka connector are you using? (the "KafkaSource" or "FlinkKafkaConsumer"?) The partition assignment for the FlinkKafkaConsumer is defined here: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/st
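From memory, the logic in that class (KafkaTopicPartitionAssigner) boils down to roughly the following, so partitions of one topic are spread round-robin over the consumer subtasks with a topic-dependent start offset (simplified sketch, not the verbatim source):

    static int assign(String topic, int partition, int numParallelSubtasks) {
        // topic-dependent start index so different topics start on different subtasks
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        // partitions of the same topic are then distributed round-robin
        return (startIndex + partition) % numParallelSubtasks;
    }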

RE: Some question of RocksDB state backend on ARM os

2021-07-20 Thread Wanghui (HiCampus)
Hi Robert: Thank you for your reply. Will this feature be released in version 1.14? Best, Hui From: Robert Metzger [mailto:rmetz...@apache.org] Sent: July 20, 2021 19:45 To: Wanghui (HiCampus) Cc: user@flink.apache.org Subject: Re: Some question of RocksDB state backend on ARM os The RocksDB ver

Re: Some question of RocksDB state backend on ARM os

2021-07-20 Thread Robert Metzger
The RocksDB version provided by Flink does not currently run on ARM. However, there are some efforts / hints: - https://stackoverflow.com/a/44573013/568695 - https://issues.apache.org/jira/browse/FLINK-13448 - https://issues.apache.org/jira/browse/FLINK-13598 I would recommend voting and commenti

Re: Flink RocksDB Performance

2021-07-20 Thread Robert Metzger
Your understanding of the problem is correct: the serialization cost is the reason for the high CPU usage. You can also try to optimize the serializers you are using (by using data types that are efficient to serialize). See also this blog post: https://flink.apache.org/news/2020/04/15/f
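As a small illustration of the "efficient data types" point: a class that follows Flink's POJO rules (public class, public no-arg constructor, public fields or getters/setters) is handled by the PojoSerializer instead of falling back to the slower generic Kryo path. The class below is just an example:

    // Recognized by Flink as a POJO, so it avoids the generic Kryo fallback.
    public class ClickEvent {
        public String userId;
        public long timestamp;
        public int count;

        public ClickEvent() {}
    }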

Re: Subpar performance of temporal joins with RocksDB backend

2021-07-20 Thread Robert Metzger
Are you using remote disks for RocksDB? (I guess that's EBS on AWS.) AFAIK there are usually limitations w.r.t. the IOPS you can perform. I would generally recommend measuring where the bottleneck is coming from. If your CPUs are at 100%, then adding more machines / cores will help (m