Re: Flink CDC operational questions

2024-12-05 Thread Xiqian YU
-guide/understand-flink-cdc-api/ [2] https://www.ververica.com/platform [3] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/ Regards, Xiqian De : Robin Moffatt via user Date : vendredi, 6 décembre 2024 à 02:42 À : user@flink.apache.org Objet : Flink CDC operational questions

Flink CDC operational questions

2024-12-05 Thread Robin Moffatt via user
Are there any resources I can look at regarding the operational side of Flink CDC? e.g. is the Flink web UI the only place to monitor what's going on, or looking through the log files? Can you stop and restart a job or will it trigger a fresh snapshot? etc thanks!

Re: Questions about using Flink MongoDB CDC

2024-11-17 Thread Jiabao Sun
Hi, The Flink SQL Planner uses the ChangelogNormalize operator to cache all incoming data for upsert type Changelog in order to complete the pre-image values, which results in additional state overhead. When the MongoDB version is below 6.0, the oplog does not contain Pre-Images of changed record

Questions about using Flink MongoDB CDC

2024-11-16 Thread wangye...@yeah.net
Hi all: While using Flink with MongoDB CDC, I've noticed that my Flink job causes MongoDB's memory usage to continuously increase. Below, I will detail the specific scenario to help identify where the issue lies. 1. MongoDB deployment architecture: sharded. 2. The memory usage of the

Flink Kubernetes Operator questions

2024-10-28 Thread louis lau
Hi all, we are trying to use flink kubernetes operator 1.9.0 and flink 1.19.1. now I have some questions about how to use it: 1. How can I confirm when the flink job need manual recovery from the flinkdeployment status info ? 2. what's the best logic to check the job status ? I want to aler

Re:May I ask some questions about Flink ListView ?

2024-09-22 Thread Xuyang
Hi, Apollo. The processing of agg is generated by codegen. You can observe the processing log of the generated agg class by setting the following parameters: env.java.opts.taskmanager: "-Dorg.codehaus.janino.source_debugging.enable=true -Dorg.codehaus.janino.source_debugging.dir=/flink/log/"

May I ask some questions about Flink ListView ?

2024-09-20 Thread Apollo Elon
When I tried to calculate the median of the age field in a table with 120 million rows, I implemented a custom UDAF. However, there was a significant performance difference between the two different types of accumulator. The ListView stated that it would enable the state backend when encountering

Re: Questions about the client synchronously obtaining task execution results

2023-11-19 Thread 刘峻池
Sorry I forgot to add the version information, the version is 1.17 刘峻池 于2023年11月20日周一 13:59写道: > Hi Flink Community > > When I run this command `flink run-application -t yarn-application -sae > mainClass somejar` to submit some batch-task on YARN with Application > Mode, my shell client always

Questions about the client synchronously obtaining task execution results

2023-11-19 Thread 刘峻池
Hi Flink Community When I run this command `flink run-application -t yarn-application -sae mainClass somejar` to submit some batch-task on YARN with Application Mode, my shell client always terminates after task submission success, then the dispatcher cannot receive the client heartbeat for a lon

Scalability Questions Concerning Apache Flink Operator 1.17

2023-10-24 Thread Vince Castello via user
I have been working with the 1.17 version of the Apache Flink operator and have below questions. As part of upgrading the application, is the application suspended, i.e. it is checkpointed, prior to doing an upgrade? Is there a concept of a hot update where I can update the application while it

Default Flink S3 FileSource Questions

2023-09-25 Thread Varun Narayanan Chakravarthy via user
Hello Flink Community, Flink Version: 1.16.1, Zookeeper for HA. My Flink Applications reads raw parquet files hosted in S3, applies transformations and re-writes them to S3, under a different location. Below is my code to read from parquets from S3: ``` final Configuration configuration = new C

Re: Questions related to Autoscaler

2023-08-11 Thread liu ron
his > threashold, autoscaler will prevent any downscaling behavior. > > > Best, > Zhanghao Chen > -- > *发件人:* Hou, Lijuan via user > *发送时间:* 2023年8月9日 3:04 > *收件人:* user@flink.apache.org > *主题:* Questions related to Autoscaler > >

回复: Questions related to Autoscaler

2023-08-10 Thread Chen Zhanghao
8月9日 3:04 收件人: user@flink.apache.org 主题: Questions related to Autoscaler Hi Flink team, This is Lijuan. I am working on our flink job to realize autoscaling. We are currently using flink version of 1.16.1, and using flink operator version of 1.5.0. I have some questions need to confirm wi

Questions related to Autoscaler

2023-08-10 Thread Hou, Lijuan via user
Hi Ron, Thanks for the reply! > 1 - It seems for flink job using flink operator to realize autoscaling, the > only option to realize autoscaling is to enable the Autoscaler feature, and > KEDA won’t work, right? What is KEDA mean? -> KEDA is a Kubernetes based Event Driven Autoscaler. I found

Re: Questions related to Autoscaler

2023-08-08 Thread liu ron
1.16.1, and using flink operator > version of 1.5.0. I have some questions need to confirm with you. > > > > 1 - It seems for flink job using flink operator to realize autoscaling, > the only option to realize autoscaling is to enable the Autoscaler feature, > and KEDA won’t work, r

Questions related to Autoscaler

2023-08-08 Thread Hou, Lijuan via user
Hi Flink team, This is Lijuan. I am working on our flink job to realize autoscaling. We are currently using flink version of 1.16.1, and using flink operator version of 1.5.0. I have some questions need to confirm with you. 1 - It seems for flink job using flink operator to realize autoscaling

RE: Questions about java enum when convert DataStream to Table

2023-08-02 Thread Jiabao Sun
Hi haishui, The enum type cannot be mapped as flink table type directly. I think the easiest way is to convert enum to string type first: DataStreamSource> source = env.fromElements( new Tuple2<>("1", TestEnum.A.name()), new Tuple2<>("2", TestEnum.B.name()) ); Or add a map trans

Questions about java enum when convert DataStream to Table

2023-08-01 Thread haishui
I want to convert dataStream to Table. The type of dataSream is a POJO, which contains a enum field. 1. The enum field is RAW('classname', '...') in table. When I execute `SELECT * FROM t_test` and print the result, It throws EOFException. 2. If I assign the field is STRING in schema, It throws

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Gyula Fóra
ttps://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/job-management/#stateful-and-stateless-application-upgrades >>>>>> >>>>>> The operator is made especially to handle stateful application >>>>>> upgra

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Tony Chen
de` setting for jobs, as >>>>> long as you have last-state or savepoint you will always get the latest >>>>> state. >>>>> >>>>> This is somewhat orthogonal to the savepoint trigger / >>>>> initialSavepointPath mechanisms. The i

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
>>> operator is not aware of the latest state. After that all upgrades always >>>> use the latest state unless the upgradeMode is stateless in which case no >>>> state is used. Savepoint triggering can help you keep backups for failure >>>> recovery but t

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
> operator is not aware of the latest state. After that all upgrades always >>> use the latest state unless the upgradeMode is stateless in which case no >>> state is used. Savepoint triggering can help you keep backups for failure >>> recovery but they should

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
ing can help you keep backups for failure >> recovery but they should not be executed as part of your upgrade flow >> because the operator already does this for you. >> >> Cheers, >> Gyula >> >> On Wed, Jul 19, 2023 at 8:20 PM Tony Chen >> wrote: >&g

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
link Community, >> >> My name is Tony Chen, and I am a software engineer at Robinhood. I have >> some questions on restarting a Flink Application from a savepoint or >> checkpoint. >> >> We currently store our checkpoints and savepoints in S3, and we would >>

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
Chen, and I am a software engineer at Robinhood. I have > some questions on restarting a Flink Application from a savepoint or > checkpoint. > > We currently store our checkpoints and savepoints in S3, and we would like > to use the Apache Flink Kubernetes Operator to manage our Flin

Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Hi Flink Community, My name is Tony Chen, and I am a software engineer at Robinhood. I have some questions on restarting a Flink Application from a savepoint or checkpoint. We currently store our checkpoints and savepoints in S3, and we would like to use the Apache Flink Kubernetes Operator to

Re: 回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Leon Xu
wiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management> > [3] Autoscaler | Apache Flink Kubernetes Operator > <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/custom-resource/autoscaler/> > > Best, > Zha

Re: 回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Madan D via user
n[3] Autoscaler | Apache Flink Kubernetes Operator Best,Zhanghao Chen发件人: Leon Xu 发送时间: 2023年6月27日 13:41 收件人: user 主题: Questions regarding adaptive scheduler with YARN and application mode Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch

回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Chen Zhanghao
est, Zhanghao Chen 发件人: Leon Xu 发送时间: 2023年6月27日 13:41 收件人: user 主题: Questions regarding adaptive scheduler with YARN and application mode Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs are running on YARN wit

Questions regarding adaptive scheduler with YARN and application mode

2023-06-26 Thread Leon Xu
Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs are running on YARN with application mode. There isn't much doc around how adaptive scheduler works. So I have some questions: 1. How does Adaptive Scheduler work with

Re: Questions on S3 File Sink Behavior

2023-03-29 Thread Mate Czagany
Hi, 1. In case of S3 FileSystem, Flink uses the multipart upload process [1] for better performance. It might not be obvious at first by looking at the docs, but it's noted at the bottom of the FileSystem page [2] For more information you can also check FLINK-9751 and FLINK-9752 2. In case of loc

Questions on S3 File Sink Behavior

2023-03-29 Thread Chirag Dewan via user
Hi,   We are tying to use Flink's File sink to distribute files to AWS S3 storage. We are using Flink provided Hadoop s3a connector as plugin. We have some observations that we needed to clarify: 1. When using file sink for local filesystem distribution, we can see that the sink creates 3 se

Re: I want to subscribe users' questions

2023-02-07 Thread yuxia
ser-zh" 发送时间: 星期五, 2023年 2 月 03日 下午 7:48:55 主题: I want to subscribe users' questions Hi, My name is Guanyuan Chen.I am a big data development engineer, tencent wechat department, china. I have 4 years experience in flink developing, and want to subscribe flink's development ne

Re: I want to subscribe users' questions

2023-02-07 Thread Hang Ruan
Hi, guanyuan, This document( https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list) will be helpful. welcome~ Best, Hang guanyuan chen 于2023年2月7日周二 21:37写道: > Hi, > My name is Guanyuan Chen.I am a big data development engineer, tencent > wechat department, china. I hav

questions about FLINK-27341

2022-09-03 Thread yidan zhao
Hi, I want to know is there some way to avoid this problem now? I can not guarantee jobmanager and taskmanager do not run in the same machine.

Questions regarding JobManagerWatermarkTracker on AWS Kinesis

2022-07-25 Thread Peter Schrott
Hi there! I have a Flink Job (v 1.13.2, AWS managed) which reads from Kinesis (AWS manger, 4 shards). For reasons the shards are not partitioned properly (at the moment). So I wanted to make use of Watermarks (BoundedOutOfOrdernessTimestampExtractor) and the JobManagerWatermarkTracker to avoid

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-05 Thread Geng Biao
o Geng From: Leon Xu Date: Sunday, June 5, 2022 at 4:04 PM To: Biao Geng Cc: user Subject: Re: Questions regarding classpath loading order in YarnClusterDescriptor Hi Biao, I really appreciate your thorough answers. And yes for now I took the workaround by manipulating the directory names. To

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-05 Thread Leon Xu
f *the >> yarn.provided.lib.dirs >> *property under the yarn configuration. >> >> By playing with the YarnClusterDescriptor code I have two questions that >> I hope to get some answers: >> 1. YarnClusterDescriptor seems to force the classpath loading in >> a

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-04 Thread Biao Geng
ion from Java code to YARN cluster, in the > application mode. We are setting the classpath as the value of *the > yarn.provided.lib.dirs > *property under the yarn configuration. > > By playing with the YarnClusterDescriptor code I have two questions that I > hope to get some

Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-04 Thread Leon Xu
playing with the YarnClusterDescriptor code I have two questions that I hope to get some answers: 1. YarnClusterDescriptor seems to force the classpath loading in alphabetical order. See code here <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/y

Questions about Flink Stateful Functions Current Capabilities

2022-05-28 Thread Ryan van Huuksloot
DataStream interop). Below are my questions. I want to make sure I understand StateFuns and the core functionality - and potentially some of the future roadmap. 1. KeyBy: From the looks of it, ingress / egress can be scaled, but how does the scaling of the functions work? If those functions have state

Re: Savepoint and cancel questions

2022-04-23 Thread Dan Hill
Hi Hangxiang. Thanks! 1. Ah, okay. It makes more sense considering FAILED. 2. Oh cool. I'm migrating to v1.14.4 now. 3. Yes, this is great! On Fri, Apr 22, 2022 at 8:05 PM Hangxiang Yu wrote: > Hi, Dan > 1. Do you mean put the option into savepoint command? If so, I think it > will not work

Re: Savepoint and cancel questions

2022-04-22 Thread Hangxiang Yu
Hi, Dan 1. Do you mean put the option into savepoint command? If so, I think it will not work well. This option describe that how checkpoints will be cleaned up in different job status. e.g. FAILED/CANCELED. It cannot be covered in savepoint command. 2. Which flink version you use? I work on 1.14 a

Savepoint and cancel questions

2022-04-22 Thread Dan Hill
Hi. 1. Why isn’t –externalizedCheckpointCleanup an option on savepoint (instead of being needed at the start of a job run)? 2. Can we get a confirmation dialog when someone clicks "cancel job" in the UI? Just in case people click on accident. 3. Can we get a way to have Flink clean up the previ

Apache StateFun - A few questions about of module.yaml

2022-04-12 Thread M Singh
: io.statefun.playground.v1/ingressspec:  port: 8090---kind: io.statefun.playground.v1/egressspec:  port: 8091  topics:    - greetings Questions: 1. How does the ingress component (kind:io.statefun.playground.v1/ingress) get initialized to listen to port 8090 and same for egress (kind

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, I confirm that the FlinkSQL Client is communicating with JM via Rest endpoint. After I changed the “rest.port”, the sql client thrown exception: "[ERROR] Could not execute SQL statement. Reason: java.net.ConnectException: Connection refused”. So for your case, since Flink will creat

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, Except for JM have an external IP address, I think the port 6123 also need to be opened. You may need to set a host port for 6123 in JM pod or expose this port by Kubernetes service. But I am not sure whether the sql-client communicate with JM via Rest endpoint or RPC port. Hopes som

Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-02 Thread Elkhan Dadashov
Hi Flink users, Wanted to check if any of you tried to run the local FlinkSQL client against JobManager running in the Kubernetes environment. For local FlinkSQL Client and local Flink cluster we set these params: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 To make it work, Is t

RE: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Schwalbe Matthias
method look like this (for Flink 1.13.0, and should be similar for other releases): [1] Feel free to get back with additional questions 😊 Thias [1] remodeled execute(…) (scala): def execute(jobName: String): JobExecutionResult = { if (fromSavepoint != null

Re: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Piotr Nowojski
htlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/task_failure_recovery/ > [3] > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/#configuration > > > On Wed, Feb 16, 2022 at 12:28 PM Sandys-Lumsdaine, James < > james.sandys

Re: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Cristian Constantinescu
.com> wrote: > Thanks for your reply, Piotr. > > > > Some follow on questions: > > >". Nevertheless you might consider enabling them as this allows you to > manually cancel the job if it enters an endless recovery/failure loop, fix > the underlying issue, and restart

RE: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Sandys-Lumsdaine, James
Thanks for your reply, Piotr. Some follow on questions: >". Nevertheless you might consider enabling them as this allows you to >manually cancel the job if it enters an endless recovery/failure loop, fix the >underlying issue, and restart the job from the externalised checkpoint

Re: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Piotr Nowojski
Hi James, Sure! The basic idea of checkpoints is that they are fully owned by the running job and used for failure recovery. Thus by default if you stopped the job, checkpoints are being removed. If you want to stop a job and then later resume working from the same point that it has previously sto

Basic questions about resuming stateful Flink jobs

2022-02-16 Thread James Sandys-Lumsdaine
Hi all, I have a 1.14 Flink streaming workflow with many stateful functions that has a FsStateBackend and checkpointed enabled, although I haven't set a location for the checkpointed state. I've really struggled to understand how I can stop my Flink job and restart it and ensure it carries off

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-08 Thread Shane Bishop
type is registered with Kryo or not. [1] https://issues.apache.org/jira/browse/FLINK-25993 Best regards, Shane From: Chesnay Schepler Sent: February 7, 2022 3:08 AM To: Shane Bishop ; user@flink.apache.org Subject: Re: Questions about Kryo

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-07 Thread Shane Bishop
Sent: February 7, 2022 3:08 AM To: Shane Bishop ; user@flink.apache.org Subject: Re: Questions about Kryo setRegistrationRequired(false) There isn't any setting to control setRegistrationRequired(). You can however turn Kryo off via ExecutionConfig#disableGenericTypes, although thi

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-07 Thread Chesnay Schepler
There isn't any setting to control setRegistrationRequired(). You can however turn Kryo off via ExecutionConfig#disableGenericTypes, although this may require changes to your data types. I'd recommend to file a ticket. On 04/02/2022 20:12, Shane Bishop wrote: Hi all, TL;DR: I am concerned t

Re: Questions about checkpoint retention

2022-02-05 Thread 陳昌倬
On Fri, Jan 28, 2022 at 02:43:11PM +0800, Caizhi Weng wrote: > Chen-Che Huang 于2022年1月27日周四 11:10写道: > > We have two questions for checkpoint retention. > > > >1. When our cron job creates a savepoint called SP, it seems those > >checkpoints created earlier SP

Questions about Kryo setRegistrationRequired(false)

2022-02-04 Thread Shane Bishop
Hi all, TL;DR: I am concerned that kryo.setRegistrationRequired(false) in Apache Flink might introduce serialization/deserialization vulnerabilities, and I want to better understand the security implications of its use in Flink. There is an issue on the Kryo GitHub repo (link

Re: Questions about checkpoint retention

2022-01-27 Thread Caizhi Weng
y creates > new checkpoints regularly and keeps only the latest 10 > > checkpoints. Besides, for app upgrade and better reliability, we have a > cron job which creates savepoints at regular intervals. > > > > We have two questions for checkpoint retention. > >

Questions about checkpoint retention

2022-01-26 Thread Chen-Che Huang
upgrade and better reliability, we have a cron job which creates savepoints at regular intervals. We have two questions for checkpoint retention. 1. When our cron job creates a savepoint called SP, it seems those checkpoints created earlier SP still cannot be deleted. We thought the new

Re: [Statefun] Questions on recovery

2021-11-18 Thread Hady Januar Willi
Hi Igal, Thank you for your response, understood the strategies. Best, Hady On Wed, Nov 3, 2021 at 9:06 PM Igal Shilman wrote: > Hello Hady, > Glad to see that you are testing StateFun! > > Regarding that exception, I think that this is not the root cause. The > root cause is as you wrote that

Re: Flink SQL build-in function questions.

2021-11-13 Thread Yuval Itzchakov
I recall looking for these once in the SQL standard spec, AFAIR they are not part of it. On Fri, Nov 12, 2021, 11:48 Francesco Guardiani wrote: > Yep I agree with waiting for calcite to support it. As a temporary > workaround you can define your own udfs with that functionality. > > I also wonde

Re: Flink SQL build-in function questions.

2021-11-12 Thread Francesco Guardiani
Yep I agree with waiting for calcite to support it. As a temporary workaround you can define your own udfs with that functionality. I also wonder, are the bitwise operators defined in the ansi sql specification? Or should we just follow the common sense behavior of databases supporting it? On Fri

Re: Flink SQL build-in function questions.

2021-11-12 Thread JIN FENG
Sure, I can take a try. Before starting the work, we should discuss the api of bit operation function. There are two alternatives 1. add some built in functions include bitAnd,bitNot,bitOr,bitXor 2. support &, |, ^, ~ operators in calcite first. Currently, there is a relative jira https://issues.a

Re: Flink SQL build-in function questions.

2021-11-11 Thread Martijn Visser
Hi, I don't think there's currently anyone in the community who is working on the bit operation functions. Would you be interested and able to make a contribution on that? Best regards, Martijn On Thu, 11 Nov 2021 at 03:54, JIN FENG wrote: > hi all, > I met two problems when I use FlinkSQL. >

Flink SQL build-in function questions.

2021-11-10 Thread JIN FENG
hi all, I met two problems when I use FlinkSQL. 1. Is there any plan to support bit operation functions ? Currently there is some jira mentioned about this, https://issues.apache.org/jira/browse/FLINK-14990 , https://issues.apache.org/jira/browse/FLINK-12451 But It seems that it hasn't been u

Re: [Statefun] Questions on recovery

2021-11-03 Thread Igal Shilman
Hello Hady, Glad to see that you are testing StateFun! Regarding that exception, I think that this is not the root cause. The root cause is as you wrote that the StateFun job failed because it wasn't able to deliver a message to a remote function in the given time frame. If you look at the logs yo

[Statefun] Questions on recovery

2021-11-03 Thread Hady Januar Willi
Hi everyone, When testing Flink statefun, the job eventually throws the following exception after failing to reach the endpoint or if the endpoint fails after the exponentially increasing delay. java.util.concurrent.RejectedExecutionException: org.apache.flink.streaming.runtime.tasks.mailbox.Task

Re: Questions about keyed streams

2021-09-28 Thread Dan Hill
Hi! I'm just getting back to this. Questions: 1. Across operators, does the same key group ids get mapped to the same task managers? E.g. if an item is in key group 1 of operator A and that runs on taskmanager-0, will key group 1 of operator B also run on taskmanager-0? 2. Are ther

Re: Questions regarding broadcast join in Flink

2021-09-10 Thread Timo Walther
Hi Gerald, actually, this is a typical issue when performing a streaming join. An ideal solution would be to block the main stream until the broadcast stream is ready. However, this is currently not supported in the API. In any case, a user needs to handle this in a use case specific way to

Questions regarding broadcast join in Flink

2021-09-09 Thread Gerald.Sula
Hello, I am trying to implement a broadcast join of two streams in flink using the broadcast functionality. In my usecase I have a large stream that will be enriched with a much smaller stream. In order to first test my approach, I have adapted the Taxi ride exercise in the official training rep

RE: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-27 Thread Hailu, Andreas [Engineering]
Thanks Caizhi, this was very helpful. // ah From: Caizhi Weng Sent: Thursday, August 26, 2021 10:41 PM To: Hailu, Andreas [Engineering] Cc: user@flink.apache.org Subject: Re: 1.9 to 1.11 Managed Memory Migration Questions Hi! I've read the first mail again and discover that the direct m

Re: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-26 Thread Caizhi Weng
jects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-task-off-heap-size > > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#configure-off-heap-memory-direct-or-native > > > > *// *ah > > > > *From:* Caiz

RE: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-26 Thread Hailu, Andreas [Engineering]
-direct-or-native // ah From: Caizhi Weng Sent: Wednesday, August 25, 2021 10:47 PM To: Hailu, Andreas [Engineering] Cc: user@flink.apache.org Subject: Re: 1.9 to 1.11 Managed Memory Migration Questions Hi! Why does this ~30% memory reduction happen? I don't know how memory is calculated in

Re: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Caizhi Weng
emory configurations options set aside from > ‘taskmanager.network.bounded-blocking-subpartition-type: file’ which I see > is now deprecated and replaced with a new option defaulted to ‘file’ (which > works for us!) SO nearly everything else is as default. > > > > We haven’

1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Hailu, Andreas [Engineering]
still combing through the migration instructions, but I did have some questions around what I observed. 1. I observed that an application ran with "-ytm 12288" on 1.9 receives 8.47GB JVM Heap space and 5.95 Flink Managed Memory space (as reported by the ApplicationMaster), where o

Re: Questions on usage of SQL hints

2021-08-12 Thread JING ZHANG
Hi Paul, I'm very happy to hear that.😀, Paul Lam 于2021年8月12日周四 下午3:17写道: > Hi JING, > > Thanks for your inputs! It helps a lot. > > Best, > Paul Lam > > 2021年8月12日 13:13,JING ZHANG 写道: > > Hi Paul, > There are Table hints and Query hints. > Query hints are on the way, there is a JIRA to track t

Re: Questions on usage of SQL hints

2021-08-12 Thread Paul Lam
Hi JING, Thanks for your inputs! It helps a lot. Best, Paul Lam > 2021年8月12日 13:13,JING ZHANG 写道: > > Hi Paul, > There are Table hints and Query hints. > Query hints are on the way, there is a JIRA to track this issue [3]. AFAIK, > the issue is almost close to submit a pull request now. > >

Re: Questions on usage of SQL hints

2021-08-11 Thread JING ZHANG
Hi Paul, There are Table hints and Query hints. Query hints are on the way, there is a JIRA to track this issue [3]. AFAIK, the issue is almost close to submit a pull request now. Table hints[1][2] are already supported since Flink 1.11. You could find more detail information in [1][2]. For table

Questions on usage of SQL hints

2021-08-11 Thread Paul Lam
Hi community, I’m trying out SQL hints on DML, but there’s not very much about the supported SQL hints on the docs. Are the SQL hints limited to source/sink tables only at the moment? And where can I find the full list of supported SQL hints? Thanks in advance! Best, Paul Lam

Re: Questions on reading JDBC data with Flink Streaming API

2021-08-10 Thread Fuyao Li
Sandys-Lumsdaine Date: Tuesday, August 10, 2021 at 07:58 To: user@flink.apache.org Subject: [External] : Questions on reading JDBC data with Flink Streaming API Hello, I'm starting a new Flink application to allow my company to perform lots of reporting. We have an existing legacy system

Questions on reading JDBC data with Flink Streaming API

2021-08-10 Thread James Sandys-Lumsdaine
newly deployed Kafka streams. I've spent a lot of time reading the Flink book and web pages but I have some simple questions and assumptions I hope you can help with so I can progress. Firstly, I am wanting to use the DataStream API so we can both consume historic data and also realti

Re: Questions about keyed streams

2021-07-29 Thread Arvid Heise
Afaik you can express the partition key in Table API now which will be used for co-location and optimization. So I'd probably give that a try first and convert the Table to DataStream where needed. On Sat, Jul 24, 2021 at 9:22 PM Dan Hill wrote: > Thanks Fabian and Senhong! > > Here's an example

Re: Questions about keyed streams

2021-07-24 Thread Dan Hill
Thanks Fabian and Senhong! Here's an example diagram of the join that I want to do. There are more layers of joins. https://docs.google.com/presentation/d/17vYTBUIgrdxuYyEYXrSHypFhwwS7NdbyhVgioYMxPWc/edit#slide=id.p 1) Thanks! I'll look into these. 2) I'm using the same key across multiple Kaf

Re: Questions about keyed streams

2021-07-23 Thread Senhong Liu
Hi Dan, 1) If the key doesn’t change in the downstream operators and you want to avoid shuffling, maybe the DataStreamUtils#reinterpretAsKeyedStream would be helpful. 2) I am not sure that if you are saying that the data are already partitioned in the Kafka and you want to avoid shuffling in th

Re: Questions about keyed streams

2021-07-22 Thread Fabian Paul
Hi Dan, 1) In general, there is no guarantee that your downstream operator is on the same TM although working on the same key group. Nevertheless, you can try force this kind of behaviour to prevent the network transfer by either chaining the two operators (if no shuffle is in between) or confi

Questions about keyed streams

2021-07-21 Thread Dan Hill
Hi. 1) If I use the same key in downstream operators (my key is a user id), will the rows stay on the same TaskManager machine? I join in more info based on the user id as the key. I'd like for these to stay on the same machine rather than shuffle a bunch of user-specific info to multiple task m

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-29 Thread Kai Fu
Thank you for the reply, Jark. In our case, we found that there are no UPDATE_BEFORE records generated since the join is using -D/+I row kinds. *> Note: "+I" represents "INSERT", "-D" represents "DELETE", "+U" represents "UPDATE_AFTER",* * "-U" represents "UPDATE_BEFORE". We forward input RowK

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-27 Thread Jark Wu
UPDATE_BEFORE is required in cases such as Aggregation with Filter. For example: SELECT * FROM ( SELECT word, count(*) as cnt FROM T GROUP BY word ) WHERE cnt < 3; There is more discussion in this issue: https://issues.apache.org/jira/browse/FLINK-9528 Best, Jark On Mon, 28 Jun 2021 at 13

Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-27 Thread Kai Fu
Hi team, We notice the UPDATE_BEFORE row kind is not dropped and treated as DELETE as in code

Re: Questions about implementing a flink source

2021-06-09 Thread Arvid Heise
not complete and the task gets restarted (could be in an inconsistent state). On Tue, Jun 8, 2021 at 10:51 PM Evan Palmer wrote: > Hello again, > > Thank you for all of your help so far, I have a few more questions if you > have the time :) > > 1. Deserialization Schema > > Th

Re: Questions about implementing a flink source

2021-06-08 Thread Evan Palmer
Hello again, Thank you for all of your help so far, I have a few more questions if you have the time :) 1. Deserialization Schema There's been some debate within my team about whether we should offer a DeserializationSchema and SerializationSchema in our source and sink. If we include

Re: Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
bos Send Date:Thu May 20 01:16:39 2021 Recipients:Yun Gao CC:user Subject:Re: Questions Flink DataStream in BATCH execution mode scalability advice > On May 19, 2021, at 7:26 AM, Yun Gao wrote: > > Hi Marco, > > For the remaining issues, > > 1. For the aggregation, th

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-19 Thread Jin Yi
across partitions to remain in order you may > need to use parallelism 1. I'll attach some links here which might be > useful: > > > https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key > > https://stackoverflow.com/questions/441

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
se > the timeout. > > Best, > Yun > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#heartbeat-timeout > > > --Original Mail -- > Sender:Marco Villalobos > Send Date:Wed May 19 14:03:48 2021 >

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
14:03:48 2021 Recipients:user Subject:Questions Flink DataStream in BATCH execution mode scalability advice Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many

Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-18 Thread Marco Villalobos
Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many files. The instance that I am running on only has 50GB of EBS storage. The nature of this data is time

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-18 Thread Ingo Bürk
Hi Jin, 1) As far as I know the order is only guaranteed for events from the same partition. If you want events across partitions to remain in order you may need to use parallelism 1. I'll attach some links here which might be useful: https://stackoverflow.com/questions/50340107/order-of-e

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-17 Thread Jin Yi
ping? On Tue, May 11, 2021 at 11:31 PM Jin Yi wrote: > hello. thanks ahead of time for anyone who answers. > > 1. verifying my understanding: for a kafka source that's partitioned on > the same piece of data that is later used in a keyBy, if we are relying on > the kafka timestamp as the event

  1   2   3   4   5   >