Re: accuracy validation of streaming pipeline

2022-05-23 Thread Shengkai Fang
It's a good question. Let me ping @Leonard to share more thoughts. Best, Shengkai vtygoss 于2022年5月20日周五 16:04写道: > Hi community! > > > I'm working on migrating from full-data-pipeline(with spark) to > incremental-data-pipeline(with flink cdc), and i met a problem about > accuracy validation bet

Re: Job Logs - Yarn Application Mode

2022-05-23 Thread Shengkai Fang
If you find the JM in the yarn web ui, I think you can also find the webui to access the Flink web ui with the JM. Best, Shengkai

Re: Window aggregation fails after upgrading to Flink 1.15

2022-05-23 Thread Shengkai Fang
Glad to see you find the root cause. I think we can shade the janino dependency if it influences the usage. WDYT, godfrey? Best, Shengkai Pouria Pirzadeh 于2022年5月21日周六 00:59写道: > Thanks for help; I digged into it and the issue turned out to be the > version of Janino: > flink-table has pinned

Re: Application mode -yarn dependancy error

2022-05-23 Thread Shengkai Fang
Hi. I think you should send the mail to the user mail list or stack overflow, which is about the usage and help. The dev mail list focus on the design of the Flink itself. Could you share more details for your problems, including - which version you use. - how you use the Flink, including you cod

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Hangxiang Yu
In my opinion, some exceptions in the async phase like timeout may happen related to the network, state size which will change, so maybe next time these failures will not occur. So the config makes sense for these. But this failure in the sync phase usually means the program will always fail and i

Re: OutputTag alternative with pyflink 1.15.0

2022-05-23 Thread Lakshya Garg
Thanks for the helpful links Yuxia. On Mon, May 23, 2022 at 2:31 PM yuxia wrote: > Yes, you're right. > > Hopefully, the master branch supported it [1]. But It haven't been > released. If you want to use output tag in python in 1.15, you can apply > this patch[1] to your Flink 1.15 and build it

Re: Json Deserialize in DataStream API with array length not fixed

2022-05-23 Thread Shengkai Fang
Hi. In the SQL, you can just specify the `array_coordinates` type ARRAY[1]. For example, ``` CREATE TABLE source( `array_coordinates` ARRAY> ) WITH ( 'format' = 'json' ) ``` [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/ Zain Haider Nemati

Re: Application mode deployment through API call

2022-05-23 Thread Shengkai Fang
Hi, all. > is there any plan in the Flink community to provide an easier way of deploying Flink with application mode on YARN Yes. Jark has already opened a ticket about how to use the sql client to submit the SQL in application mode[1]. What's more, in FLIP-222 we are able to manage the jobs in

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Gaël Renoux
Got it, thank you. I misread the documentation and thought the async referred to the task itself, not the process of taking a checkpoint. I guess there is currently no way to make a job never fail on a failed checkpoint? Gaël Renoux - Lead R&D Engineer E - gael.ren...@datadome.co W - www.datadome

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Hangxiang Yu
Hi, Gaël Renoux. As you could see in [1], There are some descriptions about the config: "This only applies to the following failure reasons: IOException on the Job Manager, failures in the async phase on the Task Managers and checkpoint expiration due to a timeout. Failures originating from the syn

Re: Flink Kubernetes Operator: Specifying env variables from ConfigMaps

2022-05-23 Thread Gyula Fóra
Hi Mads! I think you need to use the podTemplate for this. You can either do it in the top level spec or customize it for tm/jm respectively. Keep in mind that pod templates are merged with the base flink template so it's enough to specify the fields relevant for you (in these case the env variab

SV: SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Oh sorry no I was only looking at the JobManager logs, now I found them, thanks! Från: Chesnay Schepler Skickat: den 23 maj 2022 15:29:45 Till: Christopher Gustafson; user@flink.apache.org Ämne: Re: SV: How to enable statefun metrics for the Slf4j reporter Just t

Re: SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Chesnay Schepler
Just to double-check, you are checking the taskmanager logs, correct? On 23/05/2022 15:24, Christopher Gustafson wrote: Yes, Flink metrics are showing up as usual, but none of the ones that are listed in the StateFun documentation.

SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Yes, Flink metrics are showing up as usual, but none of the ones that are listed in the StateFun documentation. Från: Chesnay Schepler Skickat: den 23 maj 2022 14:29:15 Till: Christopher Gustafson; user@flink.apache.org Ämne: Re: How to enable statefun metrics fo

Flink Kubernetes Operator: Specifying env variables from ConfigMaps

2022-05-23 Thread Mads Ellersgaard Kalør
Hi, We use a number of environment variables to configure our Flink pipelines, such as Kafka connection info, hostnames for external services etc. This works well when running a standalone Kubernetes deployment or on a local environment in Docker, but I cannot find any documentation about how t

Re: flink sql api, exception when setting "table.exec.state.ttl"

2022-05-23 Thread Chesnay Schepler
You're probably mixing Flink versions. From the stack trace we can see that Flink classes are being loaded from 2 different jars (rocketmq-flink-1.0.0-SNAPSHOT.jar/flink-dist_2.12-1.13.5.jar); I'd suggest to resolve that first and see if the error persists. On 23/05/2022 14:32, 李诗君 wrote: f

flink sql api, exception when setting "table.exec.state.ttl"

2022-05-23 Thread 李诗君
flink version: 1.13.5 java code: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings settings = EnvironmentSettings.newInstance() .useBlinkPlanner() .inStreamingMode() .build(); StreamTable

Re: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Chesnay Schepler
You shouldn't have to do more than that. Flink metrics are showing up as expected? Including metrics from tasks? On 23/05/2022 14:03, Christopher Gustafson wrote: Hi! I am trying to enable the StateFun metrics in the documentation to be logged using the Slf4j reporter but I cannot figure ou

Re: Applying backpressure to limit state memory consumption

2022-05-23 Thread Robin Cassan
Thanks Yu'an for your answer! Our issue lies in the fact that the window size is variable with the incoming traffic and would like a solution to avoid filling our window state in case of spikes. Even with RocksDB we would eventually be limited by the disk size (which, admittedly, is usually bigger

How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Hi! I am trying to enable the StateFun metrics in the documentation to be logged using the Slf4j reporter but I cannot figure out how to do it, and the documentation is pretty vague if you are not familiar with the Flink metrics beforehand. Could someone show me how to enable it, i.e what entr

Re: Flink UI in Application Mode

2022-05-23 Thread Zain Haider Nemati
Hi David, Thanks for your response. When submitting a job in application mode it gives a url at the end but that is different i.e. on different ports when you submit different jobs in application mode. Is there a port/ui where I can see the consolidated list of jobs running instead of checking each

Re: Request for Review: FLINK-27507 and FLINK-27509

2022-05-23 Thread David Anderson
I've taken care of this. David On Sun, May 22, 2022 at 4:12 AM Shubham Bansal wrote: > Hi Everyone, > > I am not sure who to reach out for the reviews of these changesets, so I > am putting this on the mailing list here. > > I have raised the review for > FLINK-27507 - https://github.com/apache

TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Gaël Renoux
Hello everyone, We're having an issue on our Flink job: it restarted because it failed a checkpoint, even though it shouldn't have. We've set the tolerableCheckpointFailureNumber to 1 million to never have the job restart because of this. However, the job did restart following a checkpoint failure

Re: OutputTag alternative with pyflink 1.15.0

2022-05-23 Thread yuxia
Yes, you're right. Hopefully, the master branch supported it [1]. But It haven't been released. If you want to use output tag in python in 1.15, you can apply this patch[1] to your Flink 1.15 and build it by yourself[3]. BTW, if you don't want to bother to build. You can use java/scala api.

Re: Flink UI in Application Mode

2022-05-23 Thread David Morávek
Hi Zain, you can find a link to web-ui either in the CLI output after the job submission or in the YARN ResourceManager web ui [1]. With YARN Flink needs to choose the application master port at random (could be somehow controlled by setting _yarn.application-master.port_) as there might be multip

Re: Kinesis Sink - Data being received with intermittent breaks

2022-05-23 Thread Danny Cranmer
Hi Zain, Glad you found the problem, good luck! Thanks, Danny Cranmer On Fri, May 20, 2022 at 10:05 PM Zain Haider Nemati wrote: > Hi Danny, > I looked into it in a bit more thorough detail, the bottleneck seems to be > the transform function which is at 100% and causing back pressuring. Im >