Re: Source vs SourceFunction and testing

2022-05-24 Thread Qingsheng Ren
Hi Piotr, I’d like to share my understanding about this. Source and SourceFunction are both interfaces to data sources. SourceFunction was designed and introduced earlier and as the project evolved, many shortcomings emerged. Therefore, the community re-designed the source interface and introdu

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Leonard Xu
Hi, vtygoss > I'm working on migrating from full-data-pipeline(with spark) to > incremental-data-pipeline(with flink cdc), and i met a problem about accuracy > validation between pipeline based flink and spark. Glad to hear that ! > For bounded data, it's simple to validate the two result se

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Shengkai Fang
Hi, all. >From my understanding, the accuracy for the sync pipeline requires to snapshot the source and sink at some points. It is just like we have a checkpoint that contains all the data at some time for both sink and source. Then we can compare the content in the checkpoint and find the differ

Re: Application mode deployment through API call

2022-05-24 Thread Shengkai Fang
Hi, Peter. I am not sure whether this doc is enough or not. The doc[1] lists all the available REST API in the Flink runtime now. You can use the RestClient[2] to send request to the JM for later usage. Best, Shengkai [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/ [2

Re: Flink Job Manager unable to recognize Task Manager Available slots

2022-05-24 Thread Teoh, Hong
Hi Sunitha, Without more information about your setup, I would assume you are trying to return JobManager (and HA setup) into a stable state. A couple of questions: * Since your job is cancelled, I would assume that the current job’s HA state is not important, so we can delete the checkpoin

Re: Source vs SourceFunction and testing

2022-05-24 Thread Ken Krugler
Hi Piotr, Yes, that should work (using DataStream as the common result of both source creation options) — Ken > On May 24, 2022, at 12:19 PM, Piotr Domagalski wrote: > > Hi Ken, > > Thanks Ken. I guess the problem I had was, as a complete newbie to Flink, > navigating the type system and be

Re: Source vs SourceFunction and testing

2022-05-24 Thread Piotr Domagalski
Hi Ken, Thanks Ken. I guess the problem I had was, as a complete newbie to Flink, navigating the type system and being still confused about differences between Source, SourceFunction, DataStream, DataStreamOperator, etc. I think the DataStream<> type is what I'm looking for? That is, then I can u

Flink DataStream and remote Stateful Functions interoperability

2022-05-24 Thread Himanshu Sareen
Team, I'm working on a POC where our existing Stateful Functions ( remote ) can interact with Datastream API. https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/sdk/flink-datastream/ I started Flink cluster - ./bin/start-cluster.sh Then I submitted the .jar to Flink. Howeve

Re: Source vs SourceFunction and testing

2022-05-24 Thread Ken Krugler
Hi Piotr, The way I handle this is via a workflow class that uses a builder approach to specifying inputs, outputs, and any other configuration settings. The inputs are typically DataStream. This way I can separate out the Kafka inputs, and use testing sources that give me very precise control

Re: Flink Job Manager unable to recognize Task Manager Available slots

2022-05-24 Thread s_penakalap...@yahoo.com
Hi Team, Any inputs please badly stuck. Regards,Sunitha On Sunday, May 22, 2022, 12:34:22 AM GMT+5:30, s_penakalap...@yahoo.com wrote: Hi All, Help please! We have standalone Flink service installed in individual VM and clubed to form a cluster with HA and checkpoint in place. When can

Re: TolerableCheckpointFailureNumber not always applying

2022-05-24 Thread Gaël Renoux
I get the idea, but in our case this was a transient error: it was a network issue, which was solved later without any change in Flink (see last line of stack-trace). Errors in the sync phase are not always non-transient (in our case, they are pretty much never). To be honest, I have trouble imagi

Re: Source vs SourceFunction and testing

2022-05-24 Thread Aeden Jameson
Depending on the kind of testing you're hoping to do you may want to look into https://github.com/mguenther/kafka-junit. For example, you're looking for some job level smoke tests that just answer the question "Is everything wired up correctly?" Personally, I like how this approach doesn't require

Source vs SourceFunction and testing

2022-05-24 Thread Piotr Domagalski
Hi, I'm wondering: what ithe recommended way to structure the job which one would like to test later on with `MiniCluster`. I've looked at the flink-training repository examples [1] and they tend to expose the main job as a class that accepts a `SourceFunction` and a `SinkFunction`, which make se

Re:accuracy validation of streaming pipeline

2022-05-24 Thread Xuyang
I think for an unbounded data, we can only check the result at one point of time, that is the work what Watermark[1] does. What about tag one time and to validate the data accuracy at that moment? [1] https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/create/#watermar

Re: Python Job Type Support in Flink Kubernetes Operator

2022-05-24 Thread Gyula Fóra
Hi Jeesmon! Sorry I completely missed this question earlier :) There is no support currently for Python jobs and I don't really have any experience with Python jobs so cannot really comment on how easy it would be to integrate it. We and most of the companies currently involved with developing t

Re: Python Job Type Support in Flink Kubernetes Operator

2022-05-24 Thread Jeesmon Jacob
Hi Gyula, Any idea on this? We are exploring current limitations of using the operator for Flink deployment and if there is a plan to support Python jobs in future will help us. Thanks, Jeesmon On Fri, May 20, 2022 at 3:46 PM Jeesmon Jacob wrote: > Hi there, > > Is there a plan to support Pyth

Re: Application mode deployment through API call

2022-05-24 Thread Peter Schrott
Hi Vikash, Could you be more precise about the shared libraries? Is there any documentation about this? Thanks, Peter On Tue, May 24, 2022 at 1:23 PM Vikash Dat wrote: > Similar to agent Biao, Application mode is okay if you only have a single > app, but when running multiple apps session mode

Re: Json Deserialize in DataStream API with array length not fixed

2022-05-24 Thread Qingsheng Ren
Hi Zain, I assume you are using DataStream API as described in the subject of your email, so I think you can define any functions/transformations to parse the json value, even the schema is changing. It looks like the value of field “array_coordinates” is a an escaped json-formatted STRING in

Re: Application mode deployment through API call

2022-05-24 Thread Vikash Dat
Similar to agent Biao, Application mode is okay if you only have a single app, but when running multiple apps session mode is better for control. In my experience, the CLIFrontend is not as robust as the REST API, or you will end up having to rebuild a very similar Rest API. For the meta space issu

Re: How can I set job parameter in flink sql

2022-05-24 Thread Qingsheng Ren
Hi, You can take use of the configuration “pipeline.global-job-parameters” [1] to pass your custom configs all the way into the UDF. For example you can execute this in SQL client: SET pipeline.global-job-parameters=black_list_path:/root/list.properties; Then you can get the value “/root/list.

GlobalCommitter in Flink's two-phase commit

2022-05-24 Thread di wu
Hello Regarding the GlobalCommitter in Flink's two-phase commit,  I see it was introduced in FLIP-143, but it seems to have been removed again in FLP-191 and marked as Deprecated in the source code. I haven't found any relevant information about the use of GlobalCommitter.  There are two questio