Hi Piotr,
I’d like to share my understanding of this. Source and SourceFunction are
both interfaces to data sources. SourceFunction was designed and introduced
earlier, and as the project evolved, many shortcomings emerged. Therefore, the
community re-designed the source interface and introdu
Hi, vtygoss
> I'm working on migrating from a full data pipeline (with Spark) to an
> incremental data pipeline (with Flink CDC), and I met a problem with accuracy
> validation between the Flink-based and Spark-based pipelines.
Glad to hear that!
> For bounded data, it's simple to validate the two result se
Hi, all.
From my understanding, validating accuracy for the sync pipeline requires
snapshotting the source and sink at some points. It is just like having a
checkpoint that contains all the data at some time for both sink and
source. Then we can compare the content in the checkpoint and find the
differ
Hi, Peter.
I am not sure whether this doc is enough or not. The doc[1] lists all the
REST APIs currently available in the Flink runtime. You can use the RestClient[2]
to send requests to the JM for later usage.
Best,
Shengkai
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/
[2
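For illustration, here is a minimal sketch of querying that REST API with plain Java (java.net.http, JDK 11+). The localhost:8081 address is an assumption about a default local cluster, not something from this thread:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged sketch: ask the JobManager's REST endpoint for the list of jobs.
// Assumes a cluster running locally on the default port 8081.
public class ListJobs {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // The body is JSON describing the jobs known to the JobManager.
        System.out.println(response.body());
    }
}
```

The same pattern works for any endpoint in the REST API doc; only the URI path changes.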
Hi Sunitha,
Without more information about your setup, I would assume you are trying to
return the JobManager (and HA setup) to a stable state. A couple of questions:
* Since your job is cancelled, I would assume that the current job’s HA
state is not important, so we can delete the checkpoin
Hi Piotr,
Yes, that should work (using DataStream as the common result of both
source creation options).
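As a sketch of what that looks like (the Kafka settings here are illustrative assumptions, not from this thread), both ways of creating a source end up as a DataStream<String>, so downstream code only depends on that common type:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourcesAsDataStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // New-style Source (FLIP-27): yields a DataStream<String>.
        KafkaSource<String> kafka = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("events")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        DataStream<String> fromSource =
                env.fromSource(kafka, WatermarkStrategy.noWatermarks(), "kafka");

        // A SourceFunction-backed helper also yields a DataStream<String>,
        // which is handy as a test input.
        DataStream<String> fromFunction = env.fromElements("a", "b", "c");

        // Either stream can feed the same downstream logic.
        fromSource.union(fromFunction).print();
        env.execute();
    }
}
```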
— Ken
> On May 24, 2022, at 12:19 PM, Piotr Domagalski wrote:
>
> Hi Ken,
>
> Thanks Ken. I guess the problem I had was, as a complete newbie to Flink,
> navigating the type system and be
Hi Ken,
Thanks Ken. I guess the problem I had was, as a complete newbie to Flink,
navigating the type system and still being confused about the differences
between Source, SourceFunction, DataStream, DataStreamOperator, etc.
I think the DataStream<> type is what I'm looking for? That is, then I can
u
Team,
I'm working on a POC where our existing Stateful Functions (remote) can
interact with the DataStream API.
https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/sdk/flink-datastream/
I started Flink cluster - ./bin/start-cluster.sh
Then I submitted the .jar to Flink.
Howeve
Hi Piotr,
The way I handle this is via a workflow class that uses a builder approach to
specifying inputs, outputs, and any other configuration settings.
The inputs are typically DataStream.
This way I can separate out the Kafka inputs, and use testing sources that give
me very precise control
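A minimal sketch of that builder approach; the class and method names (EnrichmentWorkflow and friends) are invented purely for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;

// Hedged sketch: the workflow receives already-constructed streams, so
// production code can hand it Kafka-backed streams while tests hand it
// bounded streams from env.fromElements(...).
public class EnrichmentWorkflow {
    private DataStream<String> events;
    private DataStream<String> lookups;

    public EnrichmentWorkflow setEvents(DataStream<String> events) {
        this.events = events;
        return this;
    }

    public EnrichmentWorkflow setLookups(DataStream<String> lookups) {
        this.lookups = lookups;
        return this;
    }

    // Wires up the topology; callers decide where the streams come from.
    public DataStream<String> build() {
        return events.union(lookups); // stand-in for the real transformations
    }
}
```

In a test you would call `new EnrichmentWorkflow().setEvents(env.fromElements(...)).setLookups(env.fromElements(...)).build()` and assert on the output, with no Kafka involved.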
Hi Team,
Any inputs please? We are badly stuck.
Regards,
Sunitha
On Sunday, May 22, 2022, 12:34:22 AM GMT+5:30, s_penakalap...@yahoo.com
wrote:
Hi All,
Help please!
We have a standalone Flink service installed in individual VMs and clubbed to form
a cluster with HA and checkpointing in place. When can
I get the idea, but in our case this was a transient error: it was a
network issue, which was resolved later without any change in Flink (see the last
line of the stack trace). Errors in the sync phase are not always non-transient
(in our case, they pretty much never are).
To be honest, I have trouble imagi
Depending on the kind of testing you're hoping to do, you may want to
look into https://github.com/mguenther/kafka-junit. For example, suppose
you're looking for some job-level smoke tests that just answer the
question "Is everything wired up correctly?" Personally, I like how
this approach doesn't require
Hi,
I'm wondering: what is the recommended way to structure a job which one
would like to test later on with `MiniCluster`?
I've looked at the flink-training repository examples [1] and they tend to
expose the main job as a class that accepts a `SourceFunction` and a
`SinkFunction`, which make se
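One hedged way to structure this (names are illustrative assumptions): keep the topology in a method that maps a DataStream to a DataStream, so production `main` attaches real sources/sinks while a MiniCluster test attaches bounded sources and test sinks:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyJob {
    // The testable core: no sources or sinks, just transformations.
    public static DataStream<String> buildPipeline(DataStream<String> input) {
        return input.filter(s -> !s.isEmpty());
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Kafka in production; fromElements here as a stand-in.
        DataStream<String> input = env.fromElements("a", "b");
        buildPipeline(input).print();
        env.execute("my-job");
    }
}
```

A MiniCluster test then builds its own environment, calls `buildPipeline(env.fromElements(...))`, and collects the results, without touching Kafka at all.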
I think for unbounded data, we can only check the result at one point in
time; that is the work that Watermark[1] does. What about tagging one point in
time and validating the data accuracy at that moment?
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/create/#watermar
Hi Jeesmon!
Sorry I completely missed this question earlier :)
There is no support currently for Python jobs, and I don't really have any
experience with Python jobs, so I cannot really comment on how easy it would
be to integrate them.
We and most of the companies currently involved with developing t
Hi Gyula,
Any idea on this? We are exploring the current limitations of using the
operator for Flink deployment, and knowing whether there is a plan to support
Python jobs in the future would help us.
Thanks,
Jeesmon
On Fri, May 20, 2022 at 3:46 PM Jeesmon Jacob wrote:
> Hi there,
>
> Is there a plan to support Pyth
Hi Vikash,
Could you be more precise about the shared libraries? Is there any
documentation about this?
Thanks, Peter
On Tue, May 24, 2022 at 1:23 PM Vikash Dat wrote:
> Similar to what Biao said, Application mode is okay if you only have a single
> app, but when running multiple apps session mode
Hi Zain,
I assume you are using the DataStream API as described in the subject of your
email, so I think you can define any functions/transformations to parse the
JSON value, even if the schema is changing.
It looks like the value of the field “array_coordinates” is an escaped
json-formatted STRING in
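As a sketch, assuming the field value really is a JSON-escaped string: a small helper (names invented for illustration) that strips the escaping before the result is handed to a real JSON parser, e.g. Jackson's ObjectMapper inside a MapFunction:

```java
// Hypothetical helper: the "array_coordinates" field arrives as a
// JSON-escaped string such as "[{\"lat\":31.2}]". This removes the
// backslash escaping so the result is plain JSON again.
public class JsonUnescape {
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i)); // keep the character that was escaped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```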
Similar to what Biao said, Application mode is okay if you only have a single
app, but when running multiple apps session mode is better for control. In
my experience, the CliFrontend is not as robust as the REST API, and you
will end up having to rebuild a very similar REST API. For the meta space
issu
Hi,
You can make use of the configuration “pipeline.global-job-parameters” [1] to
pass your custom configs all the way into the UDF. For example you can execute
this in SQL client:
SET pipeline.global-job-parameters=black_list_path:/root/list.properties;
Then you can get the value “/root/list.
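A minimal UDF sketch reading that key via FunctionContext.getJobParameter; the class name and fallback value are illustrative assumptions, only the key "black_list_path" comes from the SET statement above:

```java
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

public class BlackListFilter extends ScalarFunction {
    private transient String blackListPath;

    @Override
    public void open(FunctionContext context) throws Exception {
        // Returns the value passed via pipeline.global-job-parameters,
        // or the supplied default when the key is absent.
        blackListPath = context.getJobParameter(
                "black_list_path", "/tmp/empty.properties");
    }

    public Boolean eval(String word) {
        // Real logic would consult the file at blackListPath;
        // this placeholder just shows the parameter is available.
        return blackListPath != null;
    }
}
```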
Hello
Regarding the GlobalCommitter in Flink's two-phase commit:
I see it was introduced in FLIP-143, but it seems to have been removed again in
FLIP-191 and marked as Deprecated in the source code.
I haven't found any relevant information about the use of GlobalCommitter.
There are two questio