Hi,
I am trying to submit a pyFlink job in detached mode using the command:
../../flink-1.11.0/bin/flink run -d -py basic_streaming_job.py -j
flink-sql-connector-kafka_2.11-1.11.0.jar
The jobs are submitted successfully but the command does not return. I
realized that was because I had the follow
Teodor,
I've concluded this is a bug, and have reported it:
https://issues.apache.org/jira/browse/FLINK-19109
Best regards,
David
On Sun, Aug 30, 2020 at 3:01 PM Teodor Spæren wrote:
> Hey again David!
>
> I tried your proposed change of setting the parallelism higher. This
> worked, but why do
Hi Alexey,
Glad to hear that you are interested in the K8s HA support.
Roman's answer is spot on.
"FileSystemBlobStore" stores the user jars, job graph, etc. on
distributed storage (e.g. HDFS, S3, GFS). So when the
JobManager fails over, it can fetch the blob data from remote
Hi Ivan,
For the Flink sink, it might commit a single file multiple times. This happens if
there is a failover after committing a file: the job will be restarted from the
latest checkpoint, the file's state will go back to pending, and it will be
committed again. In the
Yes, the recommended way to proceed in your use case is to put all classes
in a single JAR and to pass the main class to run to the Flink client.
Best,
Flavio
I can only agree with Dawid, who explained it better than me... 😅
Aljoscha
On 31.08.20 12:10, Dawid Wysakowicz wrote:
Hey Arvid,
The problem is that StreamStatus.IDLE is set at the Task level. It
is not propagated to the operator. Combining of the Watermark for a
TwoInputStreamOperator hap
Hi Izual,
thanks for contributing and improving the documentation. The PR will be
picked up as part of our regular maintenance work. The communication will
happen through PR conversations as soon as someone picks it up.
Best,
Matthias
On Tue, Sep 1, 2020 at 8:44 AM izual wrote:
> I tried to fix
Hi,
I have a S3 bucket that is continuously written to by millions of devices.
These upload small compressed archives.
What I want to do is treat the tar gzipped (.tgz) files as a streaming source
and process each archive. The archive contains three files that each might need
to be processed.
Why don’t you get an S3 notification on SQS and do the actions from there?
You will probably need to write the content of the files to a NoSQL database.
Alternatively, send the S3 notification to Kafka and read it from there with Flink.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo
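If the notifications do land in Kafka, a minimal PyFlink sketch of such a source
table could look like the following; the topic name, broker address, and schema
are placeholders, not anything from this thread.

    # Hypothetical sketch: consume S3 event notifications from a Kafka topic
    # with the Table API; topic, broker, and fields are made-up placeholders.
    from pyflink.table import EnvironmentSettings, StreamTableEnvironment

    env_settings = EnvironmentSettings.new_instance() \
        .in_streaming_mode().use_blink_planner().build()
    t_env = StreamTableEnvironment.create(environment_settings=env_settings)

    t_env.execute_sql("""
        CREATE TABLE s3_notifications (
            bucket STRING,
            object_key STRING,
            event_time TIMESTAMP(3)
        ) WITH (
            'connector' = 'kafka',
            'topic' = 's3-events',
            'properties.bootstrap.servers' = 'broker:9092',
            'scan.startup.mode' = 'earliest-offset',
            'format' = 'json'
        )
    """)
    # Each row identifies an uploaded archive; a downstream job or UDF can
    # then fetch and unpack the .tgz object it points to.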
Word of caution: streaming from S3 is really cost-prohibitive, as the only
way to detect new files is to continuously spam the S3 List API.
On Tue, Sep 1, 2020 at 4:50 PM Jörn Franke wrote:
> Why don’t you get an S3 notification on SQS and do the actions from there?
>
> You will probably need to
Thanks for your replies Matthias and Timo.
Converting the Table to a DataStream, assigning a new Watermark & Rowtime
attribute, and converting back to a Table makes sense. One challenge with
that approach is that the Table to DataStream conversion could produce a
retract stream, however, I think, that
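For what it's worth, a minimal sketch of the two conversion modes; the table,
schema, and query below are made up for illustration.

    # Illustrative only: append vs. retract conversion of a Table in PyFlink.
    from pyflink.common.typeinfo import Types
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    # A toy insert-only table standing in for the real query result.
    table = t_env.from_elements([("a", 1), ("b", 2)], ["word", "cnt"])
    row_type = Types.ROW([Types.STRING(), Types.LONG()])

    # Fine as long as the table is insert-only (append-only):
    append_stream = t_env.to_append_stream(table, row_type)

    # An updating result (e.g. a grouped aggregate) has to go through a
    # retract stream of (is_insert, row) pairs that downstream code must handle.
    t_env.create_temporary_view("words", table)
    updating = t_env.sql_query("SELECT word, SUM(cnt) AS cnt FROM words GROUP BY word")
    retract_stream = t_env.to_retract_stream(updating, row_type)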
Judging from FLIP-19 (thank you Roman for the link), of the 3 use cases (jars, RPC
messages and log files) only jar files need the HA guarantee, and in my particular
case, a job cluster with the jar as part of the image, it seems it doesn't matter; I
guess that explains why in my test I was able to recover from failur
Hi Team,
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
Flink is chaining my tasks, like: stream.map().filter().map()
I think here the entire chain runs in the same slot.
Documentation says Flink does chaining for bet
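Yes, chaining fuses those operators into one task, so records are handed over via
method calls instead of serialization and thread handover. Chaining can also be
controlled explicitly on the DataStream API; a rough PyFlink sketch with made-up
functions:

    # Rough sketch: controlling chaining explicitly; functions are placeholders.
    from pyflink.common.typeinfo import Types
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    ds = env.from_collection([1, 2, 3], type_info=Types.INT())

    result = (ds
              .map(lambda x: x * 2, output_type=Types.INT())
              .filter(lambda x: x > 2)
              # Start a new chain here so this map runs as its own task...
              .map(lambda x: x + 1, output_type=Types.INT())
              .start_new_chain()
              # ...and optionally isolate it in a dedicated slot-sharing group.
              .slot_sharing_group("heavy"))

    result.print()
    env.execute("chaining_sketch")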
Thanks all, I could see the metrics.
On Thu, Aug 27, 2020 at 7:51 AM Robert Metzger wrote:
> I don't think these error messages give us a hint why you can't see the
> metrics (because they are about registering metrics, not reporting them)
>
> Are you sure you are using the right configuration p
Hi Manas,
When running locally, you need
`ten_sec_summaries.get_job_client().get_job_execution_result().result()` to
wait for the job to finish. However, when you submit to the cluster, you need to
delete this code. In my opinion, the currently feasible solution is that you
prepare two sets of code for this
Hi Xingbo,
Thank you for clarifying that. I am indeed maintaining a different version
of the code by commenting out those lines, but I was just wondering if it was
possible to detect the environment programmatically.
Regards,
Manas
On Wed, Sep 2, 2020 at 7:32 AM Xingbo Huang wrote:
> Hi Manas,
>
>
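One possible workaround (just a sketch, with a made-up variable name): gate the
blocking wait behind an environment variable that is only set for local runs, so
the same script can be submitted unchanged with `flink run`.

    import os

    # ten_sec_summaries is the TableResult returned when the job is submitted,
    # as in the snippet above; RUN_PIPELINE_LOCALLY is a made-up variable name.
    if os.environ.get("RUN_PIPELINE_LOCALLY") == "1":
        # Local run: block until the job finishes so the process does not
        # exit before the pipeline has completed.
        ten_sec_summaries.get_job_client().get_job_execution_result().result()
    # When submitted to a cluster, leave the variable unset so the client
    # returns immediately and the job keeps running.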
Hi,
Nice to see the progress of Stateful Functions.
I have a few questions that I hope you can reply to.
My first question is regarding the newly implemented
StatefulFunctionDataStreamBuilder.
Is there anything to pay attention to if one first unions a couple of streams
and performs a sort via a
This worked, thanks! Looking forward to the future releases :)
On Mon, Aug 31, 2020 at 5:06 PM Marta Paes Moreira wrote:
> Hey, Rex!
>
> This is likely due to the tombstone records that Debezium produces for
> DELETE operations (i.e. a record with the same key as the deleted row and a
> value of
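For context, such a changelog source is typically declared with the debezium-json
format; a minimal sketch, where the topic, schema, and broker are placeholders and
`t_env` is assumed to be an existing StreamTableEnvironment:

    # Hypothetical Debezium-backed CDC table; a DELETE in the source database
    # arrives as a Debezium record with op = 'd', typically followed by a
    # tombstone (null-valued) Kafka message for the same key.
    t_env.execute_sql("""
        CREATE TABLE orders_cdc (
            order_id BIGINT,
            amount DOUBLE
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'orders',
            'properties.bootstrap.servers' = 'broker:9092',
            'scan.startup.mode' = 'earliest-offset',
            'format' = 'debezium-json'
        )
    """)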