Hi Rex,
Yes, you can go directly into Flink since 1.11.0 [1], but AFAIK only through
the Table API/SQL currently (which you seem to be using most of the time
anyway). I'd recommend using 1.11.1+ (some bugfixes) or even 1.12.0+ (many
new useful features [2]). You can also check the main docs [3].
If yo
Hmm, this is strange. From the logs it looks as if certain communications
between components don't arrive at the receiver's end. I think we have to
dig further into the problem.
To narrow it down further, could you try to start the cluster
using pod IPs instead of K8s services for in
Hi. Are there good ways to debug long Flink checkpoint durations?
I'm running a backfill job that processes ~10 days of data, after which
checkpoints start failing. Since I only see the last 10 checkpoints in the
JobMaster UI, I can't see when it started.
I looked through the text logs and didn't see
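One way to see more than the last 10 checkpoints is the JobManager's REST API rather than the UI. A minimal sketch in plain Python, operating on a sample payload shaped like the checkpoint-statistics response (the endpoint and field names such as `history` and `end_to_end_duration` are assumptions from Flink's REST docs; verify against your version):

```python
# Sketch: flag slow checkpoints from the JobManager REST checkpoint history.
# No network call here; `sample` stands in for the JSON returned by
# GET /jobs/<job-id>/checkpoints (field names assumed, data hypothetical).

def slow_checkpoints(payload, threshold_ms):
    """Return (id, duration_ms) for checkpoints slower than threshold_ms."""
    return [
        (cp["id"], cp["end_to_end_duration"])
        for cp in payload.get("history", [])
        if cp["end_to_end_duration"] > threshold_ms
    ]

sample = {
    "history": [
        {"id": 41, "status": "COMPLETED", "end_to_end_duration": 1200},
        {"id": 42, "status": "COMPLETED", "end_to_end_duration": 95000},
        {"id": 43, "status": "FAILED", "end_to_end_duration": 600000},
    ]
}

print(slow_checkpoints(sample, threshold_ms=60000))  # [(42, 95000), (43, 600000)]
```

Polling this endpoint periodically and logging the result would give you a durable record of when durations started climbing, independent of the UI's 10-entry window.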
Hi,
I am looking for a way to allocate each created subtask to a specific
TaskManager.
Is there any way to force the assignment of tasks to specific TaskManagers?
Thank you
Best regards,
Hyejo Hwang
Thanks Arvid,
I think my confusion lies in misinterpreting the meaning of CDC. We
basically don't want CDC; we just use it to get data into a compacted Kafka
topic where we hold the current state of the world for multiple consumers to
consume from. You have described pretty thoroughly where we want to
Hi Le,
I believe the issue is the bounded source [1]. Stateful Functions only
supports unbounded inputs.
Additionally, you can remove all the `synchronized` blocks from your code;
statefun handles all synchronization for you.
Seth
[1]
https://gist.github.com/flint-stone/cbc60f2d41507fdf33507ba99
Hey David et al.,
I had one follow up question for this as I've been putting together some
integration/unit tests to verify that things are working as expected with
finite datasets (e.g. a text file with several hundred records that are
serialized, injected into Kafka, and processed through the pi
Hello,
I'm trying to run an experiment w/ two flink jobs:
- A producer producing messages to hundreds of topics
- A consumer consuming the messages from all the topics
After the job has run for a few minutes, it fails with the following error:
Caused by: org.apache.kafka.common.errors.Timeou
Hi at all,
I would like to know to what extent state schema evolution is possible when
using Flink's SQL API. Which query changes can I make without disrupting the
schema of my savepoint?
The documentation only describes, for the DataStream API, the do's and
don'ts regarding a safe sch
Hi all -
I’m trying to keep some state around for a little while after a window fires,
to use as queryable state. I intend to use something like:
.keyBy()
.timeWindow(Time.minutes(1)).allowedLateness(Time.minutes(90))
.aggregate(…)
.keyBy()
.asQueryableState(...)
My intent is to keep tha
Hi, I was looking at Apache Beam/Flink for some of our data processing
needs, but when reading about the resource managers
(YARN/mesos/Kubernetes), it seems like they all top out at around 10k
nodes. What are recommended solutions for scaling higher than this?
Thanks in advance,
Joey
Another question is: The timeout of 48 hours sounds strange. There should
have been some other system noticing the connection problem earlier
assuming that you have a reasonably low heartbeat interval configured.
Matthias
On Mon, Mar 1, 2021 at 1:22 PM Matthias Pohl wrote:
> Thanks for providin
Thanks for providing this information, Rainie. Are other issues documented
in the logs besides the TimeoutException in the JM logs which you already
shared? For now, it looks like there was a connection problem between
the TaskManager and the JobManager that caused the shutdown of the operator
Hi Tamir,
> The histogram has been taken from Task Manager using jcmd tool.
From that histogram, I guess there is no classloader leak.
> A simple batch job with a single operation. The memory bumps to ~600MB
(after a single execution). Once the job is finished, the memory is never freed.
It could
I have a job which includes 50+ tasks. I want to split it into multiple
jobs with the data transferred through Kafka, but what about watermarks?
Has anyone done something similar and solved this problem?
Here I give an example:
The original job: kafkaStream1(src-topic) => xxxProcess => xxx
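One point worth noting for the split-via-Kafka approach: watermarks are not written to Kafka, so the downstream job has to regenerate them from event timestamps carried in the records. A minimal sketch of bounded-out-of-orderness watermarking in plain Python (mirroring the idea behind Flink's `BoundedOutOfOrdernessWatermarks`; the `-1` follows Flink's convention that a watermark of `t` means no more events with timestamp `<= t` are expected, though the exact offset is an implementation detail to verify):

```python
# Sketch: regenerate watermarks downstream from embedded event timestamps,
# tolerating a bounded amount of out-of-orderness.

class BoundedOutOfOrderness:
    def __init__(self, max_delay_ms):
        self.max_delay_ms = max_delay_ms
        self.max_ts = float("-inf")   # highest event timestamp seen so far

    def on_event(self, event_ts_ms):
        self.max_ts = max(self.max_ts, event_ts_ms)

    def current_watermark(self):
        # Watermark trails the max timestamp by the allowed delay.
        return self.max_ts - self.max_delay_ms - 1

wm = BoundedOutOfOrderness(max_delay_ms=5000)
for ts in [1000, 7000, 4000]:        # out-of-order events
    wm.on_event(ts)
print(wm.current_watermark())        # 7000 - 5000 - 1 = 1999
```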
Hi Omer,
thanks for the logs. Could you tell us a bit more about the concrete setup
of your Flink K8s cluster? It looks to me as if the ResourceManager cannot
talk to the JobMaster which tries to register at the RM. Also some
JobMasters don't seem to reach the ResourceManager. Could it be that you
Hi Shengkai,
thanks for this hint. It solved the issue of having more consumer tasks than
Kafka partitions.
But the case of dropped events with fewer consumer tasks than Kafka
partitions is still there. I think it will be fixed in version 1.12 [1]
[1] https://issues.apache.org/jira/browse
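Background on why the fewer-tasks-than-partitions case is harder: each subtask then reads several partitions, and the subtask's watermark must be the minimum across its partitions, otherwise a fast partition advances the watermark and events from a slow partition arrive "late" and get dropped. A plain-Python sketch of that min-combination (partition names and timestamps are made up):

```python
# Sketch: a subtask consuming multiple Kafka partitions combines
# per-partition watermarks with min(), so the slowest partition
# holds the overall watermark back and its events are not dropped.

def subtask_watermark(partition_watermarks):
    """Combined watermark for a subtask reading multiple partitions."""
    return min(partition_watermarks.values())

partitions = {"topic-0": 10_000, "topic-1": 4_000, "topic-2": 7_500}
print(subtask_watermark(partitions))  # 4000: held back by the slowest partition
```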
> We are rereading the topics, at any time we might want a completely
different materialized view for a different web service for some new
application feature. Other jobs / new jobs need to read all the up-to-date
rows from the databases.
> I still don't see how this is the case if everything just
That solved it, thank you very much Kezhu :)
On 28.02.21 16:12, Kezhu Wang wrote:
Hi Morgan,
You could check FLINK-11654; from its description, I think it is the
problem you encountered.
> We run multiple jobs on a cluster which write a lot to the same
Kafka topic from identically named sin
Hi Yang,
Unfortunately I didn't save the log. Trying to reproduce it again, but now I'm
hitting a different error, about an incompatible version of
ImmutableMapSerializer, which is strange: while the serialVersionUID indeed
changed, this serializer is only registered but not used (there is no state