Hi KristoffSC,
I'd strongly suggest not blocking the task thread if it involves external
services. RPC notifications cannot be processed and checkpoints are delayed
while the task thread is blocked. That's what AsyncIO is for.
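A minimal sketch of the Async I/O pattern (the class and the third-party
call are illustrative stand-ins, not your actual library):

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    public class ThirdPartyAsyncFunction extends RichAsyncFunction<String, String> {

        @Override
        public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
            // Run the blocking call on a separate pool, never on the task thread,
            // and hand the result back via the ResultFuture.
            CompletableFuture
                    .supplyAsync(() -> callThirdPartyLib(input))
                    .thenAccept(result ->
                            resultFuture.complete(Collections.singleton(result)));
        }

        private String callThirdPartyLib(String input) {
            return input.toUpperCase(); // stand-in for the external call
        }
    }

    // Usage: bound the in-flight requests and set a timeout, e.g.
    // AsyncDataStream.unorderedWait(stream, new ThirdPartyAsyncFunction(),
    //         1, TimeUnit.SECONDS, 100);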
If your third-party library just takes a few ms to finish its computation
wi
Each task will be assigned a dedicated thread for its data processing.
A slot can be shared by multiple tasks if they are in the same slot sharing
group [1].
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/runtime.html#task-slots-and-resources
Thanks,
Zhu
ゞ野蠻遊戲χ wrote on 2020-1
Hi all
Can only one thread run at a time for a slot, or can one
slot run multiple threads in parallel at the same time?
Thanks,
Jiazhi
Hi Kevin Kwon ~
Do you want to customize only the source operator name, or all the operator
names, for the sake of state compatibility?
State compatibility is an orthogonal topic, and keeping the operator names
is one way to address it.
Kevin Kwon wrote on Wed, Nov 25, 2020, 1:11 AM:
> For SQLs, I know that the o
Thank you. This is very helpful.
On Mon, Nov 23, 2020 at 9:46 AM Till Rohrmann wrote:
> Hi George,
>
> Here is some documentation about how to deploy a stateful function job
> [1]. In a nutshell, you need to deploy a Flink cluster on which you can run
> the stateful function job. This can either
Hi Arvid,
Thank you for your answer.
And what if a) blocked the task's thread?
Let's say I'm OK with making the entire task thread wait on this third-party
lib.
In that case, would I be safe from this exception even though I did not use
AsyncIO?
Hi everyone,
I have a question about how delayed messages work. I tried to dig through
some docs, but I'm not sure they address exactly my question. Basically,
if I send a delayed message with exactly-once mode on, does Flink need to
wait until the delayed message is sent before committing Kafka offsets?
Problem solved. It is because another function sends messages to myFunc
using non-hard-coded ids. Thanks.
On Tue, Nov 24, 2020 at 11:24 AM Lian Jiang wrote:
> Hi,
>
> I am using statefun 2.2.0 and have below routing:
>
> downstream.forward(myFunc.TYPE, myFunc.TYPE.name(), message);
>
> I
Hi KristoffSC,
sorry for the confusing error message. In short, mailbox thread = task
thread.
Your operator a) calls collector.collect from a different thread (the one in
which the CompletableFuture is completed). However, all APIs must always be
used from the task thread.
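Condensed, the problematic pattern looks roughly like this (collector and
future stand in for your operator's fields):

    // Anti-pattern: thenAccept runs on the future's completing thread,
    // not the task thread, so calling collect() here breaks that contract.
    future.thenAccept(result -> collector.collect(result));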
The only way to cross thread
Hi,
I am using statefun 2.2.0 and have the routing below:
downstream.forward(myFunc.TYPE, myFunc.TYPE.name(), message);
I expect this statement will create only one physical myFunc because
the id is hard-coded with myFunc.TYPE.name().
This design can use the PersistedValue field in myFunc for all
i
Hi Timo and Dawid,
Thank you for the detailed answer; it looks like we need to reconsider the
whole job submission flow.
What is the best way to compare the new job graph? Can we use the Flink
visualizer to ensure that the new job graph shares the table? As you
mentioned, it is not guaranteed.
Best regards,
For SQLs, I know that the operator ID assignment is not possible now since
the query optimizer may not be backward compatible in each release.
But are DDLs also affected by this?
for example,
CREATE TABLE mytable (
  id BIGINT,
  data STRING
) WITH (
  'connector' = 'kafka',
  ...
  'id' = 'mytable'
)
Hi Fuyao,
great that you could make progress.
2. Btw nice summary of the idleness concept. We should have that in the
docs actually.
4. By looking at tests like `IntervalJoinITCase` [1] it seems that we
also support FULL OUTER JOINs as interval joins. Maybe you can make use
of them.
5. "b
Hi,
I faced an issue on Flink 1.11. It was a one-time thing so far and I cannot
reproduce it. However, I think something is lurking there...
I cannot post the full stack trace or the user code, but I will try to
describe the problem.
Setup without any resource groups with only one Operator chain restri
Hi Patrick,
Flink supports regional failover [1], which restarts only the tasks
connected via pipelined data exchanges. Hence, when you have an
embarrassingly parallel topology or run a batch job, Flink should not
restart the whole job in case of a task failure.
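For reference, this is controlled by a single setting in flink-conf.yaml
(region has been the default since Flink 1.10):

    jobmanager.execution.failover-strategy: region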
However, in the case of si
Hi,
one piece of advice I can give you is to check out the code and execute some
of the examples in debugging mode. Especially within Flink's functions, e.g.
MapFunction or ProcessFunction, you can set a breakpoint and look at the
stack trace. This gives you a good overview of the Flink stack in
general.
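For example, a trivial function to step into (the class name is just
illustrative):

    import org.apache.flink.api.common.functions.MapFunction;

    public class DebugMap implements MapFunction<String, String> {
        @Override
        public String map(String value) {
            return value; // set a breakpoint here and inspect the stack trace
        }
    }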
I agree with Dawid.
Maybe one thing to add is that reusing parts of the pipeline is possible
via StatementSets in the TableEnvironment. They allow you to add multiple
queries that consume from a common part of the pipeline (for example, a
common source). But all of that is compiled into one big job
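A minimal sketch of that (the table names are illustrative and assume
matching DDL has been executed):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.StatementSet;
    import org.apache.flink.table.api.TableEnvironment;

    public class StatementSetExample {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.newInstance().build());

            // DDL for 'src', 'sink_a', and 'sink_b' is assumed to exist.
            StatementSet set = tEnv.createStatementSet();
            set.addInsertSql("INSERT INTO sink_a SELECT * FROM src");
            set.addInsertSql("INSERT INTO sink_b SELECT * FROM src WHERE id > 10");

            // Both queries share the common source and are submitted as one job.
            set.execute();
        }
    }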
Hi Flavio,
looking only at the code, the job should first transition into a
globally terminal state before notifying the client about it. The only
possible reason I could see for this behaviour is that the
RestServerEndpoint uses an ExecutionGraphCache (DefaultExecutionGraphCache
is the imple
Hi all,
We are trying to set up regions to enable Flink to restart only the failing
tasks' region instead of failing the entire stream.
We are using one main stream that reads from a Kafka topic and a bunch of
side outputs for processing each event from that topic differently.
For the proce
Just to close this thread: I found the cause of the problem. Looking into
the code of the MySQL connector, the value of
PropertyDefinitions.SYSP_disableAbandonedConnectionCleanup
is "com.mysql.cj.disableAbandonedConnectionCleanup" and not
"com.mysql.disableAbandonedConnectionCleanup" as stated in [1].
For debugging you can also implement a simple non-parallel source using
`org.apache.flink.streaming.api.functions.source.SourceFunction`. You
would need to implement the run() method with an endless loop after
emitting all your records.
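Something along these lines (the element values are just placeholders):

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class DebugSource implements SourceFunction<String> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            // first emit all of your records ...
            for (String record : new String[] {"a", "b", "c"}) {
                ctx.collect(record);
            }
            // ... then keep the job alive with an endless loop
            while (running) {
                Thread.sleep(100);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }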
Regards,
Timo
On 24.11.20 16:07, Klemens Muthmann wrote:
Hi,
Really sorry for the late reply.
To the best of my knowledge, there is no way to "attach" to the
source/reader of a different job. Every job reads the source
separately.
`The GenericInMemoryCatalog is an in-memory
implementation of a catalog. All objects will be available only f
Hi,
the best way to learn Flink's source code would be to dive into it [1].
[1] https://github.com/apache/flink
Cheers,
Till
On Tue, Nov 24, 2020 at 3:43 AM 心剑 <2752980...@qq.com> wrote:
> Excuse me, I want to learn the Flink source code. Do you have any good
> information and the latest books
ok, thank you all for the help!
From: David Anderson
Sent: 24 November 2020 15:16
To: Simone Cavallarin
Cc: user@flink.apache.org
Subject: Re: Print on screen DataStream content
Simone,
What you want to do is to override the toString() method on Event so th
Simone,
What you want to do is to override the toString() method on Event so that
it produces a more helpful String as its result, and then use
stream.print()
in your IDE (where stream is a DataStream).
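For example (a hypothetical Event with two fields):

    public class Event {
        public long id;
        public String payload;

        @Override
        public String toString() {
            return "Event{id=" + id + ", payload='" + payload + "'}";
        }
    }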
By the way, printOrTest(stream) isn't part of Flink -- that's just
something used by the tra
Hi,
yes, I would like to debug locally in my IDE.
This is what I tried so far, but no luck:
a) String ff = result.toString();
   System.out.print(ff);
b) printOrTest(stream);
c) stream.print();
d) System.out.println(stream.print());
This is the output and to me it looks l
Hi,
Thanks for your reply. I am using processing time instead of event time,
since we do get the events in batches and some might arrive days later.
But for my current dev setup I just use a CSV dump of finite size as
input. I will hand over the pipeline to some other guys, who will need
to
Hi Saksham,
could you tell us a bit more about your deployment and where you run Flink?
This seems to be the root exception:
2020-11-24 11:11:16,296 ERROR
org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler
[] - Failed to transfer file from TaskExecutor
f0dc0ae680e65a
Hi Klemens,
what you are observing is one of the reasons why event time should be
preferred over processing time. Event time uses the timestamp of your data,
while processing time is too basic for many use cases. Especially when you
want to reprocess historic data, you want to do that at full speed instead of
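For reference, switching to event time mostly comes down to a
WatermarkStrategy; a sketch (Event and its getTimestampMillis() getter are
hypothetical, as is the out-of-orderness bound):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // 'events' is an existing DataStream<Event>.
    DataStream<Event> withTimestamps = events.assignTimestampsAndWatermarks(
            WatermarkStrategy
                    .<Event>forBoundedOutOfOrderness(Duration.ofMinutes(5))
                    .withTimestampAssigner((e, ts) -> e.getTimestampMillis()));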
Hi All,
I am running a job in Flink, and somehow the job is failing and the
TaskManager is dropping out of the pool unexpectedly.
Some heartbeat timeout exceptions are also appearing.
Thanks,
Saksham
2020-11-24 11:07:44,594 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] -
---
Hi Simone,
if you are just executing DataStream pipelines locally in your IDE while
prototyping, you should be able to use `DataStream#print()`, which just
prints to standard out [1] (it might be hidden between the log messages).
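For example, in a main() run directly from the IDE (the element values are
just placeholders):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PrintExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // A tiny bounded stream, just for prototyping.
            env.fromElements("a", "b", "c")
               .print(); // goes to standard out when run from the IDE

            env.execute("print-example");
        }
    }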
For debugging locally, you can also just set breakpoints in your
ok, that's fine with me, just add an @Internal annotation on the
RestClusterClient if it is intended only for internal use.. but wouldn't it
be easier to provide some sort of client generation facility (e.g. Swagger
or similar)?
On Tue, Nov 24, 2020, 11:38 Till Rohrmann wrote:
> I see the point in
Hi Kevin,
I expect the 1.12.0 release to happen within the next 3 weeks.
Cheers,
Till
On Tue, Nov 24, 2020 at 4:23 AM Yang Wang wrote:
> Hi Kevin,
>
> Let me try to understand your problem. You have added the trusted keystore
> to the Flink app image(my-flink-app:0.0.1)
> and it could not be l
I see the point in having a richer RestClusterClient. However, I think we
first have to decide whether the RestClusterClient is something
internal or not. If it is something internal, then only extending the
RestClusterClient and not adding these convenience methods to ClusterClient
could
I tried `DataStream#print()` but I don't quite understand how to use
it. Could you please give me an example? I'm using IntelliJ, so what I would
need is just to see the data on my screen.
Thanks
From: David Anderson
Sent: 24 November 2020 10:01
To: Pan
When Flink is running on a cluster, `DataStream#print()` prints to files in
the log directory.
Regards,
David
On Tue, Nov 24, 2020 at 6:03 AM Pankaj Chand
wrote:
> Please correct me if I am wrong. `DataStream#print()` only prints to the
> screen when running from the IDE, but does not work (pri
Hi,
I have written an Apache Flink Pipeline containing the following piece
of code (Java):
stream.window(ProcessingTimeSessionWindows.withGap(Time.seconds(50)))
      .aggregate(new CustomAggregator())
      .print();
If I run the pipeline using local execution I see the following
behavior. The "CustomA
Hello everybody,
these days I have been trying to use the JobListener to implement a simple
piece of logic in our platform that consists of calling an external service
to signal that the job has ended and, in case of failure, saving the error
cause.
After some problems making it work when starting a job usi
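Roughly, the listener has this shape ('env' is our
StreamExecutionEnvironment; the external-service call is sketched as a
hypothetical helper):

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.core.execution.JobListener;

    env.registerJobListener(new JobListener() {
        @Override
        public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
            // invoked after submission (throwable != null if submission failed)
        }

        @Override
        public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
            if (throwable != null) {
                // notifyExternalService(throwable); // hypothetical helper
            }
        }
    });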