Thanks Yangze for providing these links I'll try it !
-邮件原件-
发件人: Yangze Guo [mailto:karma...@gmail.com]
发送时间: 2020年8月18日 星期二 12:57
收件人: 范超
抄送: user (user@flink.apache.org)
主题: Re: How to specify the number of TaskManagers in Yarn Cluster using Per-Job
Mode
The number of TM mainly dep
The number of TM mainly depends on the parallelism and job graph.
Flink now allows you to set the maximum slots number
(slotmanager-number-of-slots-max[1]). There is also a plan to support
setting the minimum number of slots[2].
[1]
https://ci.apache.org/projects/flink/flink-docs-master/ops/confi
Thanks Yangze
1. Do you meet any problem when deploying on Yarn or running Flink job?
My job works well
2. Why do you need to start the TMs on all the three machines?
From cluster perspective, I wonder if the process pressure can be balance to 3
machines.
3. Flink can control how many TM to sta
Hi,
Very thanks for bringing up this discussion!
One more question is that does the BATCH and STREAMING mode also decides
the shuffle types and operators? I'm asking so because that even for blocking
mode, it should also benefit from keeping some edges to be pipeline if the
resources
+1 for removing the methods that are deprecated for a while & have alternative
methods.
One specific thing is that if we remove the DataStream#split, do we consider
enabling side-output in more operators in the future ? Currently it should be
only available in ProcessFunctions, but not availabl
Hi all,
I am working on measuring the failure recovery time of Flink and I want to
decompose the recovery time into different parts, say the time to detect
the failure, the time to restart the job, and the time to
restore the checkpointing.
I found that I can measure the down time during failure
Thanks Yangze
The reason why I don’t deploying a standalone cluster, it's because there
kafka, kudu, hadoop, zookeeper on these machines, maybe currently using the
yarn to manage resources is the best choice for me.
If Flink can not control how many tm to start , could anyone providing me some
b
Hi,
Flink can control how many TM to start, but where to start the TMs
depends on Yarn.
Do you meet any problem when deploying on Yarn or running Flink job?
Why do you need to start the TMs on all the three machines?
Best,
Yangze Guo
On Tue, Aug 18, 2020 at 11:25 AM 范超 wrote:
>
> Thanks Yangze
Hi,
I think that is only related to the Yarn scheduling strategy. AFAIK,
Flink could not control it. You could check the RM log to figure out
why it did not schedule the containers to all the three machines. BTW,
if you have specific requirements to start with all the three
machines, how about dep
Thanks Yangze
All 3 machines NodeManager is started.
I just don't know why not three machines each running a Flink TaskManager and
how to achieve this
-邮件原件-
发件人: Yangze Guo [mailto:karma...@gmail.com]
发送时间: 2020年8月18日 星期二 10:10
收件人: 范超
抄送: user (user@flink.apache.org)
主题: Re: How to
Hi,
Do you start the NodeManager in all the three machines? If so, could
you check all the NMs correctly connect to the ResourceManager?
Best,
Yangze Guo
On Tue, Aug 18, 2020 at 10:01 AM 范超 wrote:
>
> Hi, Dev and Users
> I’ve 3 machines each one is 8 cores and 16GB memory.
> Following it’s my R
Hi, Do you think there can be any issue with Flinks performance, with 400Kb
up to 1 MB payload record sizes ? my Spark streaming seems to be doing
better. Are there any recommended configurations or increasing parallelism
to improve Flink streaming using flink kafka connect?
Regards,
Vijay
On F
Hi, Dev and Users
I’ve 3 machines each one is 8 cores and 16GB memory.
Following it’s my Resource Manager screenshot the cluster have 36GB total.
I specify the paralism to 3 or even up to 12, But the task manager is always
running on two nodes not all three machine, the third node does not start
Hi Chengcheng Zhang,
I think your request is related to this feature request from two years ago here
[1], with me asking about the status one year ago [2].
You might want to upvote this so we can hope that it gets some more attention
in future.
Today, it is possible to write your own DataStr
+1 for removing them.
>From a quick look, most of them (not all) have been deprecated a long time ago.
Cheers,
Kostas
On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>
> @David Yes, my idea was to remove any use of fold method and all related
> classes including WindowedStream#fold
>
>
@David Yes, my idea was to remove any use of fold method and all related
classes including WindowedStream#fold
@Klou Good idea to also remove the deprecated enableCheckpointing() &
StreamExecutionEnvironment#readFile and alike. I did another pass over
some of the classes and thought we could also
Hi Kurt and David,
Thanks a lot for the insightful feedback!
@Kurt: For the topic of checkpointing with Batch Scheduling, I totally
agree with you that it requires a lot more work and careful thinking
on the semantics. This FLIP was written under the assumption that if
the user wants to have chec
Hi Marco,
You cannot really synchronize data that is being emitted via different
streams (without bringing them together in an operator).
I see two options:
1) emit the event to create the partition and the data to be written into
the partition to the same stream. Flink guarantees that records d
Interesting – what is the JobManager submission bounded by? Does it only allow
a certain number of submissions per second, or is there a number of threads it
accepts?
// ah
From: Robert Metzger
Sent: Tuesday, August 11, 2020 4:46 AM
To: Hailu, Andreas [Engineering]
Cc: user@flink.apache.org;
Kostas,
I'm pleased to see some concrete details in this FLIP.
I wonder if the current proposal goes far enough in the direction of
recognizing the need some users may have for "batch" and "bounded
streaming" to be treated differently. If I've understood it correctly, the
section on scheduling al
I assume that along with DataStream#fold you would also
remove WindowedStream#fold.
I'm in favor of going ahead with all of these.
David
On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
wrote:
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the
> deprecat
Thanks a lot for starting this Dawid,
Big +1 for the proposed clean-up, and I would also add the deprecated
methods of the StreamExecutionEnvironment like:
enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
enableCheckpointing()
isForceCheckpointing()
readFile(FileInputFor
Hi devs and users,
I wanted to ask you what do you think about removing some of the
deprecated APIs around the DataStream API.
The APIs I have in mind are:
* RuntimeContext#getAllAccumulators (deprecated in 0.10)
* DataStream#fold and all related classes and methods such as
FoldFunction,
Hi Aaron,
I've recently been looking at this topic and working on a prototype. The
approach I am trying is "backward tracing", or data provenance tracing,
where we try to explain what inputs and steps have affected the production
of an output record.
Arvid has summarized the most important aspect
24 matches
Mail list logo