Hi Joey,
We are currently running 2000+ small Flink clusters on top of k8s,
at the moment on roughly 100 nodes. Do you see yourself scaling to 10k nodes,
given that each node can run a significant number of Flink jobs?
On Thu, Mar 4, 2021 at 10:51 AM Piotr Nowojski wrote:
Maybe a stupid question, Joey, but if the problem is in the resource
managers, have you tried running standalone Flink clusters without any
resource manager? You would probably still hit the JobManager problems that
Xintong mentioned, but those are problems we can help address.
Piotrek
Hi Joey,
Quick question: by *nodes*, do you mean Flink TaskManager processes, or
physical/virtual machines (like ECS, YARN NM)?
In our production, we run Flink workloads on several YARN/Kubernetes
clusters, where each cluster typically has 2k~5k machines. Most Flink
workloads are deployed in sin
Hi Joey,
Sorry for not responding to your question sooner. As you can imagine, there
are not many users running Flink at such scale. As far as I know, Alibaba
is running one of the largest clusters, if not the largest. I'm asking someone
who is familiar with those deployments to take a look at this con
Hi, I was looking at Apache Beam/Flink for some of our data processing
needs, but when reading about the resource managers
(YARN/Mesos/Kubernetes), it seems like they all top out at around 10k
nodes. What are recommended solutions for scaling higher than this?
Thanks in advance,
Joey