We're seeing an issue with our Flink TMs (1.4.2) where we explicitly set
the TM data and RPC ports. When the TM comes up, we see the following
bindings:
==
tcp6   0   0 :::9249   :::*   LISTEN   2284/java
tcp6   0   0 1
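For context, the flink-conf.yaml entries involved look roughly like this
(the port numbers below are placeholders, not our exact values):
==
# TaskManager ports pinned explicitly (Flink 1.4.x keys)
taskmanager.rpc.port: 6122
taskmanager.data.port: 6121
# Prometheus metrics reporter -- this accounts for the 9249 listener
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
==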
I'm currently writing some code to convert the back-pressure REST API data
into Prometheus-compatible output. I was just curious why back-pressure
wasn't already exposed as a metric in the in-built Prometheus exporter? Is
it because the thread-sampling is too intensive? Or too slow (particularly
if
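In case it helps frame the question, here's a rough sketch of what I'm
putting together (endpoint path and JSON field names are as I read them
from the 1.4 REST API; the JobManager address, job/vertex IDs and ports are
placeholders):
==
// Polls the JobManager back-pressure endpoint and republishes the max
// ratio as a Prometheus gauge. Uses Jackson and the Prometheus simpleclient.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;
import java.net.URL;

public class BackPressureExporter {
    private static final Gauge BACKPRESSURE_RATIO = Gauge.build()
            .name("flink_task_backpressure_ratio")
            .help("Max back-pressure ratio reported for a job vertex")
            .labelNames("job_id", "vertex_id")
            .register();

    public static void main(String[] args) throws Exception {
        String jmRest = "http://jobmanager:8081";  // placeholder
        String jobId = "<job-id>";                 // placeholder
        String vertexId = "<vertex-id>";           // placeholder

        new HTTPServer(9250);                      // serve /metrics for Prometheus to scrape
        ObjectMapper mapper = new ObjectMapper();

        while (true) {
            URL url = new URL(jmRest + "/jobs/" + jobId
                    + "/vertices/" + vertexId + "/backpressure");
            JsonNode root = mapper.readTree(url);
            double max = 0.0;
            for (JsonNode subtask : root.path("subtasks")) {
                max = Math.max(max, subtask.path("ratio").asDouble(0.0));
            }
            BACKPRESSURE_RATIO.labels(jobId, vertexId).set(max);
            Thread.sleep(10_000);                  // sampling is lazy/cached, so poll slowly
        }
    }
}
==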
We've got a relatively simple job that reads in from Kafka, and writes to
S3. We've had a couple of job failures where the consumer lag had built up,
but after the restart, the lag was wiped out because our offset positions
were lost and we consumed from the latest offset.
The job has checkpointing enabled.
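For reference, a sketch of the checkpoint/offset settings in question
(1.4-era connector API; broker, group, and topic names are placeholders,
and the S3 sink is elided):
==
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class KafkaToS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);           // checkpoint every minute
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder
        props.setProperty("group.id", "my-consumer-group");    // placeholder

        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);
        // On a restore from a checkpoint, offsets come from the checkpointed state.
        // The startup mode below only applies when there is nothing to restore from.
        consumer.setStartFromGroupOffsets();
        consumer.setCommitOffsetsOnCheckpoints(true);

        env.addSource(consumer)
           .print();                               // S3 sink elided in this sketch

        env.execute("kafka-to-s3");
    }
}
==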
The HA documentation is a little confusing in that it suggests JM
registration and discovery is done via Zookeeper, but it also recommends
creating a static `masters` file listing all JMs.
The only use I can currently see for the masters file is in the
`start-cluster.sh` script.
Thus, if we're not using `start-cluster.sh`, do we need the masters file at
all?
Great! Thanks Gary
On 20 April 2018 at 08:22, Gary Yao wrote:
> Hi David,
>
> You are right. If you don't use start-cluster.sh, the conf/masters file is
> not needed.
>
> Best,
> Gary
>
>
> On Wed, Apr 18, 2018 at 8:25 AM, David Corley
> wrote:
>
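For reference, the ZooKeeper-based registration/discovery side is driven
entirely by flink-conf.yaml; a minimal sketch, with placeholder hosts and
paths:
==
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-cluster
high-availability.zookeeper.storageDir: hdfs:///flink/ha/
==
With this in place the JMs register themselves in ZooKeeper and the TMs and
clients discover the leader from there, so the masters file is only read by
the start-cluster.sh tooling.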
We're looking at DR scenarios for our Flink cluster. We already use
Zookeeper for JM HA. We use an HDFS cluster that's replicated off-site, and
our
high-availability.zookeeper.storageDir
property is configured to use HDFS.
However, in the event of a site failure, is it also essential that we have
a
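For context, a sketch of the HDFS-backed locations that typically come into
play here (1.4-era flink-conf.yaml keys; paths are placeholders):
==
high-availability.zookeeper.storageDir: hdfs:///flink/ha/
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints
==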
I have a job using the keyBy function. The job parallelism is 40. My key is
based on a field in the records that has 2000+ possible values.
My question is: for the records for a given key, will they all be sent to
the one subtask, or be distributed evenly amongst all 40 downstream
operator sub-tasks?
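To make the routing concrete, here's a hedged sketch of how keyBy assigns a
key to a subtask (the hash below is a stand-in for Flink's murmur-hash-based
key groups, not its exact function):
==
import java.util.HashMap;
import java.util.Map;

public class KeyRoutingSketch {
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        // stand-in for Flink's murmur hash of key.hashCode() into a key group
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        // same shape as Flink's key-group -> operator-index mapping
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int parallelism = 40;
        int maxParallelism = 128;  // Flink's default max parallelism for small jobs

        // Count how many of 2000 distinct key values land on each subtask.
        Map<Integer, Integer> keysPerSubtask = new HashMap<>();
        for (int k = 0; k < 2000; k++) {
            int subtask = subtaskFor("key-" + k, maxParallelism, parallelism);
            keysPerSubtask.merge(subtask, 1, Integer::sum);
        }
        // Every record with a given key always goes to the same subtask; the
        // 2000+ keys as a whole are spread across the 40 subtasks, roughly but
        // not perfectly evenly.
        System.out.println(keysPerSubtask);
    }
}
==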