Hi Flink users,
I am trying to figure out how to leverage parallelism to improve the throughput of
a Kafka consumer. From my research, I understand the scenarios where the number of
Kafka partitions is equal to, less than, or greater than the number of consumers,
and that rebalance() can be used to spread messages evenly across workers.
Also use setParallelism(#) to achieve the
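For what it's worth, a minimal sketch of that pattern is below; the broker address,
topic, group id, and parallelism values are placeholders, and the right numbers
depend on your partition count and cluster size.

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaParallelismExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Assumed broker address, topic and group id -- replace with your own.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-consumer-group");

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

        env.addSource(consumer)
                // Match the source parallelism to the number of Kafka partitions,
                // e.g. 4 partitions -> 4 source subtasks.
                .setParallelism(4)
                // rebalance() round-robins records to downstream subtasks so the
                // load stays even when downstream parallelism differs from the
                // partition count.
                .rebalance()
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase();
                    }
                })
                // Downstream operators can run at higher parallelism than the source.
                .setParallelism(8);

        env.execute("kafka-parallelism-example");
    }
}

The idea is that the source runs with one subtask per partition, and rebalance()
evens out the load when downstream operators run at a different parallelism.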
Was any progress ever made on this? We have seen the same issue
in the past. What I do remember is that whatever value I set max.block.ms
to is roughly when the job crashes.
I am going to attempt to reproduce the issue again and will report
back.
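For reference, max.block.ms is a Kafka producer setting, so in a Flink job it is
normally passed through the producer Properties. A minimal sketch, assuming the
universal FlinkKafkaProducer and placeholder broker/topic names:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class MaxBlockMsExample {

    public static FlinkKafkaProducer<String> buildProducer() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        // How long send() and metadata fetches may block before the producer
        // throws, which then surfaces as a job failure. 60000 ms is just an
        // example value.
        props.setProperty("max.block.ms", "60000");

        return new FlinkKafkaProducer<>("my-topic", new SimpleStringSchema(), props);
    }
}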
On 3/28/19 3:
Hi everyone,
I've made a Beam pipeline that makes use of a SideInput, which in my case is a
Map of key/values. I'm running Flink (1.7.1) on YARN (Hadoop 2.6.0). I've found
that if my map is small enough everything works fine, but if I make it large
enough (2-3 MB) the pipeline fails with:
org.apa
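For context, a minimal sketch of that kind of map-valued side input, using Beam's
View.asMap(); the collection names and contents are placeholders (in the reported
case the map is 2-3 MB):

import java.util.Map;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class MapSideInputExample {

    public static void main(String[] args) {
        Pipeline p = Pipeline.create();

        // Placeholder lookup data; in the reported case this map is 2-3 MB.
        PCollection<KV<String, String>> lookup = p.apply("Lookup",
                Create.of(KV.of("key1", "value1"), KV.of("key2", "value2"))
                        .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));

        // Materialize the lookup collection as a map-valued side input.
        PCollectionView<Map<String, String>> lookupView = lookup.apply(View.asMap());

        PCollection<String> input = p.apply("Input", Create.of("key1", "key2"));

        input.apply("Enrich", ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                Map<String, String> map = c.sideInput(lookupView);
                c.output(c.element() + " -> " + map.get(c.element()));
            }
        }).withSideInputs(lookupView));

        p.run();
    }
}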
Hi Padarn, for what it's worth, I am using DataDog metrics on EMR with Flink
1.7.1, and here is my flink-conf configuration:
- Classification: flink-conf
  ConfigurationProperties:
    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.ap
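For comparison, per the Flink Datadog reporter documentation the same settings in
plain flink-conf.yaml form look roughly like the below (the api key is a
placeholder, and the flink-metrics-datadog jar has to be in Flink's lib directory):

metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <your-datadog-api-key>
metrics.reporter.dghttp.tags: myflinkapp,prod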
Hi Padarn,
If you want to verify why no metrics are being sent out, how about using the built-in
Slf4j reporter [1], which records metrics in the logs?
If you can see the metrics after enabling the slf4j reporter, you can then
compare the configurations.
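For example, something along these lines in flink-conf.yaml should be enough
(the interval below is an example value):

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 60 SECONDS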
Best
Yun Tang
[1]
https://ci.apache.org/projec
Hello all,
I am trying to run Flink 1.7.1 on EMR and am having some trouble with metric
reporting.
I was using the DatadogHttpReporter, and have also tried the
StatsDReporter, but with both I was seeing no metrics being collected.
To debug this I implemented my own reporter (based on StatsDReporter)
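A minimal sketch of what such a debug reporter might look like (not the exact one
used here), assuming the MetricReporter interface from flink-metrics-core and a
hypothetical class name that only logs registrations:

import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;

// Hypothetical debug reporter: it only logs which metrics get registered,
// to confirm that the reporter is actually being loaded and invoked.
public class LoggingDebugReporter implements MetricReporter {

    @Override
    public void open(MetricConfig config) {
        System.out.println("LoggingDebugReporter opened with config: " + config);
    }

    @Override
    public void close() {
        System.out.println("LoggingDebugReporter closed");
    }

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        System.out.println("Registered metric: " + group.getMetricIdentifier(metricName));
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        System.out.println("Removed metric: " + metricName);
    }
}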
Hi Experts,
In batch computing, there are products like Azkaban or Airflow to
manage batch job dependencies. By using such a dependency management tool, we can
build a large-scale system consisting of small jobs.
In stream processing, it is not practical to put all dependencies in
one