Hi Experts,
In batch computing, there are products like Azkaban or Airflow to
manage batch job dependencies. By using a dependency management tool, we can
build a large-scale system consisting of small jobs.
In stream processing, it is not practical to put all dependencies in one
Hello all,
I am trying to run Flink 1.7.1 on EMR and am having some trouble with metric
reporting.
I was using the DatadogHttpReporter, and have also tried the
StatsDReporter, but with both I was seeing no metrics being collected.
To debug this I implemented my own reporter (based on StatsDReporter)
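For what it's worth, a bare-bones debug reporter along these lines (an illustrative skeleton, not the actual code from this thread) is a quick way to confirm whether metrics reach the reporter at all:

import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative debug reporter: logs every metric registration so you can
// see whether metrics are ever handed to the reporter.
public class DebugReporter implements MetricReporter {
    private static final Logger LOG = LoggerFactory.getLogger(DebugReporter.class);

    @Override
    public void open(MetricConfig config) {
        LOG.info("DebugReporter opened with config: {}", config);
    }

    @Override
    public void close() {}

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        LOG.info("Registered metric: {}", group.getMetricIdentifier(metricName));
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        LOG.info("Removed metric: {}", metricName);
    }
}

If the registration lines never show up, the problem is likely in the metrics.reporter.* configuration rather than in the reporter implementation itself.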
Hi Padarn,
If you want to verify why no metrics are being sent out, how about using the built-in
Slf4j reporter [1], which records metrics in the logs?
If you can see the metrics after enabling the slf4j reporter, you can then
compare the configurations.
Best
Yun Tang
[1]
https://ci.apache.org/projec
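For example, enabling the Slf4j reporter takes just two lines in flink-conf.yaml (the 60-second interval below is only an example value):

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 60 SECONDS

The metrics then appear in the TaskManager and JobManager logs at that interval.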
Hi Padarn, for what it's worth I am using DataDog metrics on EMR with Flink
1.7.1, and this is my flink-conf configuration:
- Classification: flink-conf
  ConfigurationProperties:
    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.ap
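For reference, a complete DatadogHttpReporter configuration typically also sets the API key and, optionally, tags; the values below are placeholders:

- Classification: flink-conf
  ConfigurationProperties:
    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.apikey: <your-datadog-api-key>
    metrics.reporter.dghttp.tags: env:staging,flink:1.7.1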
Hi everyone,
I've made a Beam pipeline that makes use of a SideInput, which in my case is a
Map of key/values. I'm running Flink (1.7.1) on YARN (Hadoop 2.6.0). I've found
that if my map is small enough everything works fine, but if I make it large
enough (2-3 MB) the pipeline fails with:
org.apa
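For context, the side-input setup described above usually looks something like this in Beam's Java SDK (the names and the "missing" fallback are made up for illustration):

import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SideInputSketch {
    // Joins each element of `main` against a map-valued side input built
    // from `lookup`, falling back to "missing" when the key is absent.
    static PCollection<String> withLookup(
            PCollection<String> main, PCollection<KV<String, String>> lookup) {
        // Materialize the key/value PCollection as a Map side input.
        PCollectionView<Map<String, String>> lookupView =
            lookup.apply(View.<String, String>asMap());
        return main.apply(
            ParDo.of(new DoFn<String, String>() {
                @ProcessElement
                public void process(ProcessContext c) {
                    Map<String, String> m = c.sideInput(lookupView);
                    c.output(m.getOrDefault(c.element(), "missing"));
                }
            }).withSideInputs(lookupView));
    }
}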
Was any progress ever made on this? We have seen the same issue
in the past. What I do remember is that the job crashes after exactly
whatever interval I set max.block.ms to.
I am going to attempt to reproduce the issue again and will report
back.
On 3/28/19 3:
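For context: max.block.ms caps how long KafkaProducer.send() may block, for example while waiting for metadata or for free buffer space, before throwing a TimeoutException, which would match a job failing after exactly that interval. Raising it for a test could look like this (the broker address is a placeholder):

import java.util.Properties;

public class ProducerPropsSketch {
    // Producer properties handed to the Kafka producer / Flink Kafka sink.
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        // max.block.ms bounds how long send() may block before a
        // TimeoutException; raise it to see whether the crash time shifts.
        props.setProperty("max.block.ms", "120000");
        return props;
    }
}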
Hi Flink users,
I am trying to figure out how to leverage parallelism to improve the throughput
of a Kafka consumer. From my research, I understand the scenario where the
number of Kafka partitions is equal to, less than, or greater than the number
of consumers, and the use of rebalance() to spread messages evenly across
workers.
Also use setParallelism(#) to achieve the
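A sketch of that combination with Flink's Kafka connector (topic name, group id, broker address, and the parallelism values are all placeholder assumptions):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "sketch-group");         // placeholder

        // Keep source parallelism at the partition count: source subtasks
        // beyond the number of partitions would simply sit idle.
        DataStream<String> records = env
            .addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props))
            .setParallelism(4); // assume 4 partitions

        // rebalance() round-robins records so a wider downstream operator
        // is evenly loaded even when partitions are skewed.
        records.rebalance()
            .map(String::toUpperCase)
            .setParallelism(16)
            .print();

        env.execute("kafka-parallelism-sketch");
    }
}

The key point is that only the stage after rebalance() benefits from parallelism above the partition count; the source itself cannot read a single partition with more than one subtask.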