Has anyone integrated metrics from Flink running on AWS EMR with an external system? Could you share whether this is a configuration issue or an EMR-specific issue?
Thanks,
Hemant

On Wed, Aug 12, 2020 at 9:55 PM bat man <tintin0...@gmail.com> wrote:
> An update: in the YARN logs I can see the following:
>
> Classpath:
> *lib/flink-metrics-influxdb-1.9.0.jar:lib/flink-shaded-hadoop-2-uber-2.8.5-amzn-5-7.0.jar:lib/flink-table-blink_2.11-1.9.0.jar:lib/flink-table_2.11-1.9.0.jar:lib/log4j-1.2.17.jar:lib/slf4j-log4j12-1.7.15.jar:log4j.properties:plugins/influxdb/flink-metrics-influxdb-1.9.0.jar....*
> *..........*
> *......*
>
> So the jar is on the classpath. In the logs I can also see:
> 2020-08-12 15:28:51,505 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
> 2020-08-12 15:28:51,508 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - Current working Directory: /mnt/yarn/usercache/hadoop/appcache/application_1595767096609_0013/container_1595767096609_0013_01_000004
>
> *2020-08-12 15:28:51,512 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.interval, 60 SECONDS*
>
> *2020-08-12 15:28:51,512 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.yarn.conf.dir, /etc/hadoop/conf*
> 2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.host, xx.xxx.xxx.xx
> 2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, application_1595767096609_0013
> 2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, ip-xx-x-xx-xxx.ec2.internal
> 2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.password, ******
>
> *2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: FLINK_PLUGINS_DIR, /usr/lib/flink/plugins*
> 2020-08-12 15:28:51,513 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.db, xxxxxx
> 2020-08-12 15:28:51,520 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.connectTimeout, 60000
> 2020-08-12 15:28:51,520 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.hadoop.conf.dir, /etc/hadoop/conf
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.port, 0
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.username, xxxx
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.size, 264241152b
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.tmpdir, /tmp/flink-web-5562f065-6020-4c38-8260-3aea434bf285
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 32777
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.port, 8086
> 2020-08-12 15:28:51,521 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.retentionPolicy, one_hour
> 2020-08-12 15:28:51,522 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: internal.cluster.execution-mode, NORMAL
> 2020-08-12 15:28:51,522 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.writeTimeout, 60000
> 2020-08-12 15:28:51,522 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.consistency, ONE
> 2020-08-12 15:28:51,522 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.address, ip-xx-x-xx-xxx.ec2.internal
> *2020-08-12 15:28:51,522 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.influxdb.factory.class, org.apache.flink.metrics.influxdb.InfluxdbReporterFactory*
> .......
> But further down I can see:
>
> *2020-08-12 15:28:51,523 WARN org.apache.flink.core.plugin.PluginConfig - Environment variable [FLINK_PLUGINS_DIR] is set to [/usr/lib/flink/plugins] but the directory doesn't exist*
> 2020-08-12 15:28:51,561 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - Current working/local Directory: /mnt/yarn/usercache/hadoop/appcache/application_1595767096609_0013,/mnt1/yarn/usercache/hadoop/appcache/application_1595767096609_0013
> 2020-08-12 15:28:51,564 INFO org.apache.flink.runtime.clusterframework.BootstrapTools - Setting directories for temporary files to: /mnt/yarn/usercache/hadoop/appcache/application_1595767096609_0013,/mnt1/yarn/usercache/hadoop/appcache/application_1595767096609_0013
> 2020-08-12 15:28:51,564 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - TM: remote keytab path obtained null
> 2020-08-12 15:28:51,564 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - TM: remote keytab principal obtained null
> 2020-08-12 15:28:51,566 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
> 2020-08-12 15:28:51,675 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hadoop (auth:xxxxxx)
> 2020-08-12 15:28:51,984 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'web.port' instead of proper key 'rest.port'
> 2020-08-12 15:28:51,987 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Using configured hostname/address for TaskManager: ip-xx-x-xx-xxx.ec2.internal.
> 2020-08-12 15:28:51,996 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at ip-xx-x-xx-xxx.ec2.internal:0
> 2020-08-12 15:28:52,823 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> 2020-08-12 15:28:52,854 INFO akka.remote.Remoting - Starting remoting
> 2020-08-12 15:28:53,061 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@ip-xx-x-xx-xxx.ec2.internal:37937]
> 2020-08-12 15:28:53,563 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@iip-xx-x-xx-xxx.ec2.ec2.internal:37937
> *2020-08-12 15:28:53,593 WARN org.apache.flink.runtime.metrics.ReporterSetup - The reporter factory (org.apache.flink.metrics.influxdb.InfluxdbReporterFactory) could not be found for reporter influxdb. Available factories:*
>
> *2020-08-12 15:28:53,597 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.*
> 2020-08-12 15:28:53,599 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at ip-xx-x-xx-xxx.ec2.ec2.internal:0
>
> So org.apache.flink.configuration.GlobalConfiguration loads the metrics reporter properties, but org.apache.flink.runtime.metrics.ReporterSetup then complains that it cannot find the reporter factory.
>
> Can anyone point out what I am missing here?
>
> Thanks,
> Hemant
>
> On Wed, Aug 12, 2020 at 9:15 PM bat man <tintin0...@gmail.com> wrote:
>
>> Hello Experts,
>>
>> I am running Flink 1.9.0 on AWS EMR (emr-5.28.1) and want to push metrics to InfluxDB. Following the documentation [1], I added the configuration to /usr/lib/flink/conf/flink-conf.yaml and copied the jar to the /usr/lib/flink/lib folder on the master node. However, with only these steps, I don't see any measurement (table) created in my InfluxDB database when I run the job, so I understand the cluster might need a restart. I cannot find any documentation on how to restart the cluster on EMR.
>> Could anyone who has configured Flink on AWS EMR to push metrics to InfluxDB share the steps, please?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/metrics.html#influxdb-orgapacheflinkmetricsinfluxdbinfluxdbreporter
>>
>> Thanks,
>> Hemant
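The TaskManager log in the thread includes a series of "Loading configuration property" lines from GlobalConfiguration. The reporter keys among them can be written back as flink-conf.yaml entries, which makes them easier to compare with the InfluxDB section of [1].

This is a sketch reconstructed from the log only:

- Redacted values stay redacted.
- The ordering is arbitrary.
- It is not a confirmed working configuration for Flink 1.9.0 on EMR.

```yaml
# Reporter keys as reported by GlobalConfiguration in the TaskManager log.
# Reconstructed from the log, not copied from the actual flink-conf.yaml.
metrics.reporter.influxdb.factory.class: org.apache.flink.metrics.influxdb.InfluxdbReporterFactory
metrics.reporter.influxdb.host: xx.xxx.xxx.xx
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: xxxxxx
metrics.reporter.influxdb.username: xxxx
# The password appears as ****** in the log; a redacted placeholder is used here
# because a value starting with '*' is read as an alias by strict YAML parsers.
metrics.reporter.influxdb.password: xxxxxx
metrics.reporter.influxdb.retentionPolicy: one_hour
metrics.reporter.influxdb.consistency: ONE
metrics.reporter.influxdb.connectTimeout: 60000
metrics.reporter.influxdb.writeTimeout: 60000
metrics.reporter.influxdb.interval: 60 SECONDS
```

One thing worth checking against [1] is which key that version's documentation uses to select the reporter: metrics.reporter.influxdb.class or metrics.reporter.influxdb.factory.class. This matters because of two entries in the log:

- ReporterSetup lists no available factories at all.
- PluginConfig warns that /usr/lib/flink/plugins does not exist as seen from the TaskManager container.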