Hi All,

Recently we have been experimenting with using Flink's history server as a centralized debugging service for completed streaming jobs.
Specifically, we dynamically generate links to the log files on the YARN host; at the same time, we use the Flink history server to show job graphs, exceptions, and other information about the completed jobs [2].

This causes some pain for our users: it is inconvenient to go to the YARN host for the logs and then to the Flink history server for the other information.

Thus we would like to propose an improvement to the current Flink history server:
- Support dynamic links to the residual log files on the host machine within the retention period [3];
- Support dynamic links to aggregated log files provided by the cluster, where supported, such as the Hadoop HistoryServer [1] or Kubernetes cluster-level logging [4].

A similar integration with the Hadoop HistoryServer was proposed before [5], with a slightly different approach.

Any feedback and suggestions are highly appreciated!

--
Rong

[1] https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/historyserver.html
[3] https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.log.retain-seconds
[4] https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures
[5] https://issues.apache.org/jira/browse/FLINK-14317