Hi, I'm building a monitoring system for Apache Spark and want to ship default alert rules (threshold- or anomaly-based) on the two or three key metrics that everyone who runs Spark typically wants to alert on, but I don't yet have production-grade experience with Spark myself.
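For context, I'm assuming the metrics get exported through Spark's built-in metrics system (configured in `conf/metrics.properties`). Here's a minimal sketch of the kind of setup I have in mind, using a Graphite sink; the host name and the 10-second reporting period are placeholders, and any other sink (JMX, CSV, console) would do just as well:

```properties
# conf/metrics.properties -- a minimal sketch, not a production config.
# Report metrics from every instance (master, worker, driver, executor)
# to a Graphite-compatible backend; "graphite.example.com" is a placeholder.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

# Also expose JVM metrics (heap usage, GC time) from the driver and
# executors, since GC pressure seems like a common thing to alert on.
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

The alert rules themselves would then be thresholds or anomaly detectors on whichever of these exported metrics you recommend.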
Importantly, the alert rules have to be generally useful, so they can't be built on metrics whose values vary wildly with the size of the deployment. In other words, which metrics would be the most significant indicators that something has gone wrong with your Spark:

- master
- worker
- driver
- executor
- streaming

I thought the best place to find experienced Spark users, who would find this question trivial to answer, would be here.

Thanks very much,
Mark Scott