Hi, I was wondering if anyone has done any work on measuring the cluster resource utilization of a "typical" Spark Streaming job.
We are trying to build a message ingestion system that will read from Kafka and do some processing. Some concerns have been raised within the team that a 24/7 streaming job might not be the best use of cluster resources, especially since our use cases process data in a micro-batch fashion and are not truly streaming. We wanted to measure how many resources a Spark Streaming job actually consumes. Any pointers on where one would start? We are on YARN and plan to use Spark 2.1.

Thanks in advance,
Nadeem
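P.S. In case it helps to make the question concrete, below is a rough sketch of the kind of job we have in mind (Spark 2.1 with the spark-streaming-kafka-0-10 direct stream API). The broker address, topic, group id, batch interval, and executor settings are all placeholders rather than our real values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaIngestJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-ingest")
      // Resources the application would hold on YARN for its lifetime (placeholder values)
      .set("spark.executor.instances", "2")
      .set("spark.executor.memory", "2g")
      .set("spark.executor.cores", "2")

    // Micro-batch interval of 30 seconds (placeholder)
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",          // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-consumer",                      // placeholder
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams) // topic is a placeholder
    )

    // Placeholder processing: just count the records in each micro batch
    stream.map(_.value).foreachRDD { rdd =>
      println(s"records in batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Our understanding is that, without dynamic allocation, whatever executors the application requests stay reserved on YARN for as long as the streaming context runs, even when batches are mostly idle, and that is exactly the cost we are trying to quantify.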