Hi,

I was wondering if anyone has done any work on measuring the cluster
resource utilization of a "typical" Spark Streaming job.

We are trying to build a message ingestion system that will read from
Kafka and do some processing.  Some concerns have been raised in the team
that a 24x7 streaming job might not be the best use of cluster resources,
especially since our use cases process data in a micro-batch fashion
and are not truly streaming.
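
For context, here is a minimal sketch of the kind of job we have in mind. It assumes the
spark-streaming-kafka-0-10 direct stream API; the broker, topic, and group id below are
just placeholders, and the real processing logic would replace the count.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object IngestJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-ingest")
        // Micro-batch interval: the job wakes up on this cadence, 24x7.
        val ssc = new StreamingContext(conf, Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",        // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "ingest-group",                 // placeholder group id
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent,
          Subscribe[String, String](Seq("events"), kafkaParams))

        // "Some processing" stands in for our real logic.
        stream.map(_.value).foreachRDD { rdd =>
          println(s"micro-batch record count: ${rdd.count()}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }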

We wanted to measure how much resource a Spark Streaming
process actually takes.  Any pointers on where one would start?
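
One starting point we are considering is polling the Spark REST monitoring API
(/api/v1) for per-executor cores, memory, and task/GC time while the job runs. A rough
sketch, assuming the driver UI (or its YARN proxy URL) is reachable; the host and app id
below are placeholders:

    import scala.io.Source

    object ExecutorUsageProbe {
      def main(args: Array[String]): Unit = {
        // Placeholder driver host; the UI defaults to port 4040, or use the
        // YARN ResourceManager proxy URL for the running application.
        val base = "http://driver-host:4040/api/v1"

        // Lists running applications and their ids.
        val appsJson = Source.fromURL(s"$base/applications").mkString
        println(appsJson)

        // With an app id from the listing above, the executors endpoint
        // reports per-executor cores, memory used, task time, GC time, etc.
        // val executorsJson =
        //   Source.fromURL(s"$base/applications/<app-id>/executors").mkString
      }
    }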

We are on YARN and plan to use Spark 2.1.

Thanks in advance,
Nadeem
