Hi, I was wondering if anyone has done any work on measuring the cluster resource utilization of a "typical" Spark Streaming job.
We are trying to build a message ingestion system that will read from Kafka and do some processing. Some concerns have been raised within the team that a 24/7 streaming job might not be the best use of cluster resources, especially since our use cases process data in a micro-batch fashion and are not truly streaming. We wanted to measure how many resources a Spark Streaming job actually consumes. Any pointers on where one would start? We are on YARN and plan to use Spark 2.1.

Thanks in advance,
Nadeem
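P.S. In case it helps to make the question concrete, below is a rough sketch of the kind of job we have in mind (Spark 2.1 with the spark-streaming-kafka-0-10 direct stream API). The broker address, topic, group id, batch interval, and executor settings are all placeholders rather than our real values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaIngestJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-ingest")
      // Resources the application would hold on YARN for its lifetime (placeholder values)
      .set("spark.executor.instances", "2")
      .set("spark.executor.memory", "2g")
      .set("spark.executor.cores", "2")

    // Micro-batch interval of 30 seconds (placeholder)
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",          // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-consumer",                      // placeholder
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams) // topic is a placeholder
    )

    // Placeholder processing: just count the records in each micro batch
    stream.map(_.value).foreachRDD { rdd =>
      println(s"records in batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Our understanding is that, without dynamic allocation, whatever executors the application requests stay reserved on YARN for as long as the streaming context runs, even when batches are mostly idle, and that is exactly the cost we are trying to quantify.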