On Thu, Jan 9, 2020, at 16:34, Lucas Bradstreet wrote:
> Hi Colin,
>
> This is a great idea, as it is very useful to have these metrics in
> addition to the usual Kafka metrics given the impact of hitting disk
> outside of page cache. Describing it as a gauge did initially strike me as
> odd, but given the way this works it makes sense to me.
>
> /proc/[pid]/io appears to only be supported as of kernel 2.6.20. Given that
> was released back in 2007, maybe it's safe enough to assume it exists, but
> I thought I would mention that anyway.

Hi Lucas,

Thanks for taking a look. Systems without /proc/[pid]/io will silently fall back to not having this metric. I think there should be very few Linux systems that don't have it, though, as you noted.

>
> Without bikeshedding the metric names, would including a "Total" in the
> name be better e.g. kafka.server:type=KafkaServer,name=DiskReadBytesTotal?
>

That's a good idea. I changed the proposed names to TotalDiskReadBytes and TotalDiskWriteBytes.

regards,
Colin

> Cheers,
>
> Lucas
>
>
> On Mon, Jan 6, 2020 at 5:28 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Tue, Dec 10, 2019, at 11:10, Magnus Edenhill wrote:
> > > Hi Colin,
> > >
> >
> > Hi Magnus,
> >
> > Thanks for taking a look.
> >
> > > aren't those counters (ever increasing), rather than gauges (fluctuating)?
> >
> > Since this is in the Kafka broker, we're using Yammer. This might be
> > confusing, but Yammer's concept of a "counter" is not actually monotonic.
> > It can decrease as well as increase.
> >
> > In general Yammer counters require you to call inc(amount) or dec(amount)
> > on them. This doesn't match up with what we need to do here, which is to
> > (essentially) make a callback into the kernel by reading from /proc.
> >
> > The counter/gauge dichotomy doesn't affect the JMX (I think?), so it's
> > really kind of an implementation detail.
> >
> > >
> > > You also mention CPU usage as a side note, you could use getrusage(2)'s
> > > ru_utime (user) and ru_stime (sys)
> > > to allow the broker to monitor its own CPU usage.
> > >
> >
> > Interesting idea. It might be better to save that for a future KIP,
> > though, to avoid scope creep.
> >
> > best,
> > Colin
> >
> > > /Magnus
> > >
> > > On Tue, Dec 10, 2019 at 19:33, Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I wrote a KIP about adding support for exposing disk read and write
> > > > metrics.
> > > > Check it out here:
> > > >
> > > > https://cwiki.apache.org/confluence/x/sotSC
> > > >
> > > > best,
> > > > Colin
> > > >
> > >
> >
>
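
For illustration, the fallback behavior discussed in this thread (a system without /proc/[pid]/io simply does not report the metric) might look like the minimal Python sketch below. It is not the KIP's implementation, which lives in the Java/Scala broker. The function names parse_proc_io and read_disk_bytes are hypothetical. The read_bytes and write_bytes field names are the ones documented for /proc/[pid]/io in proc(5).

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/[pid]/io file into a dict of ints."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Skip lines that are not of the form "name: <non-negative integer>".
        if sep and value.isdigit():
            fields[key.strip()] = int(value)
    return fields


def read_disk_bytes(path: str = "/proc/self/io") -> Optional[Tuple[int, int]]:
    """Return (read_bytes, write_bytes), or None if the file is unavailable.

    Returning None models the "silently fall back to not having this
    metric" behavior: the caller just does not register the gauge.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        # Missing file (non-Linux or very old kernel) or no permission to read it.
        return None
    fields = parse_proc_io(text)
    if "read_bytes" not in fields or "write_bytes" not in fields:
        return None
    return fields["read_bytes"], fields["write_bytes"]


if __name__ == "__main__":
    result = read_disk_bytes()
    if result is None:
        print("disk I/O metrics unavailable on this system")
    else:
        print("read_bytes=%d write_bytes=%d" % result)
```

Because the values are re-read from /proc on every poll rather than incremented by the caller, a gauge-style callback is a more natural fit here than a counter driven by inc(amount) and dec(amount).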