On Thu, Jan 9, 2020, at 16:34, Lucas Bradstreet wrote:
> Hi Colin,
> 
> This is a great idea, as it is very useful to have these metrics in
> addition to the usual Kafka metrics given the impact of hitting disk
> outside of page cache. Describing it as a gauge did initially strike me as
> odd, but given the way this works it makes sense to me.
> 
> /proc/[pid]/io appears to only be supported as of kernel 2.6.20. Given that
> was released back in 2007, maybe it's safe enough to assume it exists, but
> I thought I would mention that anyway.

Hi Lucas,

Thanks for taking a look.

Systems without /proc/[pid]/io will silently fall back to not having this 
metric.  I think there should be very few Linux systems that don't have it, 
though, as you noted.
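
As a rough illustration of that fallback (a sketch only, not the code from 
the KIP), something like the following reads the read_bytes and write_bytes 
fields and returns an empty value when the file is missing or unreadable.  
The class and method names are hypothetical, and it assumes the usual 
"name: value" line format of /proc/[pid]/io.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, for illustration only.
public class ProcIoSketch {
    // Finds "field: <number>" among the given lines; empty if absent or malformed.
    static OptionalLong parseField(List<String> lines, String field) {
        String prefix = field + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                try {
                    String value = line.substring(prefix.length()).trim();
                    return OptionalLong.of(Long.parseLong(value));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    // Reads one field from the given file.  Any I/O or permission failure
    // (for example, the file does not exist) yields an empty result, so the
    // caller can simply skip registering the metric.
    static OptionalLong readField(Path path, String field) {
        try {
            return parseField(Files.readAllLines(path, StandardCharsets.UTF_8), field);
        } catch (IOException | SecurityException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        Path procIo = Paths.get("/proc/self/io");
        System.out.println("read_bytes:  " + readField(procIo, "read_bytes"));
        System.out.println("write_bytes: " + readField(procIo, "write_bytes"));
    }
}
```

On a system without /proc/[pid]/io, both calls in main() just print an empty 
OptionalLong rather than failing.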

> 
> Without bikeshedding the metric names, would including a "Total" in the
> name be better e.g. kafka.server:type=KafkaServer,name=DiskReadBytesTotal?
> 

That's a good idea.  I changed the proposed names to TotalDiskReadBytes and 
TotalDiskWriteBytes.

regards,
Colin

> Cheers,
> 
> Lucas
> 
> 
> On Mon, Jan 6, 2020 at 5:28 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Tue, Dec 10, 2019, at 11:10, Magnus Edenhill wrote:
> > > Hi Colin,
> > >
> >
> > Hi Magnus,
> >
> > Thanks for taking a look.
> >
> > > aren't those counters (ever increasing), rather than gauges
> > > (fluctuating)?
> >
> > Since this is in the Kafka broker, we're using Yammer.  This might be
> > confusing, but Yammer's concept of a "counter" is not actually monotonic.
> > It can decrease as well as increase.
> >
> > In general Yammer counters require you to call inc(amount) or dec(amount)
> > on them.  This doesn't match up with what we need to do here, which is to
> > (essentially) make a callback into the kernel by reading from /proc.
> >
> > The counter/gauge dichotomy doesn't affect the JMX (I think?), so it's
> > really kind of an implementation detail.
> >
> > >
> > > You also mention CPU usage as a side note; you could use getrusage(2)'s
> > > ru_utime (user) and ru_stime (sys)
> > > to allow the broker to monitor its own CPU usage.
> > >
> >
> > Interesting idea.  It might be better to save that for a future KIP,
> > though, to avoid scope creep.
> >
> > best,
> > Colin
> >
> > > /Magnus
> > >
> > > Den tis 10 dec. 2019 kl 19:33 skrev Colin McCabe <cmcc...@apache.org>:
> > >
> > > > Hi all,
> > > >
> > > > I wrote a KIP about adding support for exposing disk read and write
> > > > metrics.  Check it out here:
> > > >
> > > > https://cwiki.apache.org/confluence/x/sotSC
> > > >
> > > > best,
> > > > Colin
> > > >
> > >
> >
>
