I have not tried to do this, but as we (Cloudera) deal with more and more
performance-related problems, I feel something like this is needed.

It is a tricky problem due to the number of requests the NN handles and how
performance sensitive it is.

At the IPC Server level, we should be able to know the request queue time,
processing time, response queue time and the type of request.

If we sampled X% of requests and then emitted one log line per interval
(e.g. per minute), we could perhaps build a histogram of queue size, queue
times, and processing times per request type.
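
To make the idea concrete, below is a very rough sketch of what that
per-interval sampling could look like. None of this is existing Hadoop
code; the class name, bucket boundaries and log format are just made up
for illustration.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/**
 * Illustrative only, not existing Hadoop code. Samples a fraction of RPC
 * calls and periodically emits one log line per request type with a coarse
 * latency histogram, so log volume stays bounded regardless of load.
 */
public class SampledRpcStats {
  // Bucket upper bounds in milliseconds for the latency histogram.
  private static final long[] BUCKETS_MS = {1, 5, 10, 50, 100, 500, 1000, 5000};
  private final double sampleFraction;  // e.g. 0.01 == 1% of calls
  private final Map<String, LongAdder[]> queueTimeByMethod = new ConcurrentHashMap<>();
  private final Map<String, LongAdder[]> processingTimeByMethod = new ConcurrentHashMap<>();

  public SampledRpcStats(double sampleFraction) {
    this.sampleFraction = sampleFraction;
  }

  /** Called once per RPC with the times measured by the IPC server. */
  public void record(String method, long queueTimeMs, long processingTimeMs) {
    if (ThreadLocalRandom.current().nextDouble() >= sampleFraction) {
      return;  // not sampled
    }
    bucketFor(queueTimeByMethod, method, queueTimeMs).increment();
    bucketFor(processingTimeByMethod, method, processingTimeMs).increment();
  }

  private LongAdder bucketFor(Map<String, LongAdder[]> map, String method, long ms) {
    LongAdder[] buckets = map.computeIfAbsent(method, k -> newBuckets());
    int i = 0;
    while (i < BUCKETS_MS.length && ms > BUCKETS_MS[i]) {
      i++;
    }
    return buckets[i];  // the final bucket is the overflow bucket
  }

  private static LongAdder[] newBuckets() {
    LongAdder[] b = new LongAdder[BUCKETS_MS.length + 1];
    for (int i = 0; i < b.length; i++) {
      b[i] = new LongAdder();
    }
    return b;
  }

  /** Called by a scheduled task, e.g. once a minute, to emit one line per method. */
  public void logAndReset() {
    for (Map.Entry<String, LongAdder[]> e : processingTimeByMethod.entrySet()) {
      StringBuilder sb = new StringBuilder("rpcStats method=").append(e.getKey());
      LongAdder[] buckets = e.getValue();
      for (int i = 0; i < buckets.length; i++) {
        String label = (i < BUCKETS_MS.length)
            ? "<=" + BUCKETS_MS[i] + "ms"
            : ">" + BUCKETS_MS[BUCKETS_MS.length - 1] + "ms";
        sb.append(' ').append(label).append('=').append(buckets[i].sumThenReset());
      }
      System.out.println(sb);  // would be a LOG.info(...) in the real thing
    }
    // the queue-time histograms would be flushed in the same way
  }
}

One line per method per interval should keep the volume manageable even at
NN request rates.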

From JMX, we can get the request counts and queue length, but I am not sure
if we can get something like percentiles of queue time and processing time
over the previous minute, for example.
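
For what it is worth, the per-port RPC metrics can at least be read over
plain JMX (or the NN's /jmx HTTP endpoint). A rough sketch of polling them
is below; the object name pattern and attribute names such as
RpcQueueTimeAvgTime are from memory and may vary between versions, and
they are averages rather than percentiles.

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/**
 * Small sketch: poll the NameNode RPC metrics over JMX once a minute.
 * The object name pattern and attribute names below are from memory and
 * may differ between Hadoop versions; treat them as placeholders.
 */
public class RpcMetricsPoller {
  public static void main(String[] args) throws Exception {
    // Assumes the NameNode exposes a remote JMX port (8004 here is hypothetical).
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://nn-host:8004/jmxrmi");
    try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      // One RpcActivity bean per IPC port.
      Set<ObjectName> beans = conn.queryNames(
          new ObjectName("Hadoop:service=NameNode,name=RpcActivityForPort*"), null);
      for (ObjectName bean : beans) {
        Object queueLen = conn.getAttribute(bean, "CallQueueLength");
        Object queueAvg = conn.getAttribute(bean, "RpcQueueTimeAvgTime");
        Object procAvg = conn.getAttribute(bean, "RpcProcessingTimeAvgTime");
        System.out.printf("%s queueLen=%s queueAvgMs=%s procAvgMs=%s%n",
            bean, queueLen, queueAvg, procAvg);
      }
    }
  }
}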

Even given the above details, if we see a long queue length, it may still
be a mystery what is causing it. Often it is due to a long-running request
(e.g. contentSummary, snapshotDiff, etc.) holding the NN lock in write mode
for too long.

What would be very useful is a way to see the percentage of time the NN
lock is held in exclusive mode (write), shared mode (read), or not held at
all (rare on a busy cluster). Even better if we could somehow bubble up the
top requests holding the lock in exclusive mode.
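
As a sketch of the kind of instrumentation I mean (again, not existing
code, and the shared-hold accounting is only approximate):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative wrapper (not existing Hadoop code) around a namesystem-style
 * read/write lock that tracks how long the lock is held exclusively vs
 * shared, plus per-operation exclusive hold time so the worst offenders
 * can be reported.
 */
public class InstrumentedFsLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final LongAdder writeHeldNanos = new LongAdder();
  private final LongAdder readHeldNanos = new LongAdder();
  private final Map<String, LongAdder> writeHeldByOp = new ConcurrentHashMap<>();
  private final AtomicInteger activeReaders = new AtomicInteger();
  private final AtomicLong firstReaderStart = new AtomicLong();
  private final ThreadLocal<Long> writeStart = new ThreadLocal<>();

  public void writeLock() {
    lock.writeLock().lock();
    writeStart.set(System.nanoTime());
  }

  public void writeUnlock(String opName) {
    long held = System.nanoTime() - writeStart.get();
    lock.writeLock().unlock();
    writeHeldNanos.add(held);
    writeHeldByOp.computeIfAbsent(opName, k -> new LongAdder()).add(held);
  }

  public void readLock() {
    lock.readLock().lock();
    // Shared hold time is approximated from first reader in to last reader out.
    if (activeReaders.getAndIncrement() == 0) {
      firstReaderStart.set(System.nanoTime());
    }
  }

  public void readUnlock() {
    if (activeReaders.decrementAndGet() == 0) {
      readHeldNanos.add(System.nanoTime() - firstReaderStart.get());
    }
    lock.readLock().unlock();
  }

  /** Fraction of the given wall-clock window the lock was held exclusively. */
  public double writeHeldFraction(long windowNanos) {
    return (double) writeHeldNanos.sumThenReset() / windowNanos;
  }

  /** Fraction of the given wall-clock window the lock was held shared. */
  public double readHeldFraction(long windowNanos) {
    return (double) readHeldNanos.sumThenReset() / windowNanos;
  }

  // A periodic reporter could sort writeHeldByOp by total nanos to log the
  // top N operations holding the lock in write mode each interval.
}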

Perhaps sampling the time spent waiting to acquire the lock could be useful
too.
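
If the lock were wrapped as above, sampling the acquisition wait would only
be a few more lines, something along these lines (the 1% rate and the
writeLockWaitNanos counter are placeholders):

// Hypothetical addition to the wrapper sketched above. Samples roughly 1%
// of write-lock acquisitions and records how long the caller was blocked,
// using java.util.concurrent.ThreadLocalRandom and another LongAdder field
// (writeLockWaitNanos) flushed by the periodic reporter.
public void writeLockSampled() {
  boolean sampled = ThreadLocalRandom.current().nextInt(100) == 0;
  long waitStart = sampled ? System.nanoTime() : 0L;
  lock.writeLock().lock();
  if (sampled) {
    writeLockWaitNanos.add(System.nanoTime() - waitStart);
  }
  writeStart.set(System.nanoTime());
}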

I also think it would be useful to expose response times from the client
perspective.
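
Even a trivial probe that times individual FileSystem calls from the client
side would give an end-to-end number, including client overhead and any
failover retries, which is often closer to what users actually see. A
minimal sketch, where the namenode URI and the probe path are placeholders:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch of measuring NameNode response times as the client sees them, by
 * timing individual FileSystem calls. The URI and path are placeholders.
 */
public class ClientLatencyProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://nn-host:8020"), conf)) {
      Path probePath = new Path("/tmp");  // any path the probe user can read
      for (int i = 0; i < 10; i++) {
        long start = System.nanoTime();
        FileStatus status = fs.getFileStatus(probePath);  // a cheap, read-only RPC
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("getFileStatus " + status.getPath() + " took " + elapsedMs + " ms");
      }
    }
  }
}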

https://issues.apache.org/jira/browse/HDFS-14084 seemed interesting and
could be worth finishing.

I also came across https://issues.apache.org/jira/browse/HDFS-12861 some
time back, which aims to get the client to log data read speeds.

Have you made any attempts in this area so far, and have you had any
success?

Thanks,

Stephen.

On Thu, Mar 18, 2021 at 5:41 AM Fengnan Li <loyal...@gmail.com> wrote:

> Hi community,
>
>
>
> Has someone ever tried to implement sampling logging for the IPC Server? We
> would like to gain more observability for all of the traffic to the
> Namenode.
>
>
>
> Thanks,
> Fengnan
>
>
