ctubbsii commented on issue #4973:
URL: https://github.com/apache/accumulo/issues/4973#issuecomment-2418500452

   I would suggest getting rid of the log aggregation on the monitor as well. 
It is quite a pain to do properly without killing the tservers, the network, 
or the monitor. Logs don't survive a restart and are dropped if there are too 
many. Any solution we come up with is going to be much more complex and less 
useful than a simple rsyslog setup or a small scheduled rsync script. For 
large installations, it's worth setting up a dedicated log aggregation and 
analysis system. For small installations, you can just ssh to a tserver and 
cat/grep/less the logs (that's what we do in development). Recently, we found 
a problem with too many TCP connections in our attempted fix for making 
logging to the monitor async (#4879). The proposed solutions aren't great.
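
   To illustrate how small the "scheduled rsync script" alternative can be, 
here is a minimal sketch in Python. The host names, the remote log directory, 
and the local destination are all hypothetical placeholders, not Accumulo 
defaults, and the script assumes passwordless ssh from the collecting host. 
It only builds and optionally runs standard rsync commands.

```python
import shlex
import subprocess
from typing import List

# Hypothetical placeholders; adjust for the actual deployment.
HOSTS = ["tserver1.example.com", "tserver2.example.com"]
REMOTE_LOG_DIR = "/var/log/accumulo/"
LOCAL_ROOT = "/srv/accumulo-logs"  # assumed to exist already


def build_rsync_command(host: str,
                        remote_dir: str = REMOTE_LOG_DIR,
                        local_root: str = LOCAL_ROOT) -> List[str]:
    """Build an rsync command that pulls one host's logs into a per-host directory."""
    return [
        "rsync",
        "-az",           # archive mode, compress in transit
        "--timeout=30",  # give up on stalled transfers
        f"{host}:{remote_dir}",
        f"{local_root}/{host}/",
    ]


def pull_logs(hosts: List[str], dry_run: bool = True) -> List[str]:
    """Pull logs host by host; with dry_run=True, only return the command lines."""
    rendered = []
    for host in hosts:
        cmd = build_rsync_command(host)
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=False so one unreachable host does not stop the others
            subprocess.run(cmd, check=False)
    return rendered


if __name__ == "__main__":
    for line in pull_logs(HOSTS):
        print(line)
```

   Run from cron or a systemd timer with `dry_run=False`, this pulls logs 
one host at a time on a schedule, so the servers never push log events to a 
central process.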
   
   Things I'd want to keep are things that give you a big picture view of a 
deployed cluster:
   
   1. List of namespaces, list of tables in each namespace
   2. A page or view for each table (list of tablets, whether or not they are 
hosted)
   3. List of servers, organized in resource groups and by server type
   4. Overall health status for servers (some kind of obvious visual signal to 
indicate "healthy", "needs attention", "out of service", or similar)
   5. Basic activity report for each server, depending on the server type 
(for example: the gc should report when the last garbage collection ran and 
when the next is expected to run; the manager should report on its core 
responsibilities, like running fate operations and balancing; compactors 
should report whether they are compacting; tservers and sservers should 
report which tablets they are hosting/scanning, client scan sessions, etc.)
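
   As a rough sketch of items 3 and 4 above, the server list could be 
grouped by resource group and server type, with one rolled-up health signal 
per group. Every name here (`Health`, `ServerStatus`, `group_servers`, 
`worst_health`) is hypothetical and chosen only for illustration; this is 
not the monitor's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class Health(IntEnum):
    """Hypothetical health levels, ordered so that a higher value is worse."""
    HEALTHY = 0
    NEEDS_ATTENTION = 1
    OUT_OF_SERVICE = 2


@dataclass(frozen=True)
class ServerStatus:
    address: str
    server_type: str     # e.g. "tserver", "sserver", "compactor", "gc", "manager"
    resource_group: str
    health: Health


def group_servers(
    servers: Iterable[ServerStatus],
) -> Dict[Tuple[str, str], List[ServerStatus]]:
    """Organize servers by (resource group, server type)."""
    grouped: Dict[Tuple[str, str], List[ServerStatus]] = defaultdict(list)
    for server in servers:
        grouped[(server.resource_group, server.server_type)].append(server)
    return dict(grouped)


def worst_health(servers: Iterable[ServerStatus]) -> Health:
    """Roll up a group to its worst member; an empty group counts as healthy."""
    return max((s.health for s in servers), default=Health.HEALTHY)


if __name__ == "__main__":
    cluster = [
        ServerStatus("host1:9997", "tserver", "default", Health.HEALTHY),
        ServerStatus("host2:9997", "tserver", "default", Health.NEEDS_ATTENTION),
        ServerStatus("host3:9133", "compactor", "batch", Health.HEALTHY),
    ]
    for (group, kind), members in sorted(group_servers(cluster).items()):
        print(group, kind, worst_health(members).name)
```

   Rolling up to the worst member is just one possible choice; it keeps the 
"obvious visual signal" conservative, at the cost of flagging a whole group 
when a single server needs attention.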
   
   These shouldn't merely duplicate detailed metrics that can be obtained 
directly, but they could use some of those metrics to provide a more 
meaningful view of the health and status of the system or of a particular 
component.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

