Thanks, Stefan, for reviewing this. Please find my comments inline:

> We already provide tons of metrics and provide some useful logging
> (e.g. when reading too many tombstones), but I think we should still
> be able to implement further checks in-code that highlight potentially
> issues. Maybe we could really use a framework for that, I don't know.

I agree. Cassandra already exposes details through metrics, logging
(e.g. tombstone warnings), etc. The current log messages (tombstone,
large partition, slow query, etc.) are very useful, but they all try to
solve the same problem while being implemented independently, at
different times. As a result, there is duplicate code, and important
things are missing: changing thresholds without a restart, a common
format across log messages, different interfaces so that users can
consume the messages in different ways, etc. This new effort simply
makes them common, so that we have one way of doing these things in
Cassandra, with more features such as changing thresholds at runtime, a
common log message format, and different ways for users to consume the
messages.

> If you followed the discussions a while ago, we also talked about
> moving some of the code out of Cassandra into side-car processes.
> Although this will likely not manifest for 4.0, most of the devs seem
> to be fond of the idea and so am I.

I agree that the side-car is a very useful project, but in my opinion it
will be difficult to get internal details out in real time without
modifying Cassandra.

> Not wanting to derail this discussion (about your proposed solution),
> but let me just briefly mention that I've been working on some related
> approach (diagnostic events, CASSANDRA-12944), which would allow to
> expose internal events to external processes that would be able to
> analyze these events, alert users, or event act on them.
> It's a different approach from what you're suggesting, but just
> wanted to mention this and maybe you'd agree that having external
> processes for monitoring Cassandra has some advantages.

Thanks for sharing this. It is a really useful feature and will make the
operational side even easier. My proposal is just picking the
low-hanging fruit: it rearchitects the existing log messages (tombstone,
large partition, slow query, etc.) and adds a few more in a generic way,
with more features (changing thresholds at runtime, a common log message
format, different ways for users to consume the messages, etc.). The
idea is to make it a framework for reporting these types of messages, so
that all of them (existing and new) are consistent with one another.

On Wed, Jun 20, 2018 at 1:35 AM Stefan Podkowinski <s...@apache.org> wrote:

> Jaydeep, thanks for taking this discussion to the dev list. I think it's
> the best place to introduce new idea, discuss them in general and how
> they potentially fit in. As already mention in the ticket, I do share
> your assessment that we should try to improve making operational issue
> more visible to users. We already provide tons of metrics and provide
> some useful logging (e.g. when reading too many tombstones), but I think
> we should still be able to implement further checks in-code that
> highlight potentially issues. Maybe we could really use a framework for
> that, I don't know.
>
> If you followed the discussions a while ago, we also talked about moving
> some of the code out of Cassandra into side-car processes. Although this
> will likely not manifest for 4.0, most of the devs seem to be fond of
> the idea and so am I.
> Not wanting to derail this discussion (about your
> proposed solution), but let me just briefly mention that I've been
> working on some related approach (diagnostic events, CASSANDRA-12944),
> which would allow to expose internal events to external processes that
> would be able to analyze these events, alert users, or event act on
> them. It's a different approach from what you're suggesting, but just
> wanted to mention this and maybe you'd agree that having external
> processes for monitoring Cassandra has some advantages.
>
> On 20.06.2018 06:33, Jaydeep Chovatia wrote:
> > Hi,
> >
> > We have worked on developing some common framework to detect/log
> > anti-patterns/bad queries in Cassandra. Target for this effort would be
> > to reduce burden on ops to handle Cassandra at large scale, as well as
> > help beginners to quickly identify performance problems with the
> > Cassandra.
> > Initially we wanted to try out to make sure it really works and provides
> > value. we've opened JIRA with all the details. Would you please review
> > and provide your feedback on this effort?
> > https://issues.apache.org/jira/browse/CASSANDRA-14527
> >
> > Thank You!!!
> >
> > Jaydeep