Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

Maxim Muzafarov Mon, 12 Feb 2024 04:29:59 -0800

Hello everyone,


We still need a few eyes to help push the changes forward.
https://issues.apache.org/jira/browse/CASSANDRA-14572

Here's the post I prepared as a result of working on this issue (it
might help to review it):
https://dzone.com/articles/making-dropwizard-metrics-accessible-via-cql-in-ap

On Fri, 22 Dec 2023 at 12:38, Maxim Muzafarov <mmu...@apache.org> wrote:
>
> Hello everyone and happy holidays,
>
> The changes below are ready for review!
> Benchmarks are also inside.
>
> Expose all table metrics in virtual tables
> https://issues.apache.org/jira/browse/CASSANDRA-14572
> https://github.com/apache/cassandra/pull/2958/files
>
> On Tue, 12 Dec 2023 at 22:05, Maxim Muzafarov <mmu...@apache.org> wrote:
> >
> > Hello everyone,
> >
> >
> > I still think Cassandra will benefit from having this idea implemented
> > and used through the source code, so I've done another round of
> > rethinking this concept and it seems I've found a solution. As a
> > result, we can significantly reduce the cost of implementing and
> > maintaining both new and existing virtual tables and make our users
> > happier by seeing everything they need through virtual tables.
> >
> > So, I think we should limit the scope of the original proposal to the 
> > following:
> > ## A framework for exposing any internal data collection to virtual
> > tables ONLY. ##
> >
> > As a proof of concept, I took the CASSANDRA-14572 "Expose all table
> > metrics in virtual table" JIRA ticket, which provides a good
> > opportunity to demonstrate how to export all metrics to VTs at once
> > without having boilerplate implementations. Currently, we already have
> > CQLMetricsTable, BatchMetricsTable, etc. that expose metrics to VTs in
> > a pretty similar way, while most of the metric groups located under
> > the org.apache.cassandra.metrics package still lack a VT
> > representation. I've used the MetricRegistry collection
> > as a view of registered metrics to export them to VTs using the
> > prototype.
> >
> > The prototype is complete. You can run a node locally and check the
> > available virtual tables with cqlsh, or you can check the changes
> > using the following link to the PR:
> > https://github.com/apache/cassandra/pull/2958/files
> >
> >
> > Below are some key points about the design itself:
> >
> > 1. All new virtual tables with metrics have "metric" as a prefix so
> > that they are fairly easy to find using TAB on the cqlsh command line.
> > Metrics are split into virtual tables as they are listed in the
> > org.apache.cassandra.metrics e.g. metrics_cql, metrics_tcm etc. In
> > addition, they are also grouped by metric type e.g.
> > metric_type_histogram, metric_type_counter etc. There is a table
> > called "metric_all_metric_groups" with all available metric groups.
> >
> > 2. To create a new virtual table representation of an internal
> > collection a developer needs to do two things: create a virtual table
> > row representation, and register it using
> > CollectionVirtualTableAdapter, which acts as an adapter between
> > internal data and a virtual table. Here's how I did it for the thread
> > pools VT, this is a fully backward compatible change:
> > https://github.com/apache/cassandra/pull/2958/files#diff-5fda13a633723cdf232bba465e6fb7ab74cdc02f7ba55dae4d1cf494ffb71abaR61
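
A minimal sketch of the two steps in point 2, for readers who have not
opened the PR. All names here (RowDefinition, CollectionTableAdapter,
PoolStats, the column names) are hypothetical and are NOT the API of
CollectionVirtualTableAdapter; the sketch only illustrates the idea of
"describe a row once, then bind it to a live collection".

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical names throughout; this is not the API from the PR.
final class AdapterSketch {

    // Stand-in for an internal object, e.g. the state of one thread pool.
    static final class PoolStats {
        final String name;
        final int activeTasks;
        final int pendingTasks;

        PoolStats(String name, int activeTasks, int pendingTasks) {
            this.name = name;
            this.activeTasks = activeTasks;
            this.pendingTasks = pendingTasks;
        }
    }

    // Step 1: the row representation, mapping column names to getters.
    static final class RowDefinition<T> {
        private final Map<String, Function<T, Object>> columns = new LinkedHashMap<>();

        RowDefinition<T> column(String name, Function<T, Object> getter) {
            columns.put(name, getter);
            return this;
        }

        Map<String, Object> toRow(T source) {
            Map<String, Object> row = new LinkedHashMap<>();
            columns.forEach((name, getter) -> row.put(name, getter.apply(source)));
            return row;
        }
    }

    // Step 2: the adapter ties a live collection to its row representation.
    static final class CollectionTableAdapter<T> {
        private final Collection<T> source;
        private final RowDefinition<T> definition;

        CollectionTableAdapter(Collection<T> source, RowDefinition<T> definition) {
            this.source = source;
            this.definition = definition;
        }

        // Rows are produced lazily, one per element of the collection.
        Stream<Map<String, Object>> rows() {
            return source.stream().map(definition::toRow);
        }
    }

    static List<Map<String, Object>> demoRows() {
        List<PoolStats> pools = Arrays.asList(
            new PoolStats("ReadStage", 3, 0),
            new PoolStats("MutationStage", 1, 7));
        RowDefinition<PoolStats> definition = new RowDefinition<PoolStats>()
            .column("name", p -> p.name)
            .column("active_tasks", p -> p.activeTasks)
            .column("pending_tasks", p -> p.pendingTasks);
        return new CollectionTableAdapter<>(pools, definition).rows().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        demoRows().forEach(System.out::println);
    }
}
```

In this sketch, adding another table means writing one more row
definition; the adapter itself stays the same.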
> >
> > 3. The "metrics_keyspace" virtual table ended up being quite large
> > since it contains all the metrics for all available keyspaces on a
> > local node, so the default implementation provided by
> > AbstractVirtualTable is not suitable for the proposal. The
> > AbstractVirtualTable materializes a full data collection on the heap
> > using SimpleDataSet, regardless of the portion of data that is being
> > queried. In this case, we have to use an iterative approach, as the
> > CollectionVirtualTableAdapter does (the problem was discussed in
> > CASSANDRA-14629 and is now a part of the solution). This also helps to
> > keep the memory footprint low.
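
To make the difference in point 3 concrete, here is a small sketch of
the iterative approach. It is an illustration only: lazyRows and the
"ksN.metricM" naming are made up for this example and do not come from
AbstractVirtualTable, SimpleDataSet or the PR. A row is built only when
the consumer asks for it, and only for elements that match the query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; not the implementation from the PR.
final class LazyRowsSketch {

    // Builds a row on demand and skips elements that fail the filter.
    static <T, R> Iterator<R> lazyRows(Iterator<T> source, Predicate<T> filter, Function<T, R> toRow) {
        return new Iterator<R>() {
            private T pending;
            private boolean hasPending;

            @Override
            public boolean hasNext() {
                while (!hasPending && source.hasNext()) {
                    T candidate = source.next();
                    if (filter.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override
            public R next() {
                if (!hasNext()) throw new NoSuchElementException();
                T element = pending;
                pending = null;
                hasPending = false;
                return toRow.apply(element);
            }
        };
    }

    // Counts how many row objects get built when only one keyspace is queried.
    static int rowsBuiltFor(String keyspace) {
        List<String> metricNames = new ArrayList<>();
        for (int ks = 0; ks < 10; ks++)
            for (int m = 0; m < 100; m++)
                metricNames.add("ks" + ks + ".metric" + m);

        AtomicInteger built = new AtomicInteger();
        Iterator<String[]> rows = lazyRows(
            metricNames.iterator(),
            name -> name.startsWith(keyspace + "."),
            name -> { built.incrementAndGet(); return name.split("\\."); });
        while (rows.hasNext()) rows.next(); // consume one row at a time
        return built.get();
    }

    public static void main(String[] args) {
        System.out.println(rowsBuiltFor("ks7")); // prints 100
    }
}
```

Of the 1000 source entries, only the 100 that belong to ks7 are turned
into rows, and no more than one row is held by the iterator at a time.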
> >
> > 4. Another valuable change is the CassandraMetricsRegistry itself. The
> > problem here is that the metrics and their aliases are currently
> > exported to JMX, but the implemented virtual tables export the metrics
> > in their own way and in most cases don't respect the metric aliases
> > that are registered in the MetricsRegistry. This should be fixed as
> > part of CASSANDRA-14572 to avoid ambiguity for all known metrics
> > once and for all.
> >
> > Here are the links to the issue and the PR:
> > https://issues.apache.org/jira/browse/CASSANDRA-14572
> > https://github.com/apache/cassandra/pull/2958/files
> >
> >
> > I'm excited about how these changes look right now, so please share
> > your feedback and thoughts.
> > The PR lacks good test coverage; I'll fix that as soon as we have a
> > clear vision of the design (or much sooner) :-)
> >
> > On Mon, 30 Jan 2023 at 17:43, David Capwell <dcapw...@apple.com> wrote:
> > >
> > > I *think* this is arguably true for a vtable / CQL-based solution as well 
> > > from the "you don't know how people are using your API" perspective.
> > >
> > >
> > > Very fair point, and I think that justifies a different thread to talk about 
> > > backwards compatibility for our tables (virtual and not); maybe we can 
> > > lump together the JMX backwards compatibility conversation as well in 
> > > that new thread.
> > >
> > > On Jan 28, 2023, at 4:09 PM, Josh McKenzie <jmcken...@apache.org> wrote:
> > >
> > > First off - thanks so much for putting in this effort Maxim! This is 
> > > excellent work.
> > >
> > > Some thoughts on the CEP and responses in thread:
> > >
> > > Considering that JMX is usually not used and disabled in production 
> > > environments for various performance and security reasons, the operator 
> > > may not see the same picture from various of Dropwizard's metrics 
> > > exporters and integrations as Cassandra's JMX metrics provide [1][2].
> > >
> > > I don't think this assertion is true. Cassandra is running in a lot of 
> > > places in the world, and JMX has been in this ecosystem for a long time; 
> > > we need data that is basically impossible to get to claim "JMX is usually 
> > > not used in C* environments in prod".
> > >
> > > > I also wonder whether we should care about JMX?  I know many wish to 
> > > > migrate (it's going to be a very long time) away from JMX, so do we need a 
> > > > wrapper to make JMX and vtables consistent?
> > >
> > > If we can move away from a bespoke vtable or JMX based implementation and 
> > > instead have a templatized solution each of these is generated from, that 
> > > to me is the superior option. There's little harm in adding new JMX 
> > > endpoints (or hell, other metrics framework integration?) as a byproduct 
> > > of adding new vtable exposed metrics; we have the same maintenance 
> > > obligation to them as we have to the vtables and if it generates from the 
> > > same base data, we shouldn't have any further maintenance burden due to 
> > > its presence right?
> > >
> > > we wish to move away from JMX
> > >
> > > I do, and you do, and many people do, but I don't believe all people on 
> > > the project do. The last time this came up in slack the conclusion was 
> > > "Josh should go draft a CEP to chart out a path to moving off JMX while 
> > > maintaining backwards-compat w/existing JMX metrics for environments that 
> > > are using them" (so I'm excited to see this CEP pop up before I got to 
> > > it! ;)). Moving to a system that gives us a 0-cost way to keep JMX and 
> > > vtable in sync over time on new metrics seems like a nice compromise for 
> > > folks that have built out JMX-based maintenance infra on top of C*. Plus 
> > > removing the boilerplate toil on vtables. win-win.
> > >
> > > If we add a column to the end of the JMX row did we just break users?
> > >
> > > I *think* this is arguably true for a vtable / CQL-based solution as well 
> > > from the "you don't know how people are using your API" perspective. 
> > > Unless we have clear guidelines about discretely selecting the columns 
> > > you want from a vtable and trust users to follow them, if people have 
> > > brittle greedy parsers pulling in all data from vtables we could very 
> > > well break them as well by adding a new column right? Could be wrong 
> > > here; I haven't written anything that consumes vtable metric data and 
> > > maybe the obvious idiom in the face of that is robust in the presence of 
> > > column addition. /shrug
> > >
> > > It's certainly more flexible and simpler to write to w/out detonating 
> > > compared to JMX, but it's still an API we'd be revving.
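
The "brittle greedy parser" concern can be sketched without a driver.
This is an illustration under assumptions: the Row class and the column
names are made up, and where a newly added column would actually appear
in a result is not something the sketch claims to model. It only
contrasts reading a value by position with reading it by name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Row is a stand-in for a driver result row.
final class ColumnAccessSketch {

    // A result row modelled as parallel lists of column names and values.
    static final class Row {
        final List<String> columns;
        final List<Object> values;

        Row(List<String> columns, List<Object> values) {
            this.columns = columns;
            this.values = values;
        }

        Object get(String column) {
            int index = columns.indexOf(column);
            if (index < 0) throw new IllegalArgumentException("No such column: " + column);
            return values.get(index);
        }
    }

    // Brittle consumer: assumes pending_tasks is always the third column.
    static Object pendingByPosition(Row row) {
        return row.values.get(2);
    }

    // Robust consumer: looks the column up by name.
    static Object pendingByName(Row row) {
        return row.get("pending_tasks");
    }

    static Row before() {
        return new Row(Arrays.asList("name", "active_tasks", "pending_tasks"),
                       Arrays.<Object>asList("ReadStage", 3, 7));
    }

    // The same row after a hypothetical new column is added to the table.
    static Row after() {
        return new Row(Arrays.asList("name", "active_tasks", "blocked_tasks", "pending_tasks"),
                       Arrays.<Object>asList("ReadStage", 3, 0, 7));
    }

    public static void main(String[] args) {
        System.out.println(pendingByPosition(before()) + " -> " + pendingByPosition(after()));
        System.out.println(pendingByName(before()) + " -> " + pendingByName(after()));
    }
}
```

After the column is added, the positional consumer silently switches
from 7 to 0, while the by-name consumer keeps returning 7. That is the
case for asking consumers to select and read columns explicitly.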
> > >
> > > On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
> > >
> > > Overall I have similar thoughts and questions as David.
> > >
> > > I just wanted to add a reminder about this thread from last summer[1]. We 
> > > already have issues with the alignment of JMX and Settings Virtual Table. 
> > > I guess this is how Maxim got inspired to suggest this framework proposal 
> > > which I want to thank him for! (I noticed he assigned CASSANDRA-15254)
> > >
> > > > Not to open Pandora's box, but to me the most important thing here is 
> > > to come into agreement about the future of JMX and what we will do or not 
> > > as a community. Also, how much time people are able to invest. I guess 
> > > this will influence any directions to be taken here.
> > >
> > > [1]
> > > https://lists.apache.org/thread/8mjcwdyqoobpvw2262bqmskkhs76pp69
> > >
> > >
> > > On Thu, 26 Jan 2023 at 12:41, David Capwell <dcapw...@apple.com> wrote:
> > >
> > > I took a look and I see the result is an interface that looks like the 
> > > vtable interface, that is then used by vtables and JMX?  My first thought 
> > > is why not just use the vtable logic?
> > >
> > > > I also wonder whether we should care about JMX?  I know many wish to 
> > > > migrate (it's going to be a very long time) away from JMX, so do we need a 
> > > > wrapper to make JMX and vtables consistent?  I am cool with something 
> > > like the following
> > >
> > > > registerWithJMX(jmxName, query("SELECT * FROM system_views.streaming"));
> > >
> > >
> > > So if we want to have a JMX view that matches the table then that’s cool 
> > > by me, but one thing that has been brought up in reviews is backwards 
> > > compatibility with regard to adding columns… If we add a column to the 
> > > end of the JMX row did we just break users?
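
One way the registerWithJMX(...) idea could look with the standard
javax.management API, as a rough sketch only. registerWithJmx, the
RowsViewMBean interface and the object name are hypothetical, and the
Supplier stands in for running the CQL query; nothing here is existing
Cassandra code.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

// Hypothetical sketch; not existing Cassandra code.
final class JmxViewSketch {

    // Management interface exposing one read-only attribute named "Rows".
    public interface RowsViewMBean {
        List<String> getRows();
    }

    // 'query' stands in for executing the CQL query behind the view.
    static ObjectName registerWithJmx(String jmxName, Supplier<List<String>> query) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(jmxName);
        RowsViewMBean view = query::get; // re-evaluated on every attribute read
        server.registerMBean(new StandardMBean(view, RowsViewMBean.class), name);
        return name;
    }

    // Registers the view, reads it back through JMX once, then cleans up.
    static Object readRowsOnce() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = registerWithJmx(
                "org.example.sketch:type=StreamingView",
                () -> Arrays.asList("row-1", "row-2"));
            try {
                return server.getAttribute(name, "Rows");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readRowsOnce());
    }
}
```

Because the JMX attribute is derived from the same source as the table,
the two views cannot drift apart, but the column-addition question
raised in this message applies to the JMX side just as much.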
> > >
> > > Considering that JMX is usually not used and disabled in production 
> > > environments for various performance and security reasons, the operator 
> > > may not see the same picture from various of Dropwizard's metrics 
> > > exporters
> > >
> > > If this is a real problem people are hitting, we can always add the 
> > > ability to push metrics to common systems with a pluggable way to add 
> > > > non-standard solutions.  Dropwizard already supports this, so it would be 
> > > > low-hanging fruit to address.
> > >
> > > To make the proposed changes backwards compatible with the previous 
> > > version of Cassandra, all MBeans and Virtual Tables we already have will 
> > > remain unchanged
> > >
> > >
> > > If this is for new JMX endpoints moving forward, I am not sure of the 
> > > benefit for the same reason listed above; we wish to move away from JMX
> > >
> > > On Jan 25, 2023, at 10:51 AM, Maxim Muzafarov <mmu...@apache.org> wrote:
> > >
> > > Hello Cassandra Community,
> > >
> > >
> > > I've been faced with a number of inconsistencies in the user APIs of
> > > the internal data collections representation exposed through the
> > > Cassandra monitoring interfaces that need to be fully aligned from an
> > > operator perspective. First of all, I'm highlighting JMX, Dropwizard
> > > Metrics, and Virtual Tables user interfaces. In order to address all
> > > these inconsistencies, I have created a draft enhancement proposal
> > > that describes everything I have found and how we can fix it once and
> > > for all.
> > >
> > > I'd like to hear your opinion and thoughts on it. Please take a look:
> > > https://docs.google.com/document/d/1j4J3bPWjQkAU9x4G-zxKObxPrKg36jLRT6xpUoNJa8Q
> > >
> > >
> > > --
> > > Maxim Muzafarov
> > >
> > >
