metadata.csv: that helps a lot, thank you!

On Fri, Oct 5, 2018 at 5:42 AM Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> I feel you for most of the troubles you faced, I've been facing most of
> them too. Again, Datadog support can probably help you with most of those.
> You should really consider sharing this feedback with them.
>
>> there is re-namespacing of the metric names in lots of cases, and these
>> don't appear to be centrally documented, but maybe i haven't found the
>> magic page.
>
> I don't know if that would be the 'magic' page, but that's something:
> https://github.com/DataDog/integrations-core/blob/master/cassandra/metadata.csv
>
>> There are sooooo many good stats.
>
> Yes, and it's still improving. I love this about Cassandra. It's our job
> to pick the relevant ones for each situation. I would not like Cassandra to
> reduce the number of metrics exposed, we need to learn to handle them
> properly. Also, this is the reason we designed 4 dashboards out of the box,
> the goal was to have everything we need for distinct scenarios:
> - Overview - global health-check / anomaly detection
> - Read Path - troubleshooting / optimizing read ops
> - Write Path - troubleshooting / optimizing write ops
> - SSTable Management - troubleshooting / optimizing
>   compaction/flushes/... anything related to sstables.
>
> instead of the single overview dashboard that was present before. We are
> also perfectly aware that it's far from perfect, but aiming at perfect
> would only have had us never releasing anything. Anyone interested can
> now build missing dashboards or improve existing ones for themselves and/or
> suggest improvements to Datadog :). I hope I'll do some more of this work
> at some point in the future.
>
> Good luck,
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Thu, Oct 4, 2018 at 21:21, Carl Mueller
> <carl.muel...@smartthings.com.invalid> wrote:
>
>> for 2.1.x we had a custom reporter that delivered metrics to datadog's
>> endpoint via https, bypassing the agent-imposed 350-metric limit. But
>> integrating that required targeting the other shared libs in the
>> cassandra path, so the build is a bit of a pain when we update major
>> versions.
>>
>> We are migrating our 2.1.x specific dashboards, and we will use
>> agent-delivered metrics for non-table, and adapt the custom library to
>> deliver the table-based ones, at a slower rate than the "core" ones.
>>
>> Datadog is also super annoying because there doesn't appear to be
>> anything that reports what metrics the agent is sending (the metric count
>> can indicate if a configured new metric increased the count and is being
>> reported, but it's still... a guess), and there is re-namespacing of the
>> metric names in lots of cases, and these don't appear to be centrally
>> documented, but maybe i haven't found the magic page.
>>
>> There are sooooo many good stats. We might also implement some facility
>> to dynamically turn on the delivery of detailed metrics on the nodes.
>>
>> On Tue, Oct 2, 2018 at 5:21 AM Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> Hello Carl,
>>>
>>>> I guess we can use bean_regex to do specific targeted metrics for the
>>>> important tables anyway.
>>>
>>> Yes, this would work, but 350 is very limited for Cassandra dashboards.
>>> We have a LOT of metrics available.
>>>
>>>> Datadog 350 metric limit is a PITA for tables once you get over 10 tables
>>>
>>> I noticed this while I was working on providing default dashboards for
>>> Cassandra-Datadog integration. I was told by Datadog team it would not be
>>> an issue for users, that I should not care about it. As you pointed out,
>>> per table metrics quickly increase the total number of metrics we need to
>>> collect.
>>>
>>> I believe you can set the following option: *"max_returned_metrics:
>>> 1000"* - it can be used if metrics are missing to increase the limit of
>>> the number of collected metrics. Be aware of CPU utilization that this
>>> might imply (greatly improved in dd-agent version 6+ I believe -thanks
>>> Datadog teams for that- making this fully usable for Cassandra). This
>>> option should go in the *cassandra.yaml* file for Cassandra
>>> integrations, off the top of my head.
>>>
>>> Also, do not hesitate to reach out to Datadog directly for these kinds of
>>> questions, I have always been very happy with their support so far, I am
>>> sure they would guide you through this as well, probably better than we can
>>> do :). It also provides them with feedback on what people are struggling
>>> with I imagine.
>>>
>>> I am interested to know if you still have issues getting more metrics
>>> (option above not working / CPU under too much load) as this would make the
>>> dashboards we built mostly unusable for clusters with more tables. We might
>>> then need to review the design.
>>>
>>> As a side note, I believe metrics are handled the same way across
>>> versions, they got the same name/label for C*2.1, 2.2 and 3+ on Datadog.
>>> There is an abstraction layer that removes this complexity (if I remember
>>> well, we built those dashboards a while ago).
>>>
>>> C*heers
>>> -----------------------
>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Mon, Oct 1, 2018 at 19:38, Carl Mueller
>>> <carl.muel...@smartthings.com.invalid> wrote:
>>>
>>>> That's great too, thank you.
>>>>
>>>> Datadog 350 metric limit is a PITA for tables once you get over 10
>>>> tables, but I guess we can use bean_regex to do specific targeted metrics
>>>> for the important tables anyway.
>>>>
>>>> On Mon, Oct 1, 2018 at 4:21 AM Alain RODRIGUEZ <arodr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Carl,
>>>>>
>>>>> Here is a message I sent to my team a few months ago. I hope this will
>>>>> be helpful to you and more people around :). It might not be exhaustive
>>>>> and we were moving from C*2.1 to C*3+ in this case, thus skipping C*2.2,
>>>>> but C*2.2 is similar to C*3.0 if I remember correctly in terms of
>>>>> metrics. Here it is for what it's worth:
>>>>>
>>>>> Quite a few things changed between the metric reporter in C*2.1 and C*3.0.
>>>>> - ColumnFamily --> Table
>>>>> - XXpercentile --> pXX
>>>>> - 1MinuteRate --> m1_rate
>>>>> - the metric name now comes before the KS and Table names, plus some
>>>>>   other changes of this kind.
>>>>> - ^ aggregations / aliases indexes changed because of this (using
>>>>>   graphite for example) ^
>>>>> - ‘.value’ is no longer appended to the metric name for gauges; nothing
>>>>>   replaces it.
>>>>>
>>>>> For example (graphite):
>>>>>
>>>>> From
>>>>> aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.ColumnFamily.$ks.$table.ReadLatency.95percentile, 2, 3), 1, 7, 8, 9)
>>>>>
>>>>> to
>>>>> aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.Table.ReadLatency.$ks.$table.p95, 2, 3), 1, 8, 9, 10)
>>>>>
>>>>> C*heers,
>>>>> -----------------------
>>>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>>>> France / Spain
>>>>>
>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On Fri, Sep 28, 2018 at 20:38, Carl Mueller
>>>>> <carl.muel...@smartthings.com.invalid> wrote:
>>>>>
>>>>>> VERY NICE! Thank you very much
>>>>>>
>>>>>> On Fri, Sep 28, 2018 at 1:32 PM Lyuben Todorov <
>>>>>> lyuben.todo...@instaclustr.com> wrote:
>>>>>>
>>>>>>> Nothing as fancy as a matrix but a list of what JMX term can see.
>>>>>>> Link to the online diff here: https://www.diffchecker.com/G9FE9swS
>>>>>>>
>>>>>>> /lyubent
>>>>>>>
>>>>>>> On Fri, 28 Sep 2018 at 19:04, Carl Mueller
>>>>>>> <carl.muel...@smartthings.com.invalid> wrote:
>>>>>>>
>>>>>>>> It's my understanding that metrics got heavily re-namespaced in JMX
>>>>>>>> for 2.2 from 2.1
>>>>>>>>
>>>>>>>> Did anyone ever make a migration matrix/guide for conversion of old
>>>>>>>> metrics to new metrics?
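
For anyone migrating Graphite dashboards, the renaming rules listed in the thread (ColumnFamily to Table, XXpercentile to pXX, 1MinuteRate to m1_rate, metric name moved ahead of keyspace and table, no trailing '.value' on gauges) can be sketched as a small path translator. This is only an illustration under stated assumptions, not an official mapping: the function names `convert_table_metric` and `convert_suffix` are made up here, the path layout is taken from the Graphite example in the thread, and extending the rate rule to other N-minute rates is a guess that should be checked against real reporter output.

```python
import re

# Common namespace shared by the 2.1 and 3.0 paths in the thread's example.
METRICS_PREFIX = "org.apache.cassandra.metrics."


def convert_suffix(part: str) -> str:
    """Rename one trailing path component (percentile or rate) to the 3.0 style."""
    match = re.fullmatch(r"(\d+)percentile", part)
    if match:
        return "p" + match.group(1)        # 95percentile -> p95
    # Assumption: other N-minute rates follow the same pattern as 1MinuteRate.
    match = re.fullmatch(r"(\d+)MinuteRate", part)
    if match:
        return f"m{match.group(1)}_rate"   # 1MinuteRate -> m1_rate
    return part


def convert_table_metric(path: str) -> str:
    """Translate a C* 2.1 per-table Graphite path to the C* 3.0 layout.

    Paths without the 2.1 'ColumnFamily' segment are returned unchanged.
    """
    head, marker, tail = path.partition(METRICS_PREFIX + "ColumnFamily.")
    if not marker:
        return path
    parts = tail.split(".")
    if len(parts) < 3:
        return path  # not enough components for keyspace, table and metric
    keyspace, table, metric, *suffix = parts
    # Gauges no longer carry a trailing '.value' component.
    suffix = [convert_suffix(p) for p in suffix if p != "value"]
    return head + METRICS_PREFIX + "Table." + ".".join([metric, keyspace, table, *suffix])


if __name__ == "__main__":
    old = ("cassandra.prod.dc1.node1.org.apache.cassandra.metrics."
           "ColumnFamily.my_ks.my_table.ReadLatency.95percentile")
    print(convert_table_metric(old))
```

Note that moving the metric name ahead of the keyspace and table shifts the node positions by one, which is why the aliasByNode indexes in the thread's example change from 7, 8, 9 to 8, 9, 10. Any dashboard expression that addresses path nodes by index needs the same adjustment.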