Re: Metrics matrix: migrate 2.1.x metrics to 2.2.x+

Carl Mueller Tue, 16 Oct 2018 11:46:36 -0700

metadata.csv: that helps a lot, thank you!

On Fri, Oct 5, 2018 at 5:42 AM Alain RODRIGUEZ <arodr...@gmail.com> wrote:


> I feel you for most of the troubles you faced, I've been facing most of
> them too. Again, Datadog support can probably help you with most of those.
> You should really consider sharing this feedback with them.
>
> there is re-namespacing of the metric names in lots of cases, and these
>> don't appear to be centrally documented, but maybe i haven't found the
>> magic page.
>>
>
> I don't know if that would be the 'magic' page, but that's something:
> https://github.com/DataDog/integrations-core/blob/master/cassandra/metadata.csv
>
> There are sooooo many good stats.
>
>
> Yes, and it's still improving. I love this about Cassandra. It's our work
> to pick the relevant ones for each situation. I would not like Cassandra to
> reduce the number of metrics exposed, we need to learn to handle them
> properly. Also, this is the reason we designed 4 dashboards out of the box,
> the goal was to have everything we need for distinct scenarios:
> - Overview - global health-check / anomaly detection
> - Read Path - troubleshooting / optimizing read ops
> - Write Path - troubleshooting / optimizing write ops
> - SSTable Management - troubleshooting / optimizing -
> compaction/flushes/... anything related to sstables.
>
> instead of the single overview dashboard that was present before. We are
> also perfectly aware that it's far from perfect, but aiming at perfect
> would only have had us never releasing anything. Anyone interested could
> now build missing dashboards or improve existing ones for themselves and/or
> suggest improvements to Datadog :). I hope I'll do some more of this work
> at some point in the future.
>
> Good luck,
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> Le jeu. 4 oct. 2018 à 21:21, Carl Mueller
> <carl.muel...@smartthings.com.invalid> a écrit :
>
>> for 2.1.x we had a custom reporter that delivered metrics to datadog's
>> endpoint via https, bypassing the agent-imposed 350-metric limit. But
>> integrating that required targeting the other shared libs in the cassandra
>> path, so the build is a bit of a pain when we update major versions.
>>
>> We are migrating our 2.1.x specific dashboards, and we will use
>> agent-delivered metrics for non-table, and adapt the custom library to
>> deliver the table-based ones, at a slower rate than the "core" ones.
>>
>> Datadog is also super annoying because there doesn't appear to be
>> anything that reports what metrics the agent is sending (the metric count
>> can indicate if a configured new metric increased the count and is being
>> reported, but it's still... a guess), and there is re-namespacing of the
>> metric names in lots of cases, and these don't appear to be centrally
>> documented, but maybe i haven't found the magic page.
>>
>> There are sooooo many good stats. We might also implement some facility
>> to dynamically turn on the delivery of detailed metrics on the nodes.
>>
>> On Tue, Oct 2, 2018 at 5:21 AM Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> Hello Carl,
>>>
>>> I guess we can use bean_regex to do specific targeted metrics for the
>>>> important tables anyway.
>>>>
>>>
>>> Yes, this would work, but 350 is very limited for Cassandra dashboards.
>>> We have a LOT of metrics available.
>>>
>>> Datadog 350 metric limit is a PITA for tables once you get over 10 tables
>>>>
>>>
>>> I noticed this while I was working on providing default dashboards for
>>> Cassandra-Datadog integration. I was told by Datadog team it would not be
>>> an issue for users, that I should not care about it. As you pointed out,
>>> per table metrics quickly increase the total number of metrics we need to
>>> collect.
>>>
>>> I believe you can set the following option: *"max_returned_metrics:
>>> 1000"* - it can be used if metrics are missing to increase the limit of
>>> the number of collected metrics. Be aware of CPU utilization that this
>>> might imply (greatly improved in dd-agent version 6+ I believe -thanks
>>> Datadog teams for that- making this fully usable for Cassandra). This
>>> option should go in the *cassandra.yaml* file for Cassandra
>>> integrations, off the top of my head.
>>>
>>> Also, do not hesitate to reach out to Datadog directly with this kind of
>>> question, I have always been very happy with their support so far, I am
>>> sure they would guide you through this as well, probably better than we can
>>> do :). It also provides them with feedback on what people are struggling
>>> with I imagine.
>>>
>>> I am interested to know if you still have issues getting more metrics
>>> (option above not working / CPU under too much load) as this would make the
>>> dashboards we built mostly unusable for clusters with more tables. We might
>>> then need to review the design.
>>>
>>> As a side note, I believe metrics are handled the same way across
>>> versions: they get the same name/label for C*2.1, 2.2 and 3+ on Datadog.
>>> There is an abstraction layer that removes this complexity (if I remember
>>> well, we built those dashboards a while ago).
>>>
>>> C*heers
>>> -----------------------
>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> Le lun. 1 oct. 2018 à 19:38, Carl Mueller
>>> <carl.muel...@smartthings.com.invalid> a écrit :
>>>
>>>> That's great too, thank you.
>>>>
>>>> Datadog 350 metric limit is a PITA for tables once you get over 10
>>>> tables, but I guess we can use bean_regex to do specific targeted metrics
>>>> for the important tables anyway.
>>>>
>>>> On Mon, Oct 1, 2018 at 4:21 AM Alain RODRIGUEZ <arodr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Carl,
>>>>>
>>>>> Here is a message I sent to my team a few months ago. I hope this will
>>>>> be helpful to you and more people around :). It might not be exhaustive,
>>>>> and we were moving from C*2.1 to C*3+ in this case, thus skipping C*2.2,
>>>>> but C*2.2 is similar to C*3.0 in terms of metrics, if I remember
>>>>> correctly. Here it is for what it's worth:
>>>>>
>>>>> Quite a few things changed between the metric reporters in C*2.1 and
>>>>> C*3.0.
>>>>> - ColumnFamily --> Table
>>>>> - XXpercentile --> pXX
>>>>> - 1MinuteRate --> m1_rate
>>>>> - the metric name now comes before the KS and Table names, plus some
>>>>> other changes of this kind.
>>>>> - ^ aggregations / aliases indexes changed because of this (using
>>>>> graphite for example) ^
>>>>> - '.value' is no longer appended to the metric name for gauges; nothing
>>>>> is appended instead.
>>>>>
>>>>> For example (graphite):
>>>>>
>>>>> From
>>>>> aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.ColumnFamily.$ks.$table.ReadLatency.95percentile,
>>>>> 2, 3), 1, 7, 8, 9)
>>>>>
>>>>> to
>>>>> aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.Table.ReadLatency.$ks.$table.p95,
>>>>> 2, 3), 1, 8, 9, 10)
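
For anyone rewriting many Graphite targets by hand, the renaming rules listed above can be sketched as a small Python helper. This is only an illustration of the rules in this message, not an official mapping: the function names are made up, it only handles per-table paths shaped like the example above, and it does not cover the "other changes of this kind" mentioned in the list.

```python
import re

# Path prefixes taken from the Graphite example above.
OLD_PREFIX = "org.apache.cassandra.metrics.ColumnFamily."
NEW_PREFIX = "org.apache.cassandra.metrics.Table."


def rename_stat(stat):
    """Hypothetical helper: translate a trailing statistic name."""
    match = re.fullmatch(r"(\d+)percentile", stat)
    if match:
        return "p" + match.group(1)  # XXpercentile --> pXX
    if stat == "1MinuteRate":
        return "m1_rate"  # 1MinuteRate --> m1_rate
    return stat  # anything else is left untouched


def migrate_metric_path(path):
    """Hypothetical helper: rewrite a C*2.1 per-table path to the C*3.0 layout.

    Paths that do not contain the ColumnFamily prefix are returned unchanged.
    """
    head, sep, tail = path.partition(OLD_PREFIX)
    if not sep:
        return path
    parts = tail.split(".")
    if len(parts) < 3:
        return path
    keyspace, table, metric, *stats = parts
    if stats == ["value"]:
        stats = []  # gauges no longer get a '.value' suffix
    else:
        stats = [rename_stat(s) for s in stats]
    # The metric name now comes before the keyspace and table names.
    return head + NEW_PREFIX + ".".join([metric, keyspace, table] + stats)


old = ("cassandra.$env.$dc.$host.org.apache.cassandra.metrics."
       "ColumnFamily.$ks.$table.ReadLatency.95percentile")
print(migrate_metric_path(old))
```

Note that the helper only rewrites the metric path itself. The aliasByNode node indexes still have to be shifted by hand (7, 8, 9 became 8, 9, 10 in the example above) because the metric name moved in front of the keyspace and table.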
>>>>>
>>>>> C*heers,
>>>>> -----------------------
>>>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>>>> France / Spain
>>>>>
>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> Le ven. 28 sept. 2018 à 20:38, Carl Mueller
>>>>> <carl.muel...@smartthings.com.invalid> a écrit :
>>>>>
>>>>>> VERY NICE! Thank you very much
>>>>>>
>>>>>> On Fri, Sep 28, 2018 at 1:32 PM Lyuben Todorov <
>>>>>> lyuben.todo...@instaclustr.com> wrote:
>>>>>>
>>>>>>> Nothing as fancy as a matrix but a list of what jmxterm can see.
>>>>>>> Link to the online diff here: https://www.diffchecker.com/G9FE9swS
>>>>>>>
>>>>>>> /lyubent
>>>>>>>
>>>>>>> On Fri, 28 Sep 2018 at 19:04, Carl Mueller
>>>>>>> <carl.muel...@smartthings.com.invalid> wrote:
>>>>>>>
>>>>>>>> It's my understanding that metrics got heavily re-namespaced in JMX
>>>>>>>> for 2.2 from 2.1
>>>>>>>>
>>>>>>>> Did anyone ever make a migration matrix/guide for conversion of old
>>>>>>>> metrics to new metrics?
>>>>>>>>
>>>>>>>>
>>>>>>>>
