Yep agreed with that. Count me in.

On Sun., 23 Sep. 2018, 00:33 Benedict Elliott Smith, <bened...@apache.org> wrote:
Thanks Kurt. I think the goal would be to get JIRA into a state where it can hold all the information we want, and for it to be easy to get all the information correct when filing.

My feeling is that it would be easiest to do this with a small group, so we can make rapid progress on an initial proposal, then bring that to the community for final tweaking / approval (or, perhaps, rejection – but I hope it won’t be a contentious topic). I don’t think it should be a huge job to come up with a proposal – though we might need to organise a community effort to clean up the JIRA history!

It would be great if we could get a few more volunteers from other companies/backgrounds to participate.

On 22 Sep 2018, at 11:54, kurt greaves <k...@instaclustr.com> wrote:

I'm interested. Better defining the components and labels we use in our docs would be a good start and LHF. I'd prefer if we kept all the information within JIRA through the use of fields/labels though, and generated reports off those tags. Keeping all the information in one place is much better in my experience. Not applicable for CI obviously, but ideally we can generate testing reports directly from the testing systems.

I don't see this as a huge amount of work, so I think the overall risk is pretty small, especially considering it can easily be done in a way that doesn't affect anyone until we get consensus on methodology.

On Sat, 22 Sep 2018 at 03:44, Scott Andreas <sc...@paradoxica.net> wrote:

Josh, thanks for reading and sharing feedback. Agreed with starting simple and measuring inputs that are high-signal; that’s a good place to begin.

To the challenge of building consensus, point taken + agreed. Perhaps the distinction is between producing something that’s “useful” vs. something that’s “authoritative” for decision-making purposes. My motivation is to work toward something “useful” (as measured by the value contributors find). I’d be happy to start putting some of these together as part of an experiment – and agreed on evaluating “value relative to cost” after we see how things play out.

To Benedict’s point on JIRA, agreed that plotting a value from messy input wouldn’t produce useful output. Some questions a small working group might take on toward better categorization might look like:

–––
– Revisiting the list of components: e.g., “Core” captures a lot right now.
– Revisiting which fields should be required when filing a ticket – and if there are any that should be removed from the form.
– Reviewing active labels: understanding what people have been trying to capture, and how they could be organized + documented better.
– Documenting “priority”: e.g., a common standard we can point to, even if we’re pretty good now.
– Considering adding a “severity” field to capture the distinction between priority and severity.
–––

If there’s appetite for spending a little time on this, I’d put effort toward it if others are interested; is anyone?

Otherwise, I’m equally fine with an experiment to measure basics via the current structure, as Josh mentioned, too.

– Scott
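As a rough illustration of the kind of report Kurt mentions generating off JIRA fields and labels (and of the label/component review in Scott's list), something like the sketch below could be run against JIRA's standard REST search endpoint. It is illustrative only: the one-year JQL window is an arbitrary example, and the endpoint/paging details would need to be confirmed against the ASF JIRA instance.

# Sketch: audit how labels and components are actually used on CASSANDRA tickets.
# Assumes the standard JIRA REST search endpoint on issues.apache.org; the JQL
# window below is an arbitrary example.
from collections import Counter

import requests

SEARCH_URL = "https://issues.apache.org/jira/rest/api/2/search"
JQL = "project = CASSANDRA AND created >= -365d"

def fetch_issues(jql, page_size=100):
    start = 0
    while True:
        resp = requests.get(SEARCH_URL, params={
            "jql": jql,
            "fields": "labels,components",
            "startAt": start,
            "maxResults": page_size,
        })
        resp.raise_for_status()
        issues = resp.json().get("issues", [])
        if not issues:
            return
        yield from issues
        start += len(issues)

label_counts = Counter()
component_counts = Counter()
for issue in fetch_issues(JQL):
    fields = issue["fields"]
    label_counts.update(fields.get("labels") or [])
    component_counts.update(c["name"] for c in fields.get("components") or [])

print("Most-used labels:", label_counts.most_common(20))
print("Most-used components:", component_counts.most_common(20))

A report along these lines would make it easier to see which labels are actually in active use before deciding which to keep, document, or retire.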
On September 20, 2018 at 8:22:55 AM, Benedict Elliott Smith (bened...@apache.org) wrote:

I think it would be great to start getting some high quality info out of JIRA, but I think we need to clean up and standardise how we use it to facilitate this.

Take the Component field as an example. This is the current list of options:

4.0
Auth
Build
Compaction
Configuration
Core
CQL
Distributed Metadata
Documentation and Website
Hints
Libraries
Lifecycle
Local Write-Read Paths
Materialized Views
Metrics
Observability
Packaging
Repair
SASI
Secondary Indexes
Streaming and Messaging
Stress
Testing
Tools

In some cases there’s duplication (Metrics + Observability, Coordination (= “Storage Proxy, Hints, Batchlog, Counters…”) + Hints, Local Write-Read Paths + Core).
In others, there’s a lack of granularity (Streaming + Messaging, Core, Coordination, Distributed Metadata).
In others, there’s a lack of clarity (Core, Lifecycle, Coordination).
Others are probably missing entirely (Transient Replication, …?).

Labels are also used fairly haphazardly, and there’s no clear definition of “priority”.

Perhaps we should form a working group to suggest a methodology for filling out JIRA, standardise the necessary components, labels, etc., and put together a wiki page with step-by-step instructions on how to do it?

On 20 Sep 2018, at 15:29, Joshua McKenzie <jmcken...@apache.org> wrote:

I've spent a good bit of time thinking about the above and bounced off both different ways to measure quality and progress as well as trying to influence community behavior on this topic. My advice: start small and simple (KISS, YAGNI, all that). Get metrics for pass/fail on utest/dtest/flakiness over time, perhaps also aggregate bug count by component over time. After spending a predetermined time doing that (a couple months?) as an experiment, we retrospect as a project and see if these efforts are adding value commensurate with the time investment required to perform the measurement and analysis.

There are a lot of really good ideas in that linked wiki article / this email thread. The biggest challenge, and risk of failure, is in translating good ideas into action and selling project participants on the value of changing their behavior. The latter is where we've fallen short over the years; building consensus (especially regarding process /shudder) is Very Hard.

Also - thanks for spearheading this discussion, Scott. It's one we come back to with some regularity, so there's real pain and opportunity here for the project imo.

On Wed, Sep 19, 2018 at 9:32 PM Scott Andreas <sc...@paradoxica.net> wrote:

Hi everyone,

Now that many teams have begun testing and validating Apache Cassandra 4.0, it’s useful to think about what “progress” looks like. While metrics alone may not tell us what “done” means, they do help us answer the question, “are we getting better or worse — and how quickly?”
A friend described to me a few attributes of metrics he considered useful, suggesting that good metrics are actionable, visible, predictive, and consequent:

– Actionable: We know what to do based on them – where to invest, what to fix, what’s fine, etc.
– Visible: Everyone who has a stake in a metric has full visibility into it and participates in its definition.
– Predictive: Good metrics enable forecasting of outcomes – e.g., “consistent performance test results against build abc predict an x% reduction in 99%ile read latency for this workload in prod”.
– Consequent: We take actions based on them (e.g., not shipping if tests are failing).

Here are some notes in Confluence toward metrics that may be useful to track beginning in this phase of the development + release cycle. I’m interested in your thoughts on these. They’re also copied inline for easier reading in your mail client.

Link:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93324430

Cheers,

– Scott

––––––

Measuring Release Quality:

[This document is a draft + sketch of ideas. It is located in the “discussion” section of this wiki to indicate that it is an active draft – not a document that has been voted on, achieved consensus, or is in any way official.]

Introduction:

This document outlines a series of metrics that may be useful toward measuring release quality, and quantifying progress during the testing / validation phase of the Apache Cassandra 4.0 release cycle.

The goal of this document is to think through what we should consider measuring to quantify our progress testing and validating Apache Cassandra 4.0. This document explicitly does not discuss release criteria – though metrics may be a useful input to a discussion on that topic.

Metric: Build / Test Health (produced via CI, recorded in Confluence):

Bread-and-butter metrics intended to capture baseline build health and flakiness in the test suite, presented as a time series to understand how they’ve changed from build to build and release to release:

Metrics:

– Pass / fail metrics for unit tests
– Pass / fail metrics for dtests
– Flakiness stats for unit and dtests
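A minimal sketch of how the pass/fail and flakiness numbers above might be aggregated, assuming CI publishes standard JUnit-style XML reports collected into one directory per run. The results/<run-id>/*.xml layout is an assumption for illustration, not the real CI layout.

# Sketch: aggregate pass/fail and flakiness from JUnit-style XML reports.
# Assumes a directory layout of results/<run-id>/*.xml; adjust to the real CI layout.
import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

def outcomes_by_test(results_root="results"):
    runs = defaultdict(list)  # test name -> list of "pass"/"fail" across runs
    for report in glob.glob(f"{results_root}/*/*.xml"):
        for case in ET.parse(report).getroot().iter("testcase"):
            if case.find("skipped") is not None:
                continue  # ignore skipped tests entirely
            name = f'{case.get("classname")}.{case.get("name")}'
            failed = case.find("failure") is not None or case.find("error") is not None
            runs[name].append("fail" if failed else "pass")
    return runs

runs = outcomes_by_test()
failing = {t for t, o in runs.items() if all(r == "fail" for r in o)}
flaky = {t for t, o in runs.items() if len(set(o)) > 1}
print(f"{len(runs)} tests, {len(failing)} consistently failing, {len(flaky)} flaky")

Run over a rolling window of builds, the same counts become the time series described above.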
Metric: “Found Bug” Count by Methodology (sourced via JQL, reported in Confluence):

These are intended to help us understand the efficacy of each methodology being applied. We might consider annotating bugs found in JIRA with the methodology that produced them. This could be consumed as input in a JQL query and reported on the Confluence dev wiki.

As we reach a Pareto-optimal level of investment in a methodology, we’d expect to see its found-bug rate taper. As we achieve higher quality across the board, we’d expect to see a tapering in found-bug counts across all methodologies. In the event that one or two approaches are outliers, this could indicate the utility of doubling down on a particular form of testing.

We might consider reporting “Found By” counts for methodologies such as:

– Property-based / fuzz testing
– Replay testing
– Upgrade / Diff testing
– Performance testing
– Shadow traffic
– Unit/dtest coverage of new areas
– Source audit

Metric: “Found Bug” Count by Subsystem/Component (sourced via JQL, reported in Confluence):

Similar to “found by,” but “found where.” These metrics help us understand which components or subsystems of the database we’re finding issues in. In the event that a particular area stands out as “hot,” we’ll have the quantitative feedback we need to support investment there. Tracking these counts over time – and their first derivative, the rate – also helps us make statements regarding progress in various subsystems. Though we can’t prove a negative (“no bugs have been found, therefore there are no bugs”), we gain confidence as their rate decreases, normalized to the effort we’re putting in.

We might consider reporting “Found In” counts for components as enumerated in JIRA, such as:

– Auth
– Build
– Compaction
– Compression
– Core
– CQL
– Distributed Metadata
– …and so on.

Metric: “Found Bug” Count by Severity (sourced via JQL, reported in Confluence):

Similar to “found by/where,” but “how bad?” These metrics help us understand the severity of the issues we encounter. As build quality improves, we would expect to see decreases in the severity of issues identified. A high rate of critical issues identified late in the release cycle would be cause for concern, though it may be expected at an earlier time.

These could roughly be sourced from the “Priority” field in JIRA:

– Trivial
– Minor
– Major
– Critical
– Blocker

While “priority” doesn’t map directly to “severity,” it may be a useful proxy. Alternately, we could introduce a label intended to represent severity if we’d like to make that clear.
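To make the JQL sourcing concrete, here is a sketch of how the three “found bug” counts above could be pulled as simple totals. The methodology labels, component names, and date window are placeholders; the real taxonomy is exactly what the proposed working group would need to agree on first.

# Sketch: "found bug" counts by methodology label, component, and priority,
# pulled as plain JQL totals. Assumes the standard JIRA REST search endpoint;
# the labels, components, and date window are illustrative placeholders.
import requests

SEARCH_URL = "https://issues.apache.org/jira/rest/api/2/search"

def count(jql):
    # maxResults=0 asks JIRA for the total only, without returning issues
    resp = requests.get(SEARCH_URL, params={"jql": jql, "maxResults": 0})
    resp.raise_for_status()
    return resp.json()["total"]

scope = 'project = CASSANDRA AND type = Bug AND created >= "2018-09-01"'

# "Found by": assumes methodology labels like these have been agreed and applied
for label in ["fuzz-testing", "replay-testing", "diff-testing", "perf-testing"]:
    print("found by", label, "=", count(scope + " AND labels = " + label))

# "Found where": components as enumerated in JIRA
for component in ["Auth", "Compaction", "CQL", "Distributed Metadata"]:
    print("found in", component, "=", count(scope + ' AND component = "' + component + '"'))

# "How bad": priority as a rough proxy for severity
for priority in ["Trivial", "Minor", "Major", "Critical", "Blocker"]:
    print(priority, "=", count(scope + " AND priority = " + priority))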
Metric: Performance Tests

Performance tests tell us “how fast” (and “how expensive”). There are many metrics we could capture here, and a variety of workloads they could be sourced from.

I’ll refrain from proposing a particular methodology or reporting structure since many have thought about this. From a reporting perspective, I’m inspired by Mozilla’s “arewefastyet.com”, used to report the performance of their JavaScript engine relative to Chrome’s: https://arewefastyet.com/win10/overview

Having this sort of feedback on a build-by-build basis would help us catch regressions, quantify improvements, and provide a baseline against 3.0 and 3.x.

Metric: Code Coverage (/ other static analysis techniques)

It may also be useful to publish metrics from CI on code coverage by package/class/method/branch. These might not be useful metrics for “quality” (the relationship between code coverage and quality is tenuous). However, it would be useful to quantify the trend over time between releases, and to source a “to-do” list for important but poorly-covered areas of the project.

Others:

There are more things we could measure. We won’t want to drown ourselves in metrics (or the work required to gather them) – but there are likely more not described here that could be useful to consider.

Convergence Across Metrics:

The thesis of this document is that improvements in each of these areas are correlated with increases in quality. Improvements across all areas are correlated with an increase in overall release quality. Tracking metrics like these provides the quantitative foundation for assessing progress, setting goals, and defining criteria. In that sense, they’re not an end – but a beginning.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org