Hi Matthew,

I had some extra time to think about this reply as I'm reading the list in digest mode. I will answer this mail in two parts: one about the concrete issue at hand, and a second one about errors.ubuntu.com in general and what I think is wrong with it on a more fundamental level for me as a package maintainer (and thus a user of it).
First, on the concrete issue:

On Tue, May 13, 2014 at 10:24:20AM +0100, Matthew Paul Thomas wrote:
> You can. Set "Most common of these errors from" to "the date range",
> then enter the dates, for example 2013-10-16 to 2013-10-20. The result
> is not as you remember: over that period, bug 1219245 was not in the
> top 50 at all, whereas it was #42 for the equivalent period around the
> 14.04 release.

I tried that, just for fun, with the range 2014-04-13 to 2014-04-19 and got:

  libreoffice-core  all    25
  libreoffice-core  14.04  120

and then went back from 14.04 to "all" and got a "the list of most common errors could not be loaded". From that it is:

a/ obvious that the numbers are not absolute, but normalized in a way[1] that makes them incomparable
b/ clear that these errors always seem to happen to me after two or three changes to the selection

To be honest, this has made me mistrust this data for all but the most basic searches, as I am never sure whether I am seeing real data or old data from some stalled JSON request.

> > While there is no good reproduction scenario in the bug reports,
> > there is one report claiming it crashed "while installing a font"
> > and another that it crashed "during an upgrade". This leaves me with
> > the suspicion that the crash is actually people leaving LibreOffice
> > running during a release upgrade (which is brave and a nice vote of
> > confidence, but not really a supported scenario).
>
> "Supported" is a weasel word. I've never understood why Ubuntu lets
> people have apps running during an upgrade, because that has many
> weird effects. But Ubuntu *does* let people do that. And as long as it
> does, Ubuntu developers are responsible for the resulting errors.

This is hardly a LibreOffice-specific issue: any interactive application will have such problems, especially if it has any kind of state. Thus the solution would be for the updater to detect such running applications and warn the user to close them.
> > While those upgrade issues should be a concern too, as-is it seems
> > to me they are overblown in their importance and we don't have a
> > good way to check whether they happen in regular production use
> > after the upgrade.
>
> With respect, I don't see that you have any justification in deciding
> that this particular issue is "overblown". A crash in LibreOffice is
> just as bad whether it happens during an upgrade, during a full moon,
> or during the Olympic Games.

All bugs are created equal? Not quite: on errors.ubuntu.com we rank them essentially by "frequency", with the implicit assumption that a high-frequency bug will keep its high frequency throughout the lifecycle of the release. 'Upgrade-only bugs' break this assumption.

> If you think it's unfair somehow that
> apps are expected to keep running during upgrades, then fix the
> upgrade process so that apps can't run during the upgrade. Don't just
> filter out those crashes as if they aren't happening.

In a perfect world, I would pursue all LibreOffice crashers with the same vigor. In a perfect world, I would also have at least a 10-person team to do this (and all the other things needed for LibreOffice). Unfortunately we are not living in that world, but in one where I have to multiply frequency by severity and affected users and take care of the bugs scoring highest. The broken assumption above makes errors.ubuntu.com much less useful for this.

> > ... e.g. trivially: mark crashers 48 hours after the upgrade
> > as 'potentially an upgrade side effect' or some such?
>
> Probably not retroactively. But I imagine it would be fairly easy to
> add info to future error reports asking if do-release-upgrade (or
> whatever) was running at the time.

That would be very useful indeed.
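To illustrate, the client-side tagging could be as simple as the following sketch (the report field names and the 48-hour window are my assumptions for illustration, not anything the error tracker currently implements):

```python
from datetime import datetime, timedelta

# Assumed grace period after a release upgrade during which crashes
# are treated as potential upgrade side effects.
UPGRADE_GRACE_PERIOD = timedelta(hours=48)

def tag_report(report, last_upgrade):
    """Mark a crash report as a potential upgrade side effect if it
    happened within the grace period after the last release upgrade.

    report: dict with an ISO-8601 "Date" field (hypothetical format);
    last_upgrade: datetime of the last do-release-upgrade run, or None.
    """
    if last_upgrade is not None:
        crash_time = datetime.fromisoformat(report["Date"])
        if timedelta(0) <= crash_time - last_upgrade <= UPGRADE_GRACE_PERIOD:
            report["PotentialUpgradeSideEffect"] = "yes"
    return report
```

With such a flag in the reports, the server side could then rank "upgrade-window" crashes separately from steady-state ones.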
I assume the version data sent to errors.ubuntu.com in these cases is wrong and poisoning the well anyway:

- the client will send an error report with the application version the package manager reports as installed
- the application that crashed will actually be an older, different one

It would be ideal to fingerprint the crashed binary, compare it to the version installed on disk, and skip reporting if the two differ. But as a fallback, the 'mark the first 48 hours' approach would be a pragmatic way to prevent such wrong data from poisoning the stats.

Now, apart from the concrete issue at hand, on to the general things about errors.ubuntu.com and how it could become more useful for me.

> Unfortunately, this calculation goes to hell on release day. All of a
> sudden there are a gazillion new machines with the new version of
> Ubuntu on them. And of those, some fraction will report their first
> error. But that fraction are the only ones we know exist at all. So the
> denominator is much too low -- making the calculated error rate much
> too high.

Thus the nice charts we are plotting on the page are mostly useless and help me find neither the most common issues in my package nor those in the distro as a whole[2].

> This is why the calculated error rate for every new release spikes on
> release day, and corrects itself over the next 90 days. It's also why
> the calculated error rate for 13.10 plummeted at the 14.04 release:
> lots of 13.10 machines were upgraded to 14.04, and so from the error
> tracker's point of view they're still 13.10 machines that suddenly
> became error-free.

So what are the charts actually telling us?
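To make the denominator problem concrete, here is a toy calculation (all numbers invented for illustration): if the tracker can only count machines it has already seen an error from, the computed rate right after release day is wildly inflated.

```python
def error_rate(errors_today, known_machines):
    """Errors per machine per day, as the tracker can compute it:
    the denominator is only the machines that have reported at
    least one error so far, not the real deployed base."""
    return errors_today / known_machines

# Hypothetical release day: 1,000,000 machines run the new release,
# 10,000 of them hit an error, and those 10,000 are the only
# machines the tracker knows exist.
naive = error_rate(10_000, 10_000)     # denominator = reporters only
true = error_rate(10_000, 1_000_000)   # denominator = real deployment

print(round(naive / true))  # ~100x overestimate on day one
```

As more machines report over the following weeks, the known-machines denominator grows toward the real deployment size, which is exactly the 90-day self-correction described above.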
To me they show more artifacts of their normalization than useful information about the stability of a release:

- for the first 90 days there is no good normalization -- that is already 25% of a release cycle
- in the last month, people are already starting to migrate to the next release, so the normalization goes off again (another 16% of the release cycle)

IMHO, _if_ errors.ubuntu.com plots anything, it should plot months 4 and 5 of the life of each release cycle over each other. That chart would likely be much more boring (and unfortunately come rather too late for us to act upon), but it is the only sensible chart to create from this data.

> If anyone would like to fix this, it's just a simple matter of
> programming. ;-) <http://launchpad.net/bugs/1069827>

I am not exactly sure how normalizing this differently would help me identify high-frequency bugs faster, so fixing the charts is not too high on my priority list. Things that would be much more interesting to me:

- getting the absolute counts for a LibreOffice version and distro release for a stack trace, together with the estimated size of deployment
- finding correlations between the counts of multiple stack traces:
  - hinting at two bugs caused by the same root cause
  - if one trace has a good reproduction scenario and the other does not, this would be very helpful
- much more stuff like that

Critical for all of that would be the ability to download the data and see what works and what does not for identifying issues, by fiddling around in some Python script or doing ad-hoc data mangling in a spreadsheet. I certainly won't program a solution "into the blind" before I have found it helpful in at least a few ad-hoc cases. Once I have proven to myself that a specific tactic/calculation provides helpful information for cornering bugs, I might consider implementing a generic solution in Errors directly -- but before that, I won't wrestle with the huge discussion, documentation and presentation tail that ensues.
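As an illustration of the kind of ad-hoc fiddling I mean, here is a rough sketch (the daily counts are invented) that checks whether two stack traces move together over time, which could hint at a shared root cause:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long series of daily counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented daily crash counts for two stack traces over one week:
trace_a = [12, 30, 28, 9, 11, 25, 27]
trace_b = [6, 16, 15, 4, 5, 13, 14]  # moves in lockstep with trace_a

print(pearson(trace_a, trace_b))  # close to 1.0: same root cause?
```

If one of the two traces has a good reproduction scenario, a correlation like this would tell me the fix is probably worth verifying against the other trace too.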
So, I would be interested in making Errors more useful -- but a prerequisite for that would be the ability to easily get a simple CSV file for subsets from the page, in the form:

  package, package version, distro release, stack trace id, day, crash count, est. deployment size

so I can play with it and find out what works and what does not in identifying bugs, without wrestling with a flaky JavaScript monster that I cannot easily get data out of in a processable form.

Once I have that, and once I have the 10-person team to take care of the rest of the issues, I might come back and look at making the plots on the webpage nicer[3].

Best,

Bjoern

[1] In the meantime I searched some more and found that the axis _should_ be labeled "Errors per machine per 24 hours", but isn't.
https://wiki.ubuntu.com/ErrorTracker#errors.ubuntu.com

[2] Robert's post at http://bobthegnome.blogspot.de/2014/05/errorsubuntucom.html confirms this; it finds:
- people use their machines more on weekdays
- people don't run ubuntu+1 over Christmas
- people started to use the beta in March
- people migrated from 13.10 to 14.04 quickly
- people migrated from 12.04 to 14.04 slowly
- people don't migrate from 12.10 much

All of which are observations about deployment size and migration; none of it is a measure of stability/crash frequency or helpful in identifying the most painful bugs -- not even in relative terms.

[3] Well, actually, having that data I might come up with a better normalization and contribute to bug 1069827 too. ;)

-- 
ubuntu-devel mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
