Hi,

sorry for my late answer; I read this list only rarely.

* angel [2010-04-21 10:58 +0200]:
> In case Debian is interested in having someone work on a statistical
> project of any nature (the project would be supervised by professors
> of the Statistics and Operations Research department of the University
> of Granada and Debian would not be charged with any costs
> whatsoever), please feel free to contact me.

Since your request has been partly misunderstood, let me summarize it
first: You want to do a small to medium-sized statistical project and
you don't want to program in a non-statistical language during this
project. Debian needs to give you access to the data required to
complete your project and, most importantly, needs to tell you what
information it is interested in.

Access to the data is easy: Stefano already mentioned the Ultimate
Debian Database (UDD) as a possible data source. It should contain
everything you would need in such a project. If you don't want to
create a local copy of the database using the provided dump, you should
almost certainly get a guest account on alioth.debian.org to access the
database without any problems.

As you might know, Debian classifies bugs by severity. The most
important severities are "critical", "serious" and "grave"; these are
release critical. Additionally there are "important", "normal", "minor"
and "wishlist". Besides severities there are things like "fixed" and
"found" and so on; all this is documented on http://bugs.debian.org/
below the heading "Bug tracking system documentation".

We want to release the next Debian release, Squeeze, soon, and to do
this we need to get the number of release critical bugs in testing,
currently named Squeeze, to 0 (except for some ignored bugs). To improve
this process, and thus Debian in general, it would be nice to have some
statistical data about how bugs have been fixed in the past.
Understanding where we still have problems is the first step to
resolving them. :)

Some questions to be answered could be:

* How long does it take on average until a bug with a particular
  severity gets fixed, and how large is the variance? Does this change
  while Debian is frozen (the freeze is the time before the actual
  release, with a more restricted package migration to testing)?

* Is there a correlation between the number of installations according
  to popcon and the time until a bug gets fixed?

* Is there a correlation between the size of a package or the package's
  priority [1] and the average bug fixing time?

* How does the number of people maintaining a package relate to bug
  fixing time, and how does it relate to the number of unfixed bugs per
  package? The rationale for this is that we encourage single
  maintainers of important packages (with priority important and
  priority required) to switch to maintaining these packages in a team,
  but we lack data to support our guess that team maintenance is really
  more efficient.

* How does the number of bugs reported per time unit change after a new
  upstream release is packaged, compared to the release of a new Debian
  revision? Does it matter which part of the upstream version number
  has been incremented? I expect, for example, a 2.0 release to contain
  more bugs on average than a 2.2 release.

* We currently migrate packages that don't introduce new release
  critical bugs to testing after they have been in unstable for 10 days
  (unless the maintainer or the release team overrides this or other
  packages prevent it). Is this a good choice, given the time after an
  upload at which most release critical bugs are reported?

* Is there a relation between bug fixing and the time since the last
  maintainer upload happened? A similar question is whether there is a
  relation between bug fixing and the number of non-maintainer uploads
  (NMUs) in the last n years.

* How does the probability of a release critical bug being fixed in an
  NMU instead of a regular maintainer upload rise over time?

* Is there a significant difference between bug fixing in officially
  maintained packages and in orphaned packages or packages maintained
  by the QA team?

* Are certain programming languages or packages in certain sections
  more prone to FTBFS (fails to build from source) bugs, security
  related bugs or bugs in general?

* The probability of a release critical bug being filed over time,
  combined with the two previous questions and the number of
  installations according to popcon, could possibly be used to help
  people decide whether a package should be removed from Debian or
  orphaned if the former maintainer has lost interest. Currently I have
  no idea how this could be done in a sane way, or whether it can be
  done in a sane way at all.

I think you got the idea. If you work with the data and these questions
for some time, you should get a feeling for which ones might be relevant
and interesting and which ones can be ignored. If you find something one
would not expect, that is the point where you should start digging in
that direction. In general Debian wants "some useful and/or interesting
data about how bugs are handled"; everything I wrote above should be
considered a suggestion that can be used as a guideline until you have
the intuition to decide what looks promising.

I would by no means expect every point mentioned to be addressed by
such a project; just choose what you like and what seems valuable. I
would also consider "selected questions answered, everything is as one
would expect, here's the data, nothing to see here, move along" a very
helpful result.

Due to the distributed nature of Debian it's difficult to provide a
Debian Developer as an exclusive contact person. If your university
requires this, especially if she or he would need to partly evaluate
your work, this could be a problem, unless of course someone steps in
and agrees to be your mentor for this project.

If you are still interested in doing this project, the obvious steps
seem to be:

1. Install a local copy of UDD or get access to it using an account on
   alioth.debian.org. Drop me a mail if you have problems with
   registering on [2] or if I should ask an alioth admin for approval.

2. Make yourself familiar with the database schema [3] to be able to
   extract the required information. If necessary you would need to
   gain basic SQL knowledge to get the data, but I guess a statistician
   knows some SQL.

3. Talk to your professor.

4. Do the project. If reasonable, you could provide intermediate
   results to this list, but this is not necessary.

5. Send the results of your work to this list and to pr...@debian.org.
   The press team will ensure that it is mentioned in our regular
   Debian newsletter and possibly other places.

It would be good, though not required, if the thesis could be placed
somewhere on our website. Debian tries to ensure that the information on
its website is free; the people maintaining it know exactly whether a
free license is required or just recommended for this. A minimal way to
put it under a free license is to write the following text in the mail
when you send your results (this would not prevent others from using
parts of your paper without attribution, but choosing another license
with an attribution clause is easy):

| License and Copyright of the attached document:
|
| Copyright (c) 2010 Name <E-mail address>
| Permission to use, copy, modify, and/or distribute this software
| for any purpose with or without fee is hereby granted.

Regards
Carsten

P.S.: Please CC me on answers.

[1] http://www.debian.org/doc/debian-policy/ch-archive.html#s-priorities
[2] https://alioth.debian.org/account/register.php
[3] http://udd.debian.org/schema/

-- 
To UNSUBSCRIBE, email to debian-project-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100512235930.ga22...@foghorn.stateful.de
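
As a rough illustration of the first question in the list above (average and variance of the time until a bug gets fixed, per severity), here is a minimal Python sketch. It does not query UDD: the sample records, the tuple layout (severity, date reported, date fixed) and the function name fix_time_stats are made up for illustration, and the real data would have to be extracted according to the schema in [3].

```python
from collections import defaultdict
from datetime import date
from statistics import mean, variance

# Hypothetical sample records: (severity, date reported, date fixed).
# These are invented for illustration and are NOT taken from UDD.
SAMPLE_BUGS = [
    ("serious", date(2010, 1, 4), date(2010, 1, 14)),
    ("serious", date(2010, 2, 1), date(2010, 2, 21)),
    ("grave", date(2010, 1, 10), date(2010, 1, 15)),
    ("grave", date(2010, 3, 1), date(2010, 3, 16)),
    ("normal", date(2009, 11, 1), date(2010, 1, 30)),
]


def fix_time_stats(bugs):
    """Return {severity: (count, mean days to fix, sample variance)}.

    The variance is None for severities with fewer than two fixed bugs,
    because the sample variance is undefined for a single observation.
    """
    days_by_severity = defaultdict(list)
    for severity, reported, fixed in bugs:
        days_by_severity[severity].append((fixed - reported).days)

    stats = {}
    for severity, days in days_by_severity.items():
        var = variance(days) if len(days) > 1 else None
        stats[severity] = (len(days), mean(days), var)
    return stats


if __name__ == "__main__":
    for severity, (n, avg, var) in sorted(fix_time_stats(SAMPLE_BUGS).items()):
        print(severity, n, avg, var)
```

Note that only fixed bugs enter this calculation. Bugs that are still open have no fixing time yet, so a plain mean over fixed bugs probably underestimates the real fixing time; how to treat the open ones is a choice the project would have to make.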