Dear all,
here is a small student project idea:

In previous work on MAINTAINERS and process conformance, Pia Eichinger [1]
investigated the question: Are patches integrated by the maintainers
defined by the responsibilities in MAINTAINERS?

In this project, we are interested in a related (possibly simpler)
question: Are the commits integrated into the appropriate integration
trees referenced in MAINTAINERS?

I believe a main difference between considering maintainers and
considering integration trees is that the information in MAINTAINERS
about integration trees is more error-prone, as it is not used as
prominently as the personal maintainer information (name and email),
which is widely used through ./scripts/get_maintainer.pl. So, correcting
the errors on integration trees in MAINTAINERS is probably the larger
(but also simpler) task compared to correcting errors on personal
maintainer information in MAINTAINERS.

The answer to the question above can then ultimately be used to identify
which integration tree entries should be added to specific sections in
MAINTAINERS to best match the actual integration observed in git.

Determining the factors and the metric for what is "best" is of course
the challenging part: we need to identify a suitable heuristic that is:

1. good enough to be used to create a change to MAINTAINERS that is
   accepted by the community, and
2. simple enough to be implemented with reasonable effort.

Background:

The MAINTAINERS sections include references, through the T: entries, to
the location of a source configuration management (SCM) tree together
with its type, e.g., git, quilt, hg.

For each commit, the kernel git history carries the commit's integration
tree path, i.e., the information through which SCM trees a commit was
integrated until it was finally integrated into Linus Torvalds' tree.

Ideally, the references in the MAINTAINERS sections are:

- complete, i.e., all integration trees used for recent kernel releases
  are mentioned in MAINTAINERS.
- sound, i.e., the majority of the commits are integrated through the
  trees referenced in the MAINTAINERS sections a patch belongs to.
- precise, i.e., for each MAINTAINERS section, the majority of the
  commits that belong to a section are integrated through the tree
  referenced in that section.

Goal:

We identify and measure the properties above: completeness, soundness
and precision. Then, we use that information to determine which
integration tree entries should be added to which specific sections to
maximally increase the three properties.

To evaluate the adequacy of this method, we can obtain feedback from the
responsible kernel maintainers by proposing patches that modify the
MAINTAINERS file for the additions that we identified as most relevant.
"Most relevant" means maximally increasing the properties, subject to a
reasonable threshold on the number of patch proposals (to not swamp
maintainers initially) and a threshold on relevance (to not send out
minor changes that are largely irrelevant to the community).

In this project, we can make use of:

- gitdm [git://git.lwn.net/gitdm.git]: gitdm includes some scripts to
  parse MAINTAINERS and obtain the integration tree path of a commit.

and/or

- pasta [https://github.com/lfd/PaStA]: Similarly to gitdm, pasta
  provides functionality to parse MAINTAINERS and some functionality for
  extracting information on commits.

Potential project phases:

- In the first phase (PoC phase), we could probably just create a setup
  that combines or extends the functionality in gitdm and/or in pasta.
- In the second phase (MAINTAINERS patch creation phase), we send out
  some patches and collect feedback from maintainers.
- In a third phase, with a better understanding of the individual pieces
  in gitdm and/or in pasta, we could then create a cleaner design that
  also refactors gitdm and pasta to share the same implementation when
  essentially the same basic functionality is used within the various
  analyses.
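
To make the three properties a bit more concrete, here is a rough Python
sketch of how they could be measured. Everything in it is an assumption
for illustration only: the MAINTAINERS excerpt, the example.org URLs, the
commit records and all function names are hypothetical, and it does not
use the gitdm or pasta code. In a real setup, the per-commit section and
tree information would come from one of those tools. The last function is
just one naive heuristic for proposing additions, not a settled choice.

```python
import re
from collections import Counter, defaultdict

# Hypothetical MAINTAINERS excerpt; the URLs are placeholders, not real trees.
MAINTAINERS_TEXT = """\
EXAMPLE NETWORKING
M:\tJane Doe <jane@example.org>
T:\tgit git://example.org/net.git
F:\tnet/

EXAMPLE STORAGE
M:\tJohn Roe <john@example.org>
F:\tblock/
"""

# Hypothetical commit records:
# (commit id, sections the commit belongs to, tree it was integrated through)
COMMITS = [
    ("c1", ["EXAMPLE NETWORKING"], "git://example.org/net.git"),
    ("c2", ["EXAMPLE NETWORKING"], "git://example.org/net.git"),
    ("c3", ["EXAMPLE NETWORKING"], "git://example.org/misc.git"),
    ("c4", ["EXAMPLE STORAGE"], "git://example.org/block.git"),
]

# A T: entry carries the SCM type (git, quilt, hg, ...) and the location.
T_ENTRY = re.compile(r"^T:\s+(\S+)\s+(\S+)")


def parse_trees(text):
    """Map each section name to the set of locations in its T: entries."""
    trees = {}
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        name = lines[0].strip()
        trees[name] = {
            m.group(2) for line in lines[1:] if (m := T_ENTRY.match(line))
        }
    return trees


def completeness(trees, commits):
    """Fraction of observed integration trees mentioned in any section."""
    observed = {tree for _, _, tree in commits}
    referenced = set().union(*trees.values())
    return len(observed & referenced) / len(observed)


def soundness(trees, commits):
    """Fraction of commits integrated through a tree of one of their sections."""
    hits = sum(
        1 for _, sections, tree in commits
        if any(tree in trees.get(s, set()) for s in sections)
    )
    return hits / len(commits)


def precision(trees, commits):
    """Per section: fraction of its commits integrated through its own trees."""
    total, hits = defaultdict(int), defaultdict(int)
    for _, sections, tree in commits:
        for s in sections:
            total[s] += 1
            if tree in trees.get(s, set()):
                hits[s] += 1
    return {s: hits[s] / total[s] for s in total}


def suggest_additions(trees, commits):
    """Naive heuristic: per section, the most used tree it does not list."""
    missing = defaultdict(Counter)
    for _, sections, tree in commits:
        for s in sections:
            if tree not in trees.get(s, set()):
                missing[s][tree] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in missing.items()}


if __name__ == "__main__":
    trees = parse_trees(MAINTAINERS_TEXT)
    print("completeness:", completeness(trees, COMMITS))
    print("soundness:   ", soundness(trees, COMMITS))
    print("precision:   ", precision(trees, COMMITS))
    print("suggestions: ", suggest_additions(trees, COMMITS))
```

The thresholds mentioned in the Goal part (number of proposals, minimum
relevance) would then be applied on top of such per-section numbers, for
example by only proposing an addition when a section has enough commits
and the suggested tree covers a clear majority of them.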

References:

[1] https://lists.elisa.tech/g/devel/message/1269

---

Any thoughts on this small student project?

If it is not too crazy, I will mentor a student on this project through
one of the next mentoring programs (Google Summer of Code, LF mentorship,
etc.).

Lukas