Hi all,

this is a proposal to get as much focused tinderboxing as possible with limited resources and unreliable build clients. It assumes the patches under consideration to be independent (that is: without conflicts) and not dependent on a specific order. Gerrit allows us to have patches independent of order, as they are not yet merged/picked to the branch. We can also easily exclude any patch on Gerrit that conflicts with the existing patches under consideration: to do so, we cherry-pick the new patch onto the tip of the branch and then cherry-pick all patches under consideration on top of that. If anything conflicts, we report that back. (*)
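The conflict check above could be scripted roughly like this; a sketch only, assuming the candidate commits have already been fetched locally (e.g. from Gerrit refs), and the function name and signature are illustrative, not an existing tool:

```python
# Sketch of the conflict check described above; assumes the candidate
# commits are already fetched locally (e.g. from Gerrit refs). The
# function name and signature are illustrative, not an existing tool.
import subprocess

def _git(repo, *args):
    """Run a git command in `repo` and return the CompletedProcess."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True)

def conflicts_with_pending(repo, branch_tip, new_commit, pending_commits):
    """Cherry-pick `new_commit` onto `branch_tip`, then stack all
    `pending_commits` on top; report True on the first conflict."""
    _git(repo, "checkout", "--detach", branch_tip)
    for commit in [new_commit, *pending_commits]:
        if _git(repo, "cherry-pick", commit).returncode != 0:
            _git(repo, "cherry-pick", "--abort")  # leave the repo clean
            return True
    return False
```

A real dispatcher would presumably run this in a scratch clone, so a failed pick never disturbs any tinderbox checkout.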
== Testcase ==

Tinderboxes will be grouped into "testcases": all tinderboxes belonging to one "testcase" are assumed to provide the same results for the same input. The simplest testcase is "completing a build on platform foo", while more complex testcases include "completing a build on platform foo and running the checks".

A testcase is assumed to have a rather stable long-term empirical probability that one patch does not break it: p-innocent(testcase). Whenever a new patch is committed for consideration, we assume its initial "innocence" to be average:

  p-innocent(patch) = p-innocent(testcase)

== Giving a tinderbox a subset to test ==

There are two directly useful results of a tinderbox run:
- If the tinderbox run is successful, we can mark all patches as good and remove them from the patches under consideration.
- If the tinderbox run is a failure and contains only one patch, we can mark that patch as bad and remove it from the patches under consideration.
- All other cases allow no directly deducible action.

=== optimistic tinderbox run ===

A completed tinderbox run provides us with the most information if we assume it to fail half of the time (greatest uncertainty). So, this is the first thing to give a tinderbox to test:
- Pick the most innocent-looking patches (highest p-innocent(patch)).
- For patches with the same innocence, prefer the oldest ones first.
- Add patches as long as the innocence of the patchset:

    p-innocent(patchset) = product(p-innocent(patch1), p-innocent(patch2), ...)

  is above 0.5.

After sending off the tinderbox with this patchset, do:

  p-innocent(patch) *= p-innocent(testcase)

for each patch in the set, reducing the innocence of the patches without waiting for results. So we assume the patches of the set to be partially guilty until proven innocent. If the tinderbox succeeds, mark all patches as good and remove them from the patches under consideration.

=== pessimistic tinderbox run ===

A failed tinderbox run provides us with the most information if it contains only one patch.
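The optimistic selection and its innocence update might be sketched like this (the patch records and the p-innocent bookkeeping are assumptions about the dispatcher, not part of Gerrit):

```python
# Sketch of the optimistic dispatch: pick the most innocent patches,
# oldest first on ties, while the patchset's combined innocence stays
# above 0.5, then lower each picked patch's innocence immediately.
# The dict layout ('id', 'submitted', 'p_innocent') is an assumption.

def select_optimistic_patchset(patches, p_testcase):
    """Return the patchset to hand to the next tinderbox and apply
    p-innocent(patch) *= p-innocent(testcase) to the selected patches
    without waiting for the run's result."""
    # Highest innocence first; among equals, oldest submission first.
    candidates = sorted(patches,
                        key=lambda p: (-p["p_innocent"], p["submitted"]))
    patchset, innocence = [], 1.0
    for patch in candidates:
        if innocence * patch["p_innocent"] <= 0.5:
            break  # adding more would drop p-innocent(patchset) too low
        patchset.append(patch)
        innocence *= patch["p_innocent"]
    for patch in patchset:  # guilty in part until proven innocent
        patch["p_innocent"] *= p_testcase
    return patchset
```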
So if the innocence of one or more patches drops to 0.5 or below:

  p-innocent(dirtypatch) <= 0.5

or if there is currently only one patch under consideration, then instead of the optimistic approach above, do a pessimistic tinderbox run with just one patch. Select the patch with the lowest p-innocent(patch) and, if there are multiple patches with equal p-innocent(patch), select the oldest candidate. After sending off the tinderbox with just this one patch, do:

  p-innocent(patch) /= p-innocent(testcase)

for all patches that shared a tinderbox run with this patch (once for each such run), without waiting for the tinderbox to return with a result. So we assume most of those patches to be more innocent now (hoping the blame is on the dirtypatch). The suspicious patch should now be back at its initial value:

  p-innocent(dirtypatch) = p-innocent(testcase)

The tinderbox will hopefully mark that patch as good or bad soon anyway and remove it from the patches under consideration.

== advantages ==

This approach:
- does not rely on the tinderboxes to ever come back with a result -- it should be robust against lost results
- does not need to test each patch individually, if the average patch quality is not too bad
- requires only little bookkeeping on the side of the dispatcher
- should allow us to confidently mark all patches as 'does not break master for this testcase'

Opinions/Additions/Corrections? Objections? If not, we only need somebody to script this ;)

Best,

Bjoern

P.S.: Don't show my careless use of 'probabilities' and such to the physics department at the university where I got my degree. It would just result in snorting laughs and cruel teasing.

(*) Conflicts might be rare though, so we might even get away with the optimistic approach: just hoping that considered patches never conflict.