Hello Elena and all,

I have pushed the fixed code. There are a lot of changes in it because I went through all the code making sure that it made sense. The commit is here <https://github.com/pabloem/Kokiri/commit/7c47afc45a7b1f390e8737df58205fa53334ba09>, and although there are a lot of changes, the main line where failures are caught or missed is this one <https://github.com/pabloem/Kokiri/blob/7c47afc45a7b1f390e8737df58205fa53334ba09/simulator.py#L496>.
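Roughly, the check at that line works like this (just a minimal sketch so we all mean the same thing by 'caught' and 'missed' in the rest of this mail; the names running_set/failed_tests and the stats dict are made up, and the real simulator.py tracks more state):

    # Strict simulation mode: a failure only counts as caught if the
    # failing test was actually part of the set the strategy chose to run.
    def tally_failures(running_set, failed_tests, stats):
        for test in failed_tests:
            if test in running_set:
                stats['caught'] += 1
            else:
                stats['missed'] += 1  # in strict mode the failure is lost for good
        return stats

The bug I mention below (in the quoted mail) was in this kind of bookkeeping, which is why the recall numbers I reported earlier were inflated.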
A couple of findings:

1. The test result file edition information helps improve recall, if only marginally.
2. The time since last run information does not improve recall much at all. See [Weaknesses - 2].

A couple of concepts that I want to define before going on:

- *First failures*. These are failures that happen because of new bugs. They don't occur close in time as part of a chain of failures; they occur as a consequence of a transaction that introduces a bug, but they might show up soon or long after that transaction (usually soon rather than long). Some of them are correlated with the frequency of failure of a test (core or basic tests that fail often might be especially good at exposing bugs), but many of them are not (tests of a feature that don't fail often, but rather when that feature is modified).
- *Strict simulation mode*. This is the mode where, if a test is not part of the running set, its failure is not considered.

Weaknesses:

1. It's very difficult to predict 'first failures'. With the current strategy, if it has been a long time since a test failed (or if it has never failed before), the relevancy of the test just goes down, and it never runs.
2. Especially in database and parallel software, there are bugs that hide in the code for a long time until one test discovers them. Unfortunately, the analysis that I'm doing requires that the test runs exactly when the data indicates it will fail. If a test that would fail doesn't run in test_run Z, even though it might run in test_run Z+1, the failure is just counted as missed, as if the bug was never encountered.
   - This affects the *time since last run* factor. This factor should help catch 'hidden' bugs that can be exposed by tests that have not run for a while, but the data available makes it hard to measure that.
   - It would also affect the *correlation* factor. If tests A and B often fail together, and on test_run Z both of them would fail but only A runs, the heightened relevancy of B on the next test_run would not make it catch anything (again, this is a limitation of the data, not of reality).
3. Humans are probably a lot better at predicting first failures than the current strategy.

Some ideas:

- I need to be more strict with my testing, and with reviewing my code : )
- I need to improve prediction of 'first failures'. What would be a good way to improve this?
  - Correlation between files changed and tests failed? Apparently Sergei tried this, but the results were not too good. However, that was before running in strict simulation mode; with strict simulation mode, anything that could help spot first failures is worth considering (see the sketch below, just before my signature).

I am currently running tests to get the adjusted results. I will graph them and send them out in a couple of hours.
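To make that last idea a bit more concrete, this is the kind of signal I have in mind (only a sketch; the edited_files/failed_tests inputs and the counting scheme are assumptions about how the data could be organized, nothing like this exists in the repository yet):

    from collections import defaultdict

    # Hypothetical co-occurrence counts: how often a test has failed in a
    # test_run whose change set touched a given file.
    file_test_fail_counts = defaultdict(lambda: defaultdict(int))

    def observe_run(edited_files, failed_tests):
        """Feed one historical test_run into the counts."""
        for f in edited_files:
            for t in failed_tests:
                file_test_fail_counts[f][t] += 1

    def first_failure_boost(test, edited_files):
        """Extra relevancy for `test`, given the files edited in the incoming run.

        A test whose time-decayed relevancy is near zero (it has not failed in
        a long time) can still get a boost here if its past failures coincided
        with edits to these files - which is the 'first failure' case the
        current metric keeps missing.
        """
        return sum(file_test_fail_counts[f].get(test, 0) for f in edited_files)

Whether a boost like this actually helps would of course have to be measured in strict simulation mode, like everything else.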
Regards
Pablo

On Fri, Jun 13, 2014 at 12:40 AM, Elena Stepanova <ele...@montyprogram.com> wrote:

> Hi Pablo,
>
> Thanks for the update.
>
> On 12.06.2014 19:13, Pablo Estrada wrote:
>
>> Hello Sergei, Elena and all,
>> Today while working on the script, I found and fixed an issue:
>>
>> There is some faulty code in my script that is in charge of collecting
>> the statistics about whether a test failure was caught or not (here
>> <https://github.com/pabloem/Kokiri/blob/master/basic_simulator.py#L393>).
>> I looked into fixing it, and then I could see another *problem*: the *recall
>> numbers* that I had collected previously were *too high*.
>>
>> The actual recall numbers, once we consider the test failures that are *not
>> caught*, are disappointingly lower. I won't show you results yet, since I
>> want to make sure that the code has been fixed, and I have accurate tests
>> first.
>>
>> This is all for now. The strategy that I was using is a lot less effective
>> than it seemed initially. I will send out a more detailed report with
>> results, my opinion on the weak points of the strategy, and ideas,
>> including a roadmap to try to improve results.
>>
>> Regards. All feedback is welcome.
>
> Please push your fixed code that triggered the new results, even if you
> are not ready to share the results themselves yet. It will be easier to
> discuss then.
>
> Regards,
> Elena
>
>> Pablo