On Dec 10, 2015, at 12:31 AM, Igor Wiese <igor.wi...@gmail.com> wrote:
> Hi, CloudStack Community.
>
> My name is Igor Wiese, PhD student from Brazil. In my research, I am
> investigating two important questions: What makes two files change
> together? Can we predict when they are going to co-change again?
>
> I've tried to investigate these questions on the CloudStack project. I've
> collected data from issue reports, discussions and commits, and used some
> machine learning techniques to build a prediction model.
>
> I collected a total of 141 commits in which a pair of files changed
> together and could correctly predict 60% of the commits.

Hi Igor, why 141 commits? Are those the only commits you found in which
only a pair of files changed? My gut feeling is that you could check the
entire history of the CloudStack repo (~5 years' worth of data) and work on
different types of tuples. 141 commits seems like a really small dataset.

-Sebastien

> These were the most useful pieces of information for predicting
> co-changes of files:
>
> - sum of number of lines of code added, modified and removed,
>
> - number of words used to describe and discuss the issues,
>
> - number of comments in each issue,
>
> - median value of closeness, a social network measure obtained from issue
> comments, and
>
> - median value of constraint, a social network measure obtained from issue
> comments.
>
> To illustrate, consider the following example from our analysis. For
> release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits. In
> another 2 commits, only the first file changed, but not the second.
> Collecting contextual information for each commit made to the first file
> in the previous release (4.3), we were able to predict all 3 commits in
> which both files changed together in release 4.4, with 0 false
> positives. For this pair of files, the most important contextual
> information was the number of lines of code added, removed and modified in
> each commit, the number of comments in each issue, and social network
> measures (closeness, density, constraint, hierarchy) obtained from issue
> comments.
>
> - Do these results surprise you? Can you think of any explanation for the
> results?
>
> - Do you think that our rate of prediction is good enough to be used for
> building tool support for the software community?
>
> - Do you have any suggestions on what can be done to improve the change
> recommendation?
>
> You can visit our webpage to inspect the results in detail:
> http://flosscoach.com/index.php/17-cochanges/67-cloudstack
>
> All the best,
> Igor Wiese
> PhD Candidate