On Dec 10, 2015, at 12:31 AM, Igor Wiese <igor.wi...@gmail.com> wrote:

> Hi, CloudStack Community.
> 
> My name is Igor Wiese, a PhD student from Brazil. In my research, I am
> investigating two important questions: What makes two files change
> together? Can we predict when they are going to co-change again?
> 
> I've tried to investigate these questions on the CloudStack project. I've
> collected data from issue reports, discussions and commits, and used some
> machine learning techniques to build a prediction model.
> 
> I collected a total of 141 commits in which a pair of files changed
> together and could correctly predict 60% of those commits.


Hi Igor, why 141 commits? Are those the only commits you found in which just
a pair of files changed?

My gut feeling is that you could check the entire history of the CloudStack 
repo (~5 years' worth of data) and work on different types of tuples.

141 commits seems like a really small dataset.
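
To illustrate what working on different types of tuples could look like, here
is a minimal sketch that counts how often groups of files change in the same
commit. The function name `count_cochanges` and the sample file names are
hypothetical, and the input format (one list of touched files per commit) is
an assumption; in practice such lists could be parsed from something like
`git log --name-only` output. This is not the method used in the study.

```python
from collections import Counter
from itertools import combinations


def count_cochanges(commits, size=2):
    """Count how often each group of `size` files changed in the same commit.

    `commits` is an iterable of file lists, one list per commit
    (an assumed input format, not taken from the original study).
    """
    counts = Counter()
    for files in commits:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same tuple.
        for group in combinations(sorted(set(files)), size):
            counts[group] += 1
    return counts


# Hypothetical commits, each given as the list of files it touched.
commits = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java"],
    ["B.java", "C.java"],
]

pairs = count_cochanges(commits, size=2)
triples = count_cochanges(commits, size=3)

for group, n in pairs.most_common():
    print(n, group)
```

Changing `size` switches from pairs to triples or larger tuples, so the same
pass over the full history yields a dataset for each tuple size.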

-Sebastien

> These were the most
> useful pieces of information for predicting co-changes of files:
> 
> - sum of number of lines of code added, modified and removed,
> 
> - number of words used to describe and discuss the issues,
> 
> - number of comments in each issue,
> 
> - median value of closeness, a social network measure obtained from issue
> comments, and
> 
> - median value of constraint, a social network measure obtained from issue
> comments.
> 
> To illustrate, consider the following example from our analysis. For
> release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits. In
> another 2 commits, only the first file changed, but not the second.
> Collecting contextual information for each commit made to the first file in
> the previous release (4.3), we were able to predict all 3 commits in which
> both files changed together in release 4.4, and we issued no false
> positives. For this pair of files, the most important contextual
> information was the number of lines of code added, removed and modified in
> each commit, the number of comments in each issue, and social network
> measures (closeness, density, constraint, hierarchy) obtained from issue
> comments.
> 
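
To make the figures in this example concrete, here is a small sketch that
turns them into precision and recall. Treating the result as precision and
recall is an assumption, since the message does not say which metric the 60%
figure refers to; the function name `precision_recall` is hypothetical. The
counts come from the example above: 3 co-change commits, all 3 predicted, and
no false positives.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of co-change predictions."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or observed.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Release 4.4 example: all 3 co-change commits predicted, 0 false positives,
# so no co-change commit was missed (0 false negatives).
precision, recall = precision_recall(
    true_positives=3, false_positives=0, false_negatives=0
)
print(precision, recall)  # prints "1.0 1.0"
```

The 2 commits in which only the first file changed are the negatives here:
flagging either of them as a co-change would have lowered precision.
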
> - Do these results surprise you? Can you think of any explanation for the
> results?
> 
> - Do you think that our rate of prediction is good enough to be used for
> building tool support for the software community?
> 
> - Do you have any suggestions on what can be done to improve the change
> recommendation?
> 
> You can visit our webpage to inspect the results in detail:
> http://flosscoach.com/index.php/17-cochanges/67-cloudstack
> 
> All the best,
> Igor Wiese
> PhD Candidate
