Hi Anshul,

Thanks for your answer. First of all, sorry about the webpage. I checked, and it is working now:
http://flosscoach.com/index.php/17-cochanges/67-cloudstack
Let me know if you still have problems accessing it.
About your questions:

1) What do you mean by "correctly predict 60% commits"?

Suppose you changed cloud/hypervisor/XenServerGuru.java while working on issue 10000. After committing this file, which other files might you need to change to complete the work? We collect data from the issues and commits in which XenServerGuru.java changed in the previous release, and use it to recommend which other files are most likely to change together with it in the new issue you are working on. In 60% of the commits to which we applied our approach, we could correctly predict (recommend) files that change together with cloud/hypervisor/XenServerGuru.java. On the webpage you can check all the "combinations" (pairs of files) that we tested for the CloudStack project, based on releases 4.1, 4.2, 4.3 and 4.4.

2) What are the feature measures you are giving as input here to system (prediction model)?

In total we used 21 metrics, drawn from issues, communication/experience, and commits. Each pair of files that we tested used a different combination of measures. From issues, for example, we used the names of the assignee and the reporter, and the size of the description plus discussion. From communication, we took the number of comments, whether older committers from the same size also made comments, social network measures from issues/pull requests, etc. From commits, we took the number of lines added, modified, and removed.

3) What kind of output you are expecting?

Consider two real scenarios. In the first, you are a newcomer and have difficulty completing your issues because you have not read much of the code or do not know much about the architecture. In such cases, newcomers could use our approach (we are building a tool) to receive recommendations while performing the task. In the second, you are a core member reviewing a pull request: we could give you a list of files to check, so you can verify that all of them are in the commit.
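To make the idea in answer 1 concrete, here is a minimal sketch of recommending co-changed files from commit history. It is only an illustration: it uses plain co-change counting (how often two files appeared in the same commit), not the prediction model with the 21 metrics described above. The function names, the `min_confidence` threshold, and the toy commit list are all assumptions made for this example; only the two file paths and the 3-together / 2-alone counts come from the release 4.4 example in the original message.

```python
from collections import Counter
from itertools import permutations


def cochange_counts(commits):
    """Count how many commits touched each file and each ordered file pair."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = set(files)  # ignore duplicate paths within one commit
        file_counts.update(unique)
        pair_counts.update(permutations(unique, 2))
    return file_counts, pair_counts


def recommend(changed_file, commits, min_confidence=0.5):
    """Return (file, confidence) pairs for files that co-changed with changed_file.

    confidence = commits touching both files / commits touching changed_file.
    The 0.5 default threshold is an arbitrary choice for this sketch.
    """
    file_counts, pair_counts = cochange_counts(commits)
    total = file_counts[changed_file]
    if total == 0:
        return []  # no history for this file, so nothing to recommend
    scored = [
        (other, count / total)
        for (first, other), count in pair_counts.items()
        if first == changed_file and count / total >= min_confidence
    ]
    # Highest confidence first; ties broken by file name for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


XEN = "cloud/hypervisor/XenServerGuru.java"
VMWARE = "cloud/hypervisor/guru/VMwareGuru.java"

# Hypothetical commit list shaped like the release 4.4 example:
# both files changed together in 3 commits, the first file alone in 2.
history = [
    [XEN, VMWARE],
    [XEN, VMWARE],
    [XEN, VMWARE],
    [XEN],
    [XEN],
]

print(recommend(XEN, history))
```

With this toy history, VMwareGuru.java is recommended with a confidence of 3/5. That ratio is just the share of XenServerGuru.java commits that also touched VMwareGuru.java; it should not be confused with the 60% prediction rate reported for the study.
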
All the best,
Igor Wiese

2015-12-10 7:11 GMT-02:00 Anshul Gangwar <anshul.gang...@citrix.com>:

> Before giving feedback I have some questions
>
> 1) What do you mean by "correctly predict 60% commits"?
> 2) What are the feature measures you are giving as input here to system
> (prediction model)?
> 3) What kind of output you are expecting?
>
> Web page link you have provided is not working.
>
>
> On 10-Dec-2015, at 5:01 AM, Igor Wiese <igor.wi...@gmail.com> wrote:
> >
> > Hi, Cloudstack Community.
> >
> > My name is Igor Wiese, phd Student from Brazil. In my research, I am
> > investigating two important questions: What makes two files change
> > together? Can we predict when they are going to co-change again?
> >
> > I've tried to investigate this question on the Cloudstack project. I've
> > collected data from issue reports, discussions and commits and using some
> > machine learning techniques to build a prediction model.
> >
> > I collected a total of 141 commits in which a pair of files changed
> > together and could correctly predict 60% commits. These were the most
> > useful information for predicting co-changes of files:
> >
> > - sum of number of lines of code added, modified and removed,
> >
> > - number of words used to describe and discuss the issues,
> >
> > - number of comments in each issue,
> >
> > - median value of closeness, a social network measure obtained from issue
> > comments, and
> >
> > - median value of constraint, a social network measure obtained from issue
> > comments.
> >
> > To illustrate, consider the following example from our analysis. For
> > release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> > "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits. In
> > another 2 commits, only the first file changed, but not the second.
> > Collecting contextual information for each commit made to first file in the
> > previous release (4.3), we were able to predict all 3 commits in which both
> > files changed together in release 4.4, and we only issued 0 false
> > positives. For this pair of files, the most important contextual
> > information was the number of lines of code added, removed and modified in
> > each commit, the number of comments in each issue, and social network
> > measures (closeness, density, constraint, hierarchy) obtained from issue
> > comments.
> >
> > - Do these results surprise you? Can you think in any explanation for the
> > results?
> >
> > - Do you think that our rate of prediction is good enough to be used for
> > building tool support for the software community?
> >
> > - Do you have any suggestion on what can be done to improve the change
> > recommendation?
> >
> > You can visit our webpage to inspect the results in details:
> > http://flosscoach.com/index.php/17-cochanges/67-cloudstack
> >
> > All the best,
> > Igor Wiese
> > Phd Candidate
> >

--
=================================
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná