> it is difficult to find files to change in a
> specific issue.

I guess this can be a useful reminder: "you might also want to update file
Y". Maybe richer insights can be found at the method level.

---
Zhe Zhang

On Mon, Dec 14, 2015 at 7:07 PM, Igor Wiese <igor.wi...@gmail.com> wrote:

> Hi Zhe! Thanks for your answer.
>
> In fact, we are predicting the "co-change" based on contextual
> information collected from issues, commits and developer
> communication. Considering the files that I described in the example
> ("/ipc/Client.java" and "security/SecurityUtil.java"), I collected
> metrics from each issue and commit involving Client.java to predict
> when Client.java is prone to change with SecurityUtil.java.
>
> We are thinking of building a web service to help newcomers during
> their first contributions. Our research group interviewed some
> newcomers and they told us that it is difficult to find files to
> change in a specific issue. We can recommend files to be checked.
>
> From the committer perspective, we could help in code review tasks.
>
> What do you think?
>
> Our idea
>
> 2015-12-14 22:16 GMT-02:00 Zhe Zhang <z...@apache.org>:
> > Hi Igor,
> >
> > It's an interesting direction to study tickets/commits in the Hadoop
> > community.
> >
> > A research group from Univ. Wisconsin did a similar study on Linux file
> > systems and I found it quite insightful:
> > http://research.cs.wisc.edu/wind/Publications/fsstudy-tos14.pdf
> >
> > For your results, could you elaborate why you picked "co-change" as the
> > metric, and how to improve software tools from the "co-change"
> > predictions?
> >
> > Thanks,
> > Zhe
> >
> > On Mon, Dec 14, 2015 at 3:01 PM, Igor Wiese <igor.wi...@gmail.com> wrote:
> >
> >> Hi, Hadoop Community.
> >>
> >> My name is Igor Wiese, PhD student from Brazil. I sent an email a week
> >> ago about my research. We received some visits to inspect the results,
> >> but no feedback was provided.
> >>
> >> I am investigating two important questions: What makes two files
> >> change together? Can we predict when they are going to co-change
> >> again?
> >>
> >> I've tried to investigate these questions on the Hadoop project. I've
> >> collected data from issue reports, discussions and commits and used
> >> some machine learning techniques to build a prediction model.
> >>
> >> I collected a total of 950 commits in which a pair of files changed
> >> together and could correctly predict 47% of the commits. The most
> >> useful pieces of information for predicting co-changes of files were:
> >>
> >> - sum of number of lines of code added, modified and removed,
> >>
> >> - number of words used to describe and discuss the issues,
> >>
> >> - median value of closeness, a social network measure obtained from
> >> issue comments,
> >>
> >> - median value of constraint, a social network measure obtained from
> >> issue comments, and
> >>
> >> - median value of hierarchy, a social network measure obtained from
> >> issue comments.
> >>
> >> To illustrate, consider the following example from our analysis. For
> >> release 0.22, the files "/ipc/Client.java" and
> >> "security/SecurityUtil.java" changed together in 3 commits. In another
> >> 1 commit, only the first file changed, but not the second. Collecting
> >> contextual information for each commit made to the first file in the
> >> previous release, we were able to predict 2 commits in which both
> >> files changed together in release 0.22, and we only issued 1 wrong
> >> prediction. For this pair of files, the most important contextual
> >> information was the social network metrics (density, hierarchy,
> >> efficiency) obtained from issue comments.
> >>
> >> - Do these results surprise you? Can you think of any explanation for
> >> the results?
> >>
> >> - Do you think that our rate of prediction is good enough to be used
> >> for building tool support for the software community?
> >>
> >> - Do you have any suggestion on what can be done to improve the change
> >> recommendation?
> >>
> >> You can visit our webpage to inspect the results in detail:
> >> http://flosscoach.com/index.php/17-cochanges/70-hadoop
> >>
> >> All the best,
> >> Igor Wiese
> >>
> >> PhD Candidate
> >>
> >> --
> >> =================================
> >> Igor Scaliante Wiese
> >> PhD Candidate - Computer Science @ IME/USP
> >> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
> >>
>
> --
> =================================
> Igor Scaliante Wiese
> PhD Candidate - Computer Science @ IME/USP
> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
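
The release 0.22 example quoted in this thread (3 commits in which both files changed, plus 1 commit in which only the first file changed) can be expressed as a plain co-change count. The following is a minimal Python sketch of that counting step only. It is not the machine learning model described by Igor, and the function names (`co_change_counts`, `co_change_ratio`) and the commit list are hypothetical, built just to mirror the numbers in the example.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how many commits touch each unordered pair of files."""
    pair_counts = Counter()
    for files in commits:
        # Sort so that (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


def co_change_ratio(commits, first, second):
    """Fraction of commits touching `first` that also touch `second`."""
    touching_first = [set(c) for c in commits if first in c]
    if not touching_first:
        return 0.0
    both = sum(1 for c in touching_first if second in c)
    return both / len(touching_first)


# Hypothetical commit history mirroring the release 0.22 example:
# 3 commits change both files, 1 commit changes only the first file.
CLIENT = "/ipc/Client.java"
SECURITY = "security/SecurityUtil.java"
commits = [
    [CLIENT, SECURITY],
    [CLIENT, SECURITY],
    [CLIENT, SECURITY],
    [CLIENT],
]

counts = co_change_counts(commits)
pair = tuple(sorted((CLIENT, SECURITY)))
print(counts[pair])                                 # co-change commits
print(co_change_ratio(commits, CLIENT, SECURITY))   # share of Client.java commits
```

A ratio like this only looks at past co-occurrence. The approach discussed in the thread goes further by adding contextual features from issues, commits and developer communication.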