>
> it is difficult to find files to change in a
> specific issue.

I guess this can be a useful reminder, along the lines of "you might also
want to update file Y". Richer insights might be found at the method level.
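
To make the reminder idea concrete, here is a minimal sketch. It is a plain
co-change frequency baseline, not the prediction model Igor describes (which
uses issue and social network metrics). It assumes each commit is available
as a set of changed file paths; the function names, the `min_confidence`
threshold and the sample history are all hypothetical, with the history only
mirroring the counts in the example quoted below (3 commits touching both
files, 1 touching only Client.java).

```python
from collections import Counter
from itertools import combinations


def count_cochanges(commits):
    """Count how often each file changes and how often each pair changes together."""
    file_counts = Counter()
    pair_counts = Counter()
    for changed in commits:
        files = sorted(set(changed))  # sort so each pair has one canonical order
        file_counts.update(files)
        pair_counts.update(combinations(files, 2))
    return file_counts, pair_counts


def suggest_related(changed_file, commits, min_confidence=0.5):
    """Suggest files that historically changed together with changed_file.

    Confidence = (commits touching both files) / (commits touching changed_file).
    The 0.5 default threshold is an arbitrary assumption for illustration.
    """
    file_counts, pair_counts = count_cochanges(commits)
    total = file_counts[changed_file]
    if total == 0:
        return []
    suggestions = []
    for (a, b), together in pair_counts.items():
        if changed_file not in (a, b):
            continue
        other = b if a == changed_file else a
        confidence = together / total
        if confidence >= min_confidence:
            suggestions.append((other, confidence))
    # Highest confidence first; ties broken by file name
    return sorted(suggestions, key=lambda s: (-s[1], s[0]))


# Hypothetical history, made up to match the counts in the quoted example
history = [
    {"/ipc/Client.java", "security/SecurityUtil.java"},
    {"/ipc/Client.java", "security/SecurityUtil.java"},
    {"/ipc/Client.java", "security/SecurityUtil.java"},
    {"/ipc/Client.java"},
]

for other, confidence in suggest_related("/ipc/Client.java", history):
    print(f"you might also want to update {other} ({confidence:.0%} of past commits)")
```

With this made-up history the reminder fires for SecurityUtil.java at 75%.
The same counting could be done per method instead of per file if the commit
data is split at that level.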

---
Zhe Zhang

On Mon, Dec 14, 2015 at 7:07 PM, Igor Wiese <igor.wi...@gmail.com> wrote:

> Hi Zhe! Thanks for your answer.
>
> In fact, we are predicting the "co-change" based on contextual
> information collected from issues, commits and developers'
> communication. Considering the files that I described in the example
> ("/ipc/Client.java" and
> "security/SecurityUtil.java"), I collected metrics from each issue and
> commit touching Client.java to predict when Client.java is prone to change
> together with SecurityUtil.java.
>
> We are thinking of building a web service to help newcomers during their
> first contributions. Our research group interviewed some newcomers, and
> they told us that it is difficult to find files to change in a
> specific issue. We can recommend files to be checked.
>
> From the committer perspective, we could help in code review tasks.
>
> What do you think?
>
> Our idea
>
> 2015-12-14 22:16 GMT-02:00 Zhe Zhang <z...@apache.org>:
> > Hi Igor,
> >
> > It's an interesting direction to study tickets/commits in the Hadoop
> > community.
> >
> > A research group from Univ. Wisconsin did a similar study on Linux file
> > systems and I found it quite insightful:
> > http://research.cs.wisc.edu/wind/Publications/fsstudy-tos14.pdf
> >
> > For your results, could you elaborate why you picked "co-change" as the
> > metric, and how to improve software tools from the "co-change" predictions?
> >
> > Thanks,
> > Zhe
> >
> > On Mon, Dec 14, 2015 at 3:01 PM, Igor Wiese <igor.wi...@gmail.com> wrote:
> >
> >> Hi, Hadoop Community.
> >>
> >> My name is Igor Wiese, a PhD student from Brazil. I sent an email a week
> >> ago about my research. We received some visits to inspect the results,
> >> but no feedback was provided.
> >>
> >> I am investigating two important questions: What makes two files
> >> change together? Can we predict when they are going to co-change
> >> again?
> >>
> >> I've tried to investigate these questions on the Hadoop project. I've
> >> collected data from issue reports, discussions and commits, and used
> >> some machine learning techniques to build a prediction model.
> >>
> >>
> >> I collected a total of 950 commits in which a pair of files changed
> >> together and could correctly predict 47% of those commits. This was the
> >> most useful information for predicting co-changes of files:
> >>
> >> - sum of number of lines of code added, modified and removed,
> >>
> >> - number of words used to describe and discuss the issues,
> >>
> >> - median value of closeness, a social network measure obtained from
> >> issue comments,
> >>
> >> - median value of constraint, a social network measure obtained from
> >> issue comments, and
> >>
> >> - median value of hierarchy, a social network measure obtained from
> >> issue comments.
> >>
> >> To illustrate, consider the following example from our analysis. For
> >> release 0.22, the files "/ipc/Client.java" and
> >> "security/SecurityUtil.java" changed together in 3 commits. In another
> >> 1 commit, only the first file changed, but not the second. Collecting
> >> contextual information for each commit made to the first file in the
> >> previous release, we were able to predict 2 commits in which both
> >> files changed together in release 0.22, and we only issued 1 wrong
> >> prediction. For this pair of files, the most important contextual
> >> information was the social network metrics (density, hierarchy,
> >> efficiency) obtained from issue comments.
> >>
> >>
> >> - Do these results surprise you? Can you think of any explanation for
> >> the results?
> >>
> >> - Do you think that our rate of prediction is good enough to be used
> >> for building tool support for the software community?
> >>
> >> - Do you have any suggestions on what can be done to improve the change
> >> recommendation?
> >>
> >> You can visit our webpage to inspect the results in detail:
> >> http://flosscoach.com/index.php/17-cochanges/70-hadoop
> >>
> >> All the best,
> >> Igor Wiese
> >>
> >> PhD Candidate
> >>
> >> --
> >> =================================
> >> Igor Scaliante Wiese
> >> PhD Candidate - Computer Science @ IME/USP
> >> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
> >>
>
>
>
> --
> =================================
> Igor Scaliante Wiese
> PhD Candidate - Computer Science @ IME/USP
> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
>
