Are you handling new files as well, or the links between sets of files (or
packages)? For example, if a user creates a new API command, they will also
update the "commands.properties" file. Similarly, if a VO file is updated, a
DB migration file will be added as well.
Cool work,
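A minimal sketch of how co-changes between sets of files could be counted. It assumes the commit history has already been extracted (for example with `git log --name-only`) into lists of changed paths. The file names are hypothetical and only mirror the two examples above. The "confidence" measure (share of commits touching A that also touch B) is a standard association-rule metric, not necessarily what the study itself used.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each file, and each unordered pair of files, changes.

    `commits` is a list of commits, each an iterable of changed file paths
    (assumed to be already extracted from version control).
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # drop duplicates and fix pair ordering
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return file_counts, pair_counts


def confidence(file_counts, pair_counts, a, b):
    """Share of commits touching `a` that also touched `b` (rule a -> b)."""
    if file_counts[a] == 0:
        return 0.0
    return pair_counts[tuple(sorted((a, b)))] / file_counts[a]


# Hypothetical history mirroring the API-command and VO/migration examples.
commits = [
    ["ListFooCmd.java", "commands.properties"],
    ["ListBarCmd.java", "commands.properties"],
    ["FooVO.java", "schema-430to440.sql"],
    ["FooVO.java", "schema-430to440.sql", "FooDaoImpl.java"],
    ["FooVO.java"],
]

file_counts, pair_counts = co_change_counts(commits)
# 2 of the 3 commits touching FooVO.java also touched the migration script.
print(confidence(file_counts, pair_counts, "FooVO.java", "schema-430to440.sql"))
# Every commit touching the migration script also touched FooVO.java.
print(confidence(file_counts, pair_counts, "schema-430to440.sql", "FooVO.java"))
```

The rule is directional: a migration script almost always implies a VO change, but not the other way around. Replacing `combinations(unique, 2)` with a larger tuple size would count triples and beyond, at the cost of many more candidate sets per commit.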

On Thu, Dec 10, 2015 at 9:21 AM, Igor Wiese <igor.wi...@gmail.com> wrote:

> Hi Sebastien.
>
> We used only 141 commits because we needed data from the issues. Since my
> assumption concerns the contextual information from issues and social
> aspects, we needed to aggregate commits and issues.
>
> First, I collected the issues from JIRA and then I tried to aggregate the
> commits that explicitly mentioned one of the collected issues. I also used
> only closed issues, to be confident that the code used to build my models
> had been merged and checked by the community.
>
> That is the weak point of my approach: I need past data from the issues,
> and sometimes it is not available for earlier periods. I also plan to use
> data from GitHub to make the dataset more complete.
>
> All the best,
>
> 2015-12-10 11:22 GMT-02:00 sebgoa <run...@gmail.com>:
>
> >
> > On Dec 10, 2015, at 12:31 AM, Igor Wiese <igor.wi...@gmail.com> wrote:
> >
> > > Hi, Cloudstack Community.
> > >
> > > My name is Igor Wiese, a PhD student from Brazil. In my research, I am
> > > investigating two important questions: What makes two files change
> > > together? Can we predict when they are going to co-change again?
> > >
> > > I tried to investigate these questions on the CloudStack project. I
> > > collected data from issue reports, discussions and commits and used some
> > > machine learning techniques to build a prediction model.
> > >
> > > I collected a total of 141 commits in which a pair of files changed
> > > together and could correctly predict 60% of these commits.
> >
> >
> > Hi Igor, why 141 commits? Are those the only commits you found in which
> > only a pair of files changed?
> >
> > My gut feeling is that you could check the entire history of the
> > CloudStack repo (~5 years' worth of data) and work on different types of
> > tuples.
> >
> > 141 commits seems like a really small dataset.
> >
> > -Sebastien
> >
> > > These were the most useful pieces of information for predicting
> > > co-changes of files:
> > >
> > > - sum of the number of lines of code added, modified and removed,
> > >
> > > - number of words used to describe and discuss the issues,
> > >
> > > - number of comments in each issue,
> > >
> > > - median value of closeness, a social network measure obtained from
> > > issue comments, and
> > >
> > > - median value of constraint, a social network measure obtained from
> > > issue comments.
> > >
> > > To illustrate, consider the following example from our analysis. For
> > > release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> > > "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits.
> > > In another 2 commits, only the first file changed. By collecting
> > > contextual information for each commit made to the first file in the
> > > previous release (4.3), we were able to predict all 3 commits in which
> > > both files changed together in release 4.4, with 0 false positives.
> > > For this pair of files, the most important contextual information was
> > > the number of lines of code added, removed and modified in each commit,
> > > the number of comments in each issue, and social network measures
> > > (closeness, density, constraint, hierarchy) obtained from issue
> > > comments.
> > >
> > > - Do these results surprise you? Can you think of any explanation for
> > > the results?
> > >
> > > - Do you think that our prediction rate is good enough to be used for
> > > building tool support for the software community?
> > >
> > > - Do you have any suggestions on what can be done to improve the change
> > > recommendation?
> > >
> > > You can visit our webpage to inspect the results in detail:
> > > http://flosscoach.com/index.php/17-cochanges/67-cloudstack
> > >
> > > All the best,
> > > Igor Wiese
> > > PhD Candidate
> >
> >
>
>
> --
> =================================
> Igor Scaliante Wiese
> PhD Candidate - Computer Science @ IME/USP
> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
>
