Hi Daan,

I have a set of JIRA issue IDs from the authors (and can generate an updated list). These issues are Test Bugs. What is more interesting, however, is what causes these False Positives. Determining which category a Test Bug belongs to requires looking into the JIRA issue, its comments, patches, etc.
I only have 3 months to finish my thesis. If I categorise each issue myself, I question the validity of my research, because I cannot gain extensive knowledge and understanding of the code in that limited time. The original authors did the categorisation themselves, investing 400 hours in categorising roughly 550 Test Bugs (across many Apache projects). However, I believe that with knowledge of the project and an initial list of categories (constructing the categories took time as well!), this process can be shortened significantly. I believe it should take a developer of the Cloudstack project no more than 15 minutes to categorise one issue.

Once I have a set of categorised issues, I can look into common patterns among them. For that I need as much data as I can get.

I hope this clarifies things. Please let me know if you (and hopefully more Cloudstack devs!) are willing to categorise a couple of issues. A few issues per week per person will go a long way :)

Regards,
Kevin

> On 07 Apr 2016, at 20:56, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>
> Kevin, what you intend to do sounds promising. I think you are on your own
> in collecting data, but by the sound of it you already have a list of 110
> name/question pairs, do you? If not, how do you intend to do the collection
> of the data you require?
>
> On Thu, Apr 7, 2016 at 4:04 PM, Kevin van den Bekerom <
> k.vandenbeke...@sig.eu> wrote:
>
>> Dear Developers of the Apache Accumulo project,
>>
>> My name is Kevin van den Bekerom and I am currently doing my Master's
>> research on the topic of false alarms in test code. I would like to ask for
>> the input of the Cloudstack development team in categorizing test code bugs.
>>
>> My research is based on a recent paper by Arash et al. (
>> http://salt.ece.ubc.ca/publications/docs/icsme15.pdf). They conducted an
>> empirical study, categorizing "test code bugs" in Apache software projects,
>> e.g. semantic, flaky, environmental, etc.
>> A "test code bug" is a failing
>> test, where the System Under Test is correct, but the test code is
>> incorrect. To identify test code bugs they looked at issues in JIRA, and
>> checked if the fixing commit was only in the test code. Only fixed issues
>> were counted and categorised.
>>
>> My goal is to replicate their results using a different approach, i.e. ask
>> developers that were involved in the issue and/or fix how they would
>> categorize it. For the Cloudstack project they counted 111 test code bugs.
>> Insight into false positives can therefore be very relevant for your
>> project. Note that they only sampled a number of identified test code bugs
>> for individual inspection (30 for the Accumulo project).
>>
>> I would like to ask for the Cloudstack team’s participation in categorizing
>> the various test code bugs. I will provide a list of JIRA IDs which are
>> identified as test code bugs and an initial list of categories to aid in
>> the categorization process. In my belief, the developers that worked on the
>> issue are the ones that are most capable of categorizing the issue. Please
>> let me know if this project looks interesting to you and if you are willing
>> to help me out.
>>
>> As a next step I will look for common patterns in identified test code bugs,
>> and my aim is to extend static source code analysis techniques so that they
>> are also suited to finding test code bugs. I am of course very happy to
>> share my findings with the team.
>>
>> Hope to hear from you!
>>
>> With kind regards,
>>
>> Kevin van den Bekerom
>> --
>> *Kevin van den Bekerom* | Intern
>>
>> +31 6 21 33 93 85 | kvandenbeke...@sig.eu
>> Software Improvement Group | www.sig.eu
>>
>
> --
> Daan