Hi Isabella

Thanks very much for reaching out to me. It's great to hear that you are interested in the Apache Software Foundation (ASF) and are doing some research on the potential lifecycle of incubating projects. I think that the topics you are looking to gather information on cover a few areas within the ASF. While Community Development is a general umbrella covering all Apache communities, we do have specific areas that focus on D&I and on the Apache Incubator itself.

Our Apache Incubator community oversees the whole Apache incubation process, while the D&I community has been instrumental in performing the latest survey of diversity within the Apache communities and may be able to give you a better indication of what diversity information we have and can share.

In the meantime I will try to respond inline to your various points below.

A key tool we use to gather statistics and metrics for all Apache projects is another Apache project called Apache Kibble. It also collects contribution and statistics information on incubating projects, so it may be worth taking a look at.


On 2020-11-02 16:10, Isabella Ferreira wrote:
Dear Sharan Foga,

My name is Isabella Ferreira and I was in your presentation at CHAOSScon Europe 2020. Based on your presentation, I saw that you are involved in several initiatives to encourage diversity within the ASF.

We, a group of Canadian and Dutch software engineering researchers, are interested in understanding why some projects joining Apache incubator grow and succeed, and others fail. Based on this study, our eventual goal is to formulate recommendations for projects considering joining Apache in terms of expectations and best practices. We aim to share our findings with the Apache community as well as software practitioners and researchers.

So far we have manually classified the incubator proposals of 292 projects to understand their motivation. We have found that these are the top-5 reasons for joining the Apache incubator:

1. Community building
2. Community diversity
3. Follow an established development process (such as the "Apache Way")
4. Increase user base
5. Expected collaboration with other projects


As the next step, we would like to evaluate to what extent joining the Apache ecosystem has enabled projects to achieve their goals. In particular, we are interested in questions like:

 * Did the number of organizations contributing to Apache projects
   increase compared to before joining the Apache incubator?


We have been using Apache Kibble to generate statistics for all our projects. We don't currently track organisational affiliation properly, but there have been discussions about ways to improve and include it.

For projects coming into Apache Incubator, I believe some organisational affiliation is captured initially to ensure diversity of project affiliation and the lack of dependency on one specific company. As you mention, sometimes a project enters incubation to grow its community, as it needs to diversify to survive.

 * Did the geographical spread of contributions to Apache projects
   increase compared to before joining the Apache incubator?


I don't think Apache Kibble captures the geographical location of contributions, but it does capture the time and date of each contribution, if that is any help.

 * Did the gender diversity of contributions to Apache projects
   increase compared to before joining the Apache incubator?


We do have the contributor ID, but Apache Kibble doesn't specifically capture or report on gender information. Perhaps our D&I community may be able to help you here with some relevant details from the last Apache Diversity survey.

While the GitHub and Subversion repositories of Apache projects provide information about the kind of contributions made (size, complexity, etc.), the information needed to address the above questions is not as readily available.

Hence, as the current VP of the Apache Community Development, we would like to have your thoughts on what would be the best way to obtain access to the above diversity data, without breaching any confidentiality concerns:

 * Is there a means to get access to Apache patch submitters’
   contributor agreements, for research purposes? If so, what is the
   process for this (e.g., NDAs to sign)?


The ASF site publicly lists the people and companies that have signed an Individual or Corporate Contributor Licence Agreement (ICLA or CCLA). If you are asking for access to the actual signed documents, then no, this is not possible.

 * Alternatively, is there a way for us to provide R or Python
   analysis scripts that someone with data access could run on our
   behalf, as such only exposing aggregate data to us?
 * Another alternative would be to perform a series of interviews
   and/or a survey amongst Apache contributors, although the success
   would heavily rely on a large participation rate.


If your focus is on the Incubator, then by reaching out to that community you may be able to gather enough survey participants. What sort of participation levels do you need to reach?


What are your thoughts on these points? Of course, we would be interested in organizing a virtual call to clarify our research objectives and/or questions.

I think that you have asked some interesting questions, but I am not sure that we have all the information available. Some of the information you have asked for, we cannot give you. Perhaps it would be good to continue this discussion on our mailing list to explore a bit more what public data we have that could help with your research.

I have copied our VP Apache Incubator, Justin McLean, and our VP Apache Diversity & Inclusion, Gris Cuevas, who may also be able to respond with their comments or any additional details that could help you.

Thanks
Sharan


Kind regards,

Isabella Ferreira, Polytechnique Montréal, Canada
Bram Adams, Queen’s University, Canada
Alexander Serebrenik, Eindhoven University of Technology, The Netherlands
Nan Yang, Eindhoven University of Technology, The Netherlands

