I will let Bitergia (@Daniel Izquierdo) respond to your questions re: data ingestion and storage.
Thanks for pointing out that I left the alias out. I've added it now, along with privacy@.

On Tue, 2 Jun 2020 at 12:05, Christian Grobmeier <grobme...@apache.org> wrote:

> Hello Griselda,
>
> Dirk will be on a very important $dayjob project the next weeks and
> announced that he is practically unavailable. I try to help and fill in as
> much as I can, if you don't mind.
>
> I understand a Bitergia is analyzing data. In the models mentioned below I
> see there are email addresses listed. E-Mail addresses are at least
> considered personal information, so I would like to ask a few questions.
>
> I don't know where the data is coming from. Is this data collected by the
> ASF from some internal database and given to Bitergia? Or is this data
> collected from public website such as GitHub?
>
> Once the data is analysed what will happen with that data?
>
> I am sorry if my questions feel painful to you or dump, I am just trying
> to understand.
>
> Kind regards,
> Christian
>
> PS: you mentioned to include D&I dev list, but I didn't see it in CC.
> Also, you may consider to add privacy@
>
>
> On Tue, Jun 2, 2020, at 00:42, Griselda Cuevas wrote:
>
> cc'ing the D&I dev list for transparency and Bitergia team to answer
> questions
>
> David, Dirk -
>
> I'm reaching out to get your consent to proceed with the quantitative
> analysis that is part of the D&I Research the D&I committee is driving.
>
> Bitergia, our consultant, is planning to analyze community activity via
> their technology. Here are two sources they shared with me to get more
> context:
> * GrimoireLab Tutorial: https://chaoss.github.io/grimoirelab-tutorial/,
> so you can see how this works, the infra under all of this, data sources
> supported, data mining processes, etc.
> * And regarding to the final curated data, the following folder contains
> all of the data models we're using:
> https://github.com/chaoss/grimoirelab-elk/tree/master/schema
>
> The projects we're looking to analyze are:
> Airflow
> Beam
> Cassandra
> CouchDB
> Flink
> Hadoop
> HTTPD
> Lucene
> NetBeans
> OpenOffice
> Spark
> Tomcat
>
> My understanding is that data of this sort is already being analysed by
> other vendors and systems.
>
> Do you have any concerns on this or could we proceed?
>
> Thanks,
> G