I will let Bitergia (@Daniel Izquierdo) respond to your questions regarding
data ingestion and storage.

Thanks for pointing out that I left the alias off. I've added it now, along
with privacy@.

On Tue, 2 Jun 2020 at 12:05, Christian Grobmeier <grobme...@apache.org>
wrote:

> Hello Griselda,
>
> Dirk will be working on a very important $dayjob project for the next few
> weeks and has said that he is practically unavailable. I will try to help
> and fill in as much as I can, if you don't mind.
>
> I understand that Bitergia is analyzing data. In the models mentioned below
> I see that email addresses are listed. Email addresses are, at the very
> least, considered personal information, so I would like to ask a few
> questions.
>
> I don't know where the data is coming from. Is this data collected by the
> ASF from some internal database and given to Bitergia? Or is it collected
> from public websites such as GitHub?
>
> Once the data is analysed, what will happen to it?
>
> I am sorry if my questions seem painful or dumb to you; I am just trying
> to understand.
>
> Kind regards,
> Christian
>
> PS: You mentioned including the D&I dev list, but I didn't see it in CC.
> Also, you may want to consider adding privacy@.
>
>
> On Tue, Jun 2, 2020, at 00:42, Griselda Cuevas wrote:
>
> cc'ing the D&I dev list for transparency and the Bitergia team to answer
> questions.
>
> David, Dirk -
>
> I'm reaching out to get your consent to proceed with the quantitative
> analysis that is part of the D&I research the D&I committee is driving.
>
> Bitergia, our consultant, is planning to analyze community activity using
> their technology. Here are two resources they shared with me for more
> context:
> * GrimoireLab Tutorial: https://chaoss.github.io/grimoirelab-tutorial/,
> which shows how this works: the underlying infrastructure, the supported
> data sources, the data mining processes, etc.
> * Regarding the final curated data, the following folder contains all of
> the data models we're using:
> https://github.com/chaoss/grimoirelab-elk/tree/master/schema
>
> The projects we're looking to analyze are:
> Airflow
> Beam
> Cassandra
> CouchDB
> Flink
> Hadoop
> HTTPD
> Lucene
> NetBeans
> OpenOffice
> Spark
> Tomcat
>
> My understanding is that data of this sort is already being analysed by
> other vendors and systems.
>
> Do you have any concerns about this, or can we proceed?
>
> Thanks,
> G
>
