Re: Data inconsistency in projects.apache.org

2020-03-28 Thread Rich Bowen
On Fri, Mar 27, 2020, 16:16 sebb wrote: > > > So, while to me that seems like an obvious and enormous improvement, my > > understanding is that this was proposed before and someone (I understood > > it was you?) vetoed the change. So I'm a teensy bit confused. > > Not me. > I have always been in

Re: Data inconsistency in projects.apache.org

2020-03-28 Thread Hervé BOUTEMY
Yes, I'm convinced some data can be extracted automatically, but I also know that some data can't, for example: - for committees with multiple projects, like https://projects.apache.org/projects.html?committee#commons - for projects still using svn - the definition of languages and categories and t

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Dave Fisher
See http://incubator.apache.org/clutch/tuweni The repositories are actual and are updated from gitbox.apache.org/repositories.json The releases are from dist.apache.org/repos/dist/release and are exactly what is available. Other items on the page are either from a podling status file or other

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
There are many more parts; see some examples of human-readable output: https://projects.apache.org/project.html?accumulo https://projects.apache.org/project.html?calcite On Friday, 27 March 2020, 21:44:56 CET, Dave Fisher wrote: > metadata for project releases is discoverable from the dist in sv

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
On Friday, 27 March 2020, 21:19:56 CET, sebb wrote: > > > That way, over time, we'd eventually have all of those files in one > > > place, making them easier to find and update. > > > > find = 2 files (1 for committees, 1 for projects) > > May be more than one for projects. > e.g. Commons. I w

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Dave Fisher
metadata for project releases is discoverable from the dist in svn. It is already done for podlings in the Incubator in the clutch analysis. It is python. I can provide some help late next week. Sent from my iPhone > On Mar 27, 2020, at 1:20 PM, sebb wrote: > > On Fri, 27 Mar 2020 at 20:01,
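
The idea above, that release metadata can be discovered from the dist area, can be sketched roughly as follows. This is a minimal illustration and not the clutch code: the file names in `SAMPLE_LISTING`, the `example` project name, the regular expression, and the helper `releases_from_listing` are all hypothetical, and the naming convention `name-version-rest` is an assumption that real release directories may not follow.

```python
import re
from collections import defaultdict

# Hypothetical names, as they might appear in a release directory listing.
SAMPLE_LISTING = [
    "example-1.2.0-src.tar.gz",
    "example-1.2.0-src.tar.gz.asc",
    "example-1.2.0-src.tar.gz.sha512",
    "example-1.2.0-bin.zip",
    "example-tools-2.0.1.zip",
    "KEYS",
]

# Signatures and checksums carry no extra release information, so skip them.
SKIP_SUFFIXES = (".asc", ".sha256", ".sha512", ".md5")

# Assumed convention: <name>-<dotted version><separator + anything>.
ARTIFACT_RE = re.compile(
    r"^(?P<name>[a-z][a-z0-9-]*?)-(?P<version>\d+(?:\.\d+)+)(?P<rest>[-.].*)$"
)


def releases_from_listing(names):
    """Group the versions found in artifact file names by artifact name."""
    found = defaultdict(set)
    for name in names:
        if name.endswith(SKIP_SUFFIXES):
            continue
        match = ARTIFACT_RE.match(name)
        if match:
            found[match.group("name")].add(match.group("version"))
    # Versions are sorted as plain strings, not as version numbers.
    return {name: sorted(versions) for name, versions in found.items()}


if __name__ == "__main__":
    for name, versions in releases_from_listing(SAMPLE_LISTING).items():
        print(name, versions)
```

Files that do not match the assumed pattern (such as `KEYS`) are ignored rather than guessed at, which is the conservative choice when the result feeds a public project page.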

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread sebb
On Fri, 27 Mar 2020 at 20:01, Hervé BOUTEMY wrote: > > On Friday, 27 March 2020, 20:29:14 CET, you wrote: > > On 3/27/20 3:07 PM, Hervé BOUTEMY wrote: > > > It's good to see some interest back on DOAP files content and organisation, > > > now that the projects.apache.org rendering makes them

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread sebb
On Fri, 27 Mar 2020 at 18:04, Rich Bowen wrote: > > > > On 3/27/20 1:13 PM, sebb wrote: > > On Fri, 27 Mar 2020 at 13:44, Rich Bowen wrote: > >> there are also lines that look like: > >> > >> http://flex.apache.org/pmc_Flex.rdf > >> > >> (4 of them, for whatever that's worth - flex, ofbiz, pl

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Rich Bowen
On 3/27/20 4:01 PM, Hervé BOUTEMY wrote: my point about "PMC RDF files" vs "projects DOAP files" is not a question of format, but a question of amount of data and who would have real knowledge to update content: - PMC RDF files are very light, rarely updated, and contain data that are really
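
For readers unfamiliar with the "projects DOAP files" discussed here, the sketch below shows the kind of data such a file carries and one way to read it. It is only an illustration: the sample document, its `example.invalid` URLs, and the helper `summarize_doap` are hypothetical and are not taken from any Apache project; only the RDF and DOAP namespace URIs and element names are standard.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}

# Hypothetical DOAP document; real files usually hold more fields.
SAMPLE_DOAP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project rdf:about="https://example.invalid/">
    <doap:name>Example Project</doap:name>
    <doap:homepage rdf:resource="https://example.invalid/"/>
    <doap:programming-language>Java</doap:programming-language>
    <doap:category rdf:resource="https://example.invalid/category/library"/>
  </doap:Project>
</rdf:RDF>
"""


def summarize_doap(xml_text):
    """Extract name, homepage, languages and categories from a DOAP document."""
    root = ET.fromstring(xml_text)
    project = root.find("doap:Project", NS)
    if project is None:
        raise ValueError("no doap:Project element found")
    # Attributes in a namespace are addressed as "{namespace-uri}localname".
    rdf_resource = "{%s}resource" % NS["rdf"]
    homepage = project.find("doap:homepage", NS)
    return {
        "name": project.findtext("doap:name", default="", namespaces=NS),
        "homepage": homepage.get(rdf_resource) if homepage is not None else None,
        "languages": [
            e.text for e in project.findall("doap:programming-language", NS)
        ],
        "categories": [
            e.get(rdf_resource) for e in project.findall("doap:category", NS)
        ],
    }


if __name__ == "__main__":
    print(summarize_doap(SAMPLE_DOAP))
```

Fields such as the programming language and category are the human-maintained parts of the file, which is why they tend to drift when nobody with real knowledge of the project updates them.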

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
On Friday, 27 March 2020, 20:11:33 CET, Rich Bowen wrote: > For context, I'm trying to address Sally's complaint that the data on > projects.a.o is inconsistent, out of date, and wonky. yes, I like this objective > I am very willing > to reach out to various projects about data updates (and am d

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
On Friday, 27 March 2020, 20:29:14 CET, you wrote: > On 3/27/20 3:07 PM, Hervé BOUTEMY wrote: > > It's good to see some interest back on DOAP files content and organisation, > > now that the projects.apache.org rendering makes them really useful: a > > few years ago, trying to open any discuss

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Rich Bowen
On 3/27/20 3:07 PM, Hervé BOUTEMY wrote: It's good to see some interest back on DOAP files content and organisation, now that the projects.apache.org rendering makes them really useful: a few years ago, trying to open any discussion on that was doomed to failure. But any change is hard, since e

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Rich Bowen
On 3/27/20 3:07 PM, Hervé BOUTEMY wrote: It's good to see some interest back on DOAP files content and organisation, now that the projects.apache.org rendering makes them really useful: a few years ago, trying to open any discussion on that was doomed to failure. But any change is hard, since e

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Rich Bowen
On 3/27/20 2:45 PM, Hervé BOUTEMY wrote: please start by reading the human-oriented explanation: https://projects.apache.org/about.html this should ease the deep dive into data behind the recurring "Committees vs Projects" discussion Thanks. That is indeed where I started. I think where I g

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
On Friday, 27 March 2020, 19:04:28 CET, Rich Bowen wrote: > > If any changes are made, I strongly recommend centralising the data files. > > DOAP files maintained in project data areas often get moved, and the > > project forgets to update the entry in projects.xml > > Also, sometimes edits to DO

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Hervé BOUTEMY
Please start by reading the human-oriented explanation: https://projects.apache.org/about.html This should ease the deep dive into the data behind the recurring "Committees vs Projects" discussion. Regards, Hervé On Friday, 27 March 2020, 14:43:52 CET, Rich Bowen wrote: > I'm trying to understand

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread Rich Bowen
On 3/27/20 1:13 PM, sebb wrote: On Fri, 27 Mar 2020 at 13:44, Rich Bowen wrote: there are also lines that look like: http://flex.apache.org/pmc_Flex.rdf (4 of them, for whatever that's worth - flex, ofbiz, plc4x, and tez) Is that correct? Or is that not how the data is supposed to be

Re: Data inconsistency in projects.apache.org

2020-03-27 Thread sebb
On Fri, 27 Mar 2020 at 13:44, Rich Bowen wrote: > > I'm trying to understand the twisty maze of data sources that fuel > projects.apache.org and either I'm confused, or there's some > inconsistency in how this all fits together. > > I'll start with just one data source for now, so that I don't mud