Hi Andreas

Lets think about some better fine tuning.  "NOT LIKE '%salsa%'" might
catch also Vcs URLs that are intentionally somewhere else.  While I'd
love to see all packages on Salsa, it might be sensible to start with
packages that are unintentionally not in Salsa so

udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND 
(vcs_url IS NULL OR vcs_url like '%alioth%' OR vcs_url like '%git.debian.org%' OR 
vcs_url like '%svn.debian.org%') ;
  count
-------
   2213

That might make a real challenge to bring that number below 2000 until
end of my term.  Any help to approach this is welcome.

Well, let's look at some of these other d.o URLs.

- Not our alioth: There are 16 vcs URLs in there that have 'alioth'
  in them but aren't alioth.debian.org; they are git hosted but not
  on Debian infrastructure (and perhaps not in a place that facilitates
  collaboration in the way being discussed)

- dgit.debian.org: There are 30 in there that are dgit.debian.org.
  That surprised me, maybe I don't know enough about dgit.

- git.debian.org: There are 146 with git.debian.org - none of these VCS
  URLs work any more

- svn.debian.org: !4 list svn.d.o but like git.d.o that's dead. svn.d.o
  doesn't even exist as a hostname any more.

There's 161 packages in sid with old d.o URLs pointing to alioth. There's a reasonable chance that a good portion of them are also not maintained.

 - 11% of them list their maintainer as Debian QA Group
 - 13% of them have a current O bug (another 1 with an RFA)
 - who knows how many are otherwise abandoned with MIA maintainers or
   maintainers who have just moved on to other things

There was a recent discussion about what to do with VCSes for orphaned packages. Maybe if it doesn't exist on salsa, it's worth creating one in the salsa.d.o/debian/ namespace as part of doing the QA upload? (gbp import-dscs --debsnap) That would be a good outcome and a good little project for someone...

The vast majority of these packages have seen post-alioth uploads but with the broken Vcs fields still in place. That's perhaps offering the opposite of collaborative development? The question is whether the repo has actually moved to salsa but d/control hasn't been updated, or whether the repo has just vanished. An MBF that the VCS fields are out of date is easy, but checking and fixing is likely manual work.

year | count
-----+-------
2011 |     1
2012 |     4
2013 |     3
2014 |     4
2015 |     1
2016 |     1
2017 |     2
2018 |     2  (salsa.d.o general availability)
2019 |     1
2020 |    13
2021 |    95
2022 |    20
2023 |     7
2024 |     6
2025 |     1


I noticed that some teams have some lintian tags checking this from a team policy perspective - doing this more broadly for other teams would help provide teams with visibility via lintian.d.o reports.

lintian-explain-tags -t team/pkg-perl/vcs/no-git \
 team/pkg-perl/vcs/no-team-url

(I accidentally found 2 python-team packages without Vcs URLs yesterday - the repos were on salsa, just not listed in d/control)

Over half of these old alioth URLs can be addressed by Teams doing some data normalisation and uploads:

       maintainer_name         | count
-------------------------------+-------
Debian Perl Group              |    72
Debian Java Maintainers        |    10
Debian X Strike Force          |     7
Debian XML/SGML Group          |     4
Debian Science Maintainers     |     3
Debian CLI Applications Team   |     2
Debian Ruby Extras Maintainers |     1
Debian Javascript Maintainers  |     1
Debian Telepathy maintainers   |     1
Debian Fonts Task Force        |     1
Debian CLI Libraries Team      |     1
Debian-IN Team                 |     1
Debichem Team                  |     1
NeuroDebian Team               |     1
The Debian Lua Team            |     1


So in terms of where to start... perhaps there's a couple of teams that would like to do some data cleansing?

regards
Stuart



SELECT
    s.source, date, vcs_url
FROM
    sources AS s
    JOIN upload_history AS h
    ON s.source = h.source AND s.version = h.version
WHERE
    release = 'sid' AND
    vcs_url ~ '/(git|svn|alioth).debian.org'
ORDER BY
    date DESC;



SELECT
    DATE_PART('year', date) AS year,
    COUNT(*)
FROM
    sources AS s
    JOIN upload_history AS h
    ON s.source = h.source AND s.version = h.version
WHERE
    release = 'sid'
    AND vcs_url ~ '/(git|svn|alioth).debian.org'
GROUP BY
    year
ORDER BY
    year ASC;



SELECT
    maintainer, COUNT(*)
FROM sources
WHERE
    release = 'sid'
    AND vcs_url ~ '/(git|svn|alioth).debian.org'
    AND maintainer ~ '(team|group|lists)'
GROUP BY
    maintainer
ORDER BY
    count DESC;


--
Stuart Prescott   http://www.nanonanonano.net/ stu...@nanonanonano.net
Debian Developer  http://www.debian.org/       stu...@debian.org
GPG fingerprint   90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7

Reply via email to