On Tue, 08 Feb 2022 at 08:59:23 -0500, Scott Kitterman wrote:
> From my point of view, treating something like other common classes of RC bugs
> means that the project is producing tools and processes to make detection of 
> such bugs more automated to remove them from the archive, that developers are 
> actively looking for them, and that they are routinely fixed in the normal 
> course of Debian development.

I think part of the problem here might be that copyright information is
"social", not "technical": software authors can claim copyright and/or
authorship in various forms of human-readable, free-form text, which means
any automated detection is necessarily going to be imperfect, and as long
as our policy demands perfection, there will be a reluctance to automate
this (or at least a reluctance to say that we are automating it).

Another part of the problem is that licensing and copyright-information
bugs are not something that we are realistically going to find through
normal use of software: if GTK crashes when you print on a Tuesday, one
of our users will eventually notice, but if we have missed a copyright
holder, it's unlikely that anyone is going to notice that omission from
the list of around 400 potential copyright holders in
<https://tracker.debian.org/media/packages/g/gtk4/copyright-4.6.0ds1-3>
unless they repeat the time-consuming process of collecting possible
copyright claims from the source code (as the ftp team presumably do). I
have no idea how the maintainers of larger and more complicated packages
manage to do this, or how the ftp team manage to review larger and more
complicated packages in a finite time.

I think the copyright file is doing several things which are perhaps in
conflict:

* It lets consumers of packages know what restrictions apply to their
  use of a package
  - This requires *most* of the license information, although not
    necessarily all of it: for example, if a package like Linux is licensed
    under a mixture of GPL, LGPL, BSD and MIT licenses, it's usually
    sufficient to be aware of the most restrictive of those licenses, in
    this case GPL
  - Having too much information, however well-intentioned, actually works
    against this by making it harder to find what you need
  - I would argue that requiring the text of licenses like the CC family
    to be inlined into the copyright file works against this goal, by
    reducing the signal-to-noise ratio: if you are not familiar with a
    particular license, then obviously you will need to read its text
    to see what it means, but if you are looking at packages that have
    content under various semi-common licenses, you only need to read
    each license once
  - I would argue that requiring the lists of copyright holders to be
    inlined into the same copyright file also works against this goal,
    again by harming the signal-to-noise ratio

* It lets consumers of packages know that the package is DFSG-compliant
  - Same requirements as above

* It's a place to reproduce information that licenses require us to, like
  a comprehensive set of copyright notices (if our interpretation of the
  applicable licenses is that pointing to nearby source code and calling
  it extremely comprehensive accompanying documentation is insufficient)
  - In this role, it's essentially write-only: we're doing this because
    we have been required to do it, more than because it's practically
    useful, and I don't expect anyone to actually read this, except for
    the maintainer when collecting it and the ftp team when verifying
    that it has been collected
  - In another subthread, Stephan Lachnit suggests using the SPDX format
    for this write-only information, which I think might be intended as
    a way to eventually separate it from the other roles of d/copyright

* It gives authors due credit (which we are not *required* to do, but
  in previous discussions of d/copyright I've seen this cited as a reason
  why we *should* do this in order to be good citizens)
  - Note that collecting copyright holders is not necessarily actually
    helpful here, because that often means we are required to "credit"
    an employer, rather than mentioning the actual author
  - In a medium-sized package like GTK, it's not clear to me that a list of
    about 400 possible copyright holders is actually serving this purpose,
    because any individual contributor is lost in the noise

* It lets us meet our self-imposed rules
  - This is circular, so I'm inclined to disregard it when discussing what
    the rules should be: we should set rules because they help us to
    achieve a goal, rather than for the sake of having rules

* It lets the ftp team (or other interested reviewers) duplicate the
  info-collecting process to check that all of the above have been done
  - This is somewhat circular, because this is a way to support the other
    goals, not really a goal in its own right

* Are there other relevant goals that I've missed here?

I don't think conflating those goals and assuming they all need to be
satisfied by a single file is necessarily going to lead to meeting any
of those goals in an efficient way, let alone meeting all of them in
an efficient way.

    smcv
