On Tue, 24 Mar 2009 00:43:48 -0700
Steve Langasek <vor...@debian.org> wrote:

> > I have been reading this discussion a bit and I've been wondering what
> > use-case you actually have for machine-readable debian/copyright files.
> 
>   This is quite different than having the *license terms* recorded in a
>   machine-parseable format, which is potentially useful in lots of ways;

The format would still need to match source code licence terms to
compiled objects that can be built from a variety of source files, and
it would have to cope with the linkage within a package changing over
the lifetime of that package. There is no permanent or reliable link
between the licence of a source code file and the licence of a specific
compiled binary in the final package; the only link is between the
collection of source code and the collection of binaries.

Even within a single .deb it might not be possible to identify exactly
which licences apply if the source package builds lots of variant
binary packages. Runtime licence information can be checked for some
simple .deb packages on the basis of everything the package contains,
but development information, i.e. which licence applies to source code
that someone copies and pastes from the source package, is an entirely
separate matter. Other issues affect packages that are built twice with
different options to ./configure, which may omit certain source code
files from one build or the other.

If I have a modular source package that can enable or disable various
build options and various components and some of those components have
differing licences, it becomes very hard to track subtle changes
between builds that may or may not result in source code under licence
A being compiled into binary bar in some circumstances but not in
others. Individual versions of a package must build the same way on
each architecture but subsequent versions can change, making it hard
for the maintainer to track what is going on.

If such a modular source package (foo) builds a number of different
binary packages, how is the checker to know whether binary package bar
linking against libfoo-with-baz is any different from bar linking
against libfoo-without-baz, other than by relying on the package names?
Licence incompatibilities between source packages are not the issue,
AFAICT, if only because the offending source code might not actually be
compiled; incompatible licences between binaries linked at runtime are
the problem.

I'm not sure that any proposed format of debian/copyright would allow a
checker to be at all certain that a particular .so from package foo has
a compatible licence with a particular .so from package bar where both
foo and bar include multiple libraries, multiple binaries and multiple
linkages at build time. (debian/libfoo.copyright is a separate idea
with different problems, see later.)

Yes, the checker might be able to say that source package foo contains
code under licence A and source package bar contains code under licence
B and that a certain conflict might result but whether that is a real
problem or not still depends on exactly how the relevant code is
compiled and linked - something that can change much more frequently
than the licences themselves.
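To make the build-option point concrete, here is a minimal Python
sketch. Everything in it is hypothetical: the file names, the
--enable-baz option, the licence assigned to each file, and the single
"incompatible" licence pair, which is an illustrative assumption only.
It shows a source-level check flagging a conflict that exists in only
one of the two builds.

```python
from dataclasses import dataclass, field

# Hypothetical source tree of package foo: file -> licence of that file.
SOURCE_LICENCES = {
    "src/core.c": "LGPL-2.1+",
    "src/baz.c": "GPL-2",  # optional component, built only with --enable-baz
    "lib/helper.c": "BSD-3-clause",
}

# Illustrative assumption: the only licence pair treated as incompatible here.
INCOMPATIBLE = {frozenset({"GPL-2", "Apache-2.0"})}


@dataclass
class Build:
    """One build of a binary package, described by its ./configure options."""
    binary: str
    options: set = field(default_factory=set)

    def compiled_files(self) -> set:
        # Hypothetical rule: baz.c is compiled only when --enable-baz is given.
        files = {"src/core.c", "lib/helper.c"}
        if "--enable-baz" in self.options:
            files.add("src/baz.c")
        return files

    def licences(self) -> set:
        # Licences of the files actually compiled into this build.
        return {SOURCE_LICENCES[f] for f in self.compiled_files()}


def source_level_view() -> set:
    """What a source-only checker sees: every licence present in the source."""
    return set(SOURCE_LICENCES.values())


def conflicts(a: set, b: set) -> bool:
    """True if any licence in a forms an incompatible pair with one in b."""
    return any(frozenset({x, y}) in INCOMPATIBLE for x in a for y in b)


if __name__ == "__main__":
    bar = {"Apache-2.0"}  # hypothetical licence of binary package bar
    with_baz = Build("libfoo-with-baz", {"--enable-baz"})
    without_baz = Build("libfoo-without-baz")
    print("source-level check:", conflicts(source_level_view(), bar))  # True
    print("bar + libfoo-with-baz:", conflicts(with_baz.licences(), bar))  # True
    print("bar + libfoo-without-baz:", conflicts(without_baz.licences(), bar))  # False
```

The source-level answer is the same for both builds, which is all a
checker working from debian/copyright alone would have to go on.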

This is the problem with licensecheck - it relies on the source and
cannot hope to understand how the source becomes a binary.

I fear that such a checker would be very misleading and cause
unnecessary work dealing with the 'bugs' that could result.

(relocating from the end of Steve's message)
> Well, aside from the section header, nothing in Debian Policy actually says
> you need to have a per-source debian/copyright file; and you certainly can
> have separate per-binary copyright files in your package that get installed
> individually if you choose, there's nothing that prevents you from doing
> that even though it's clearly not common practice today.

Can't help thinking that the packages that would benefit from
debian/libfoo.copyright are the very ones where maintaining that file
will make the idea rather unappealing due to the issues above. However,
it is something I hadn't considered and there could be some mileage in
that for some packages, especially those with different licences for
the API documentation. It could make the main debian/copyright file
much cleaner and easier to read for a small number of packages.

I'm still not convinced that machine-parseable formats are genuinely
useful or maintainable and I feel that machine-parseable
requirements inevitably impair human readability of copyright files.
That's not a win, AFAICT.

> Please don't reply with arguments why this isn't enough reason to make
> maintainers do extra work.  I'm not trying to make any maintainers do extra
> work; I'm pointing out reasons why having a consistent and machine-parseable
> copyright format is useful, which is the question that was asked.  That
> benefit is there even if only a subset of maintainers opt to use a
> machine-parseable format; but given that there is interest in having such a
> format, it's important that we come to some agreement on what that format
> should be, so that we don't have a dozen incompatible formats running
> around.

Would you say that debian/libfoo.copyright is a pre-requisite for such
checkers to be useful on all but the simplest of packages? How are
complex packages going to maintain such files?

Is it really useful to have only a subset of packages using the format?
Isn't it only going to be adopted by the small packages that have no
particular licence problems, because for them it's almost trivial to do
so? Unless maintainers of complex packages, or of packages where licence
problems are likely (those that need exceptions added to the GPL etc.),
can implement the format cleanly, is there really any benefit?

There are elements of the format that aid human readability, but making
the format completely machine-parseable means making allowances for so
many ifs and buts that the copyright files become readable only by
machines.

> That's what we should be working on.  This thread with people refusing to
> use a parseable format for debian/copyright, and arguing about whether using
> the format does or does not provide assurances about the copyright status of
> a work, is all an irrelevant (and irritating) distraction.

Actually, having per-binary-package copyright could help with a lot of
packages, merely by making each copyright file smaller - as long as the
package has clear licence divisions. e.g. a package that is all GPL
with a GFDL documentation package would have a much simpler copyright
setup this way.
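As a rough illustration of that per-binary layout, here is a minimal
Python sketch of how the copyright file for each binary package might be
chosen. It assumes the debhelper behaviour where a
debian/<package>.copyright file, if present, is installed for that
package in place of the source-wide debian/copyright; the package names
and licence lines are hypothetical.

```python
import tempfile
from pathlib import Path


def copyright_source(debian_dir: Path, package: str) -> Path:
    """Pick the file to install as /usr/share/doc/<package>/copyright.

    Assumption: debian/<package>.copyright wins if it exists; otherwise the
    source-wide debian/copyright is used for that binary package.
    """
    specific = debian_dir / f"{package}.copyright"
    if specific.is_file():
        return specific
    return debian_dir / "copyright"


def demo() -> dict:
    """All-GPL code with a GFDL -doc package (hypothetical package names)."""
    with tempfile.TemporaryDirectory() as tmp:
        debian = Path(tmp)
        (debian / "copyright").write_text("License: GPL-2+\n")
        (debian / "libfoo-doc.copyright").write_text("License: GFDL-1.2+\n")
        return {pkg: copyright_source(debian, pkg).name
                for pkg in ("libfoo1", "libfoo-doc")}


if __name__ == "__main__":
    print(demo())  # {'libfoo1': 'copyright', 'libfoo-doc': 'libfoo-doc.copyright'}
```

With this layout libfoo1 only ships the GPL text and libfoo-doc only
ships the GFDL text.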

I quite like that idea because it potentially means that individual
copyright files become smaller (and easier to review) and that each
.deb only contains the copyright information relevant to that single
binary package. That would also help keep /usr/share/doc/ smaller for
the vast majority of users who don't install every binary package from
a particular source package. (That's always handy for those interested
in keeping installations small.) There really isn't any need for the
copyright details of libfoo-bar to be installed alongside libfoo (or,
more likely, those of libfoo-doc alongside libfoo), let alone for the
same file to be installed for both libfoo-bar AND libfoo, so that users
who do install more than one package from a particular source package
get multiple identical copies of debian/copyright. (gcc tries to get
around this with a -base package, but that causes different problems.)

The format of the copyright files doesn't matter from that perspective.

I might try it for one of my own (smaller) packages.

> Once there's a stable spec that has a measure of consensus surrounding it,
> instead of a wiki page that someone takes the liberty of rewriting every
> month or two, that's when I would expect to see adoption of the format by
> more folks writing tools.
> 
> > BTW, the use-case where you don't want to install FDL content and have
> > some way for apt to warn you before doing so won't be solved by a new
> > format because debian/copyright is written at the source-level and not
> > on the binary package level (think -doc packages that have FDL stuff and
> > -bin packages that have other-licensed stuff). (not that I've given this
> > too much thought)

Which is why debian/libfoo-doc.copyright becomes relevant.
debian/libfoo1.copyright might not be as useful but where there is a
clear dividing line between the licence for the code and the licence
for the documentation generated from the code, a separate copyright
file could be good, regardless of the format. It is much more difficult
to be certain of where the dividing line exists between
$(top_srcdir)/src and $(top_srcdir)/lib, especially when that line can
shift according to build options or new versions.

-- 


Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
