Given that DEP-5 is supposed to be about machine-readability, I thought it would be worthwhile trying to write something to parse the proposed format. Please find attached a short Python script that I have written based on the current text of DEP-5 at dep.debian.net[1].
It's designed to be run from an unpacked and patched source package (or at least a source tree containing debian/copyright, which it attempts to parse). It prints a list of each Files: stanza found in debian/copyright, followed by the list of files it believes are matched by that stanza.

It has proven useful to me: I found several bugs in a copyright file I'd written for a real live package, caused by my misinterpretation of the current wording.

Whilst writing this, I found the syntax chosen for the Files: field to be very awkward. Indeed, my crude parser only handles a subset of the syntax so far (no escapes, no handling of quoted strings).

Most of the examples given in DEP-5 that contain the path separator ('/') will not work either, e.g.

    Files: debian/*

Assuming they are passed into a find(1) invocation like so

    find . -path 'debian/*'

(note the presence of the path separator and the wording about that in the text), they need to be prefixed with './'. This holds even if you omit '.' in the find invocation (which itself is a GNUism, iirc), because find still prints, and matches -path against, every path starting with './'. Patch attached.

I think I would much prefer using regular expressions here. For one thing, I'm worried about variations in find(1) behaviour across platforms. For another, unless a parser calls find(1) (as mine does, and that's expensive), trying to match its behaviour will, imho, be a lot more error-prone than using your language's built-in regular expression library, PCRE, or whatever. I will try to cook up a patch for comment.

[1] (I need to re-read the older DEP-5 messages to understand the current maintainership situation: I see Steve removed the other drivers in that version, and Charles did the same in his git repo...)

-- 
Jon Dowland
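The find(1) behaviour described above can be approximated in Python, which also shows why the unprefixed patterns never match. This is a minimal sketch, assuming that fnmatch.fnmatchcase() is a fair model of find's glob matching: as with -path, its `*` also matches `/`, while -name compares only against the final path component. The helper find_matches() is hypothetical, and the sketch ignores corner cases such as locale-dependent bracket expressions.

```python
import fnmatch
import posixpath


def find_matches(pattern, path):
    """Approximate `find . -name PATTERN` / `find . -path PATTERN`.

    Hypothetical helper. Like the attached script, it uses -path semantics
    when the pattern contains a '/', and -name semantics otherwise.
    `path` is a path as printed by `find .`, i.e. starting with './'.
    """
    if "/" in pattern:
        # -path: glob against the whole printed path; '*' may cross '/'.
        return fnmatch.fnmatchcase(path, pattern)
    # -name: glob against the base name only.
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)


if __name__ == "__main__":
    for pattern in ["debian/*", "./debian/*", "*.patch"]:
        for path in ["./debian/rules", "./debian/patches/fix.patch"]:
            print(pattern, path, find_matches(pattern, path))
```

Run as a script, it prints one line per pattern/path pair: `debian/*` matches neither path, while `./debian/*` matches both, because every path printed by `find .` begins with `./`. `*.patch` contains no '/' and so falls back to base-name matching.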
#!/usr/bin/python
# a crude DEP-5 parser
# Copyright (c) 2009 Jon Dowland <j...@debian.org>
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and this
# notice are preserved.

# usage: run the script from within an unpacked source tarball with the debian
# diff.gz applied on top (or at least, a DEP-5-syntax debian/copyright file
# available)

from email import parser
from sys import exit
from os import popen

##############################################################################
## step 1: handle/parse RFC822 superset

# remove blank lines so the parser treats it all as an email header
copyright = parser.Parser().parsestr(''.join(
    filter(lambda x: "\n" != x, open("debian/copyright").readlines())))

if len(copyright.keys()) < 1:
    print("parser didn't get any headers from the copyright file")
    exit(1)

##############################################################################
## step 2: interpret the headers and build a list of tuples
## (files, license, copyright)

# DEP5 header. Format-Specification is required. Others are optional.
valid = "Format-Specification Name Maintainer Source Disclaimer".split()
header = dict([[x, ''] for x in valid])
files = "Files Copyright License".split()

# first loop: handle the header
for i in range(0, len(copyright.items())):
    key = copyright.keys()[i]
    # skip over X-Arbitrary: headers (either case of the leading 'x')
    if key[0].lower() == 'x':
        continue
    if key in valid:
        if header[key]:
            print("error: redefinition of '%s'." % key)
            exit(1)
        header[key] = copyright.values()[i]
        continue
    # this marks the transition from the header onwards
    if key in files:
        if not header['Format-Specification']:
            print("error: Format-Specification must be defined "
                  "before the Files section")
            exit(1)
        break
    print("unrecognised key '%s'" % key)
    exit(1)

# second loop: looping through the main parts
current = dict([[x, ''] for x in files])
tuples = []

# take a hash of Files/Copyright/License and split it up
# into multiple ones based on the Files key
# first rule: multiple items separated by commas
# XXX: unhandled: escaped commas; quoted-strings
# containing commas
def append(tuples, current):
    for t in current['Files'].split(","):
        c = current.copy()
        c['Files'] = t.strip()
        tuples.append(c)

for i in range(i, len(copyright.items())):
    key = copyright.keys()[i]
    # skip over X-Arbitrary: headers (either case of the leading 'x')
    if key[0].lower() == 'x':
        continue
    if key in files:
        # handle implicit 'Files: *'
        if 'Files' != key and not current['Files']:
            current['Files'] = '*'
        # new Files: stanza ends the last one
        elif 'Files' == key and current['Files']:
            for defn in ['License', 'Copyright']:
                if not current[defn]:
                    print("error: missing %s line for Files: %s"
                          % (defn, current['Files']))
                    exit(1)
            append(tuples, current)
            current = dict([[x, ''] for x in files])
        # new License or Copyright for existing Files:
        if current[key]:
            print("error: redefinition of '%s'. Missing 'Files' item?" % key)
            print("field number is %d, value is '%s'"
                  % (i, copyright.values()[i]))
            exit(1)
        current[key] = copyright.values()[i]
        continue
    print("unrecognised key '%s'" % key)
    exit(1)
# the last stanza must be split on commas too, just like the others
append(tuples, current)

# DEP-5 states "If multiple Files declarations match the same file, then only
# the last match counts.". This suggests no inheritance is possible between
# stanzas. Thus, reversing the list means we can look for the *first* matching
# stanza.
tuples.reverse()

##############################################################################
## step 3: indicate mapping of stanzas to source files
## we run find(1) for each tuple to build up a list of files which match
## the Files: definition. We then run find(1) again on the source directory
## to obtain a list of all files, then compare results.

# a list of [ (Files:, [matching files]) ] for each Files
# populated with the list of files which match each Files: key
matching = []
for t in tuples:
    # patterns containing '/' are matched with -path, others with -name
    nameorpath = 'name'
    if t['Files'].count('/') > 0:
        nameorpath = 'path'
    runme = "find . -type f -%s \"%s\" 2>/dev/null" % (nameorpath, t['Files'])
    matching.append((t['Files'],
                     [x.strip() for x in popen(runme).readlines()]))

# { Files: => [matching files] }, this time populated by
# comparing every file against each stanza in turn
results = dict([[x['Files'], []] for x in tuples])
results['no match'] = []
for fname in [x.strip() for x in popen('find . -type f').readlines()]:
    res = 'no match'
    for pair in matching:
        if fname in pair[1]:
            res = pair[0]
            break
    results[res].append(fname)

for stanza in tuples:
    print("%s:" % stanza['Files'])
    for value in results[stanza['Files']]:
        print("\tmatches %s" % value)
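Step 3 of the script spawns one find(1) process per stanza plus one more for the whole tree. The same mapping can be sketched in-process, which avoids the cost of calling find(1) mentioned in the message. This is a sketch under the assumption that fnmatch.fnmatchcase() approximates find's -name/-path matching closely enough; the names matches(), assign_stanzas() and list_files() are hypothetical, and the comma splitting, quoting and escape handling of the Files: field are left out. It applies DEP-5's "last match counts" rule by scanning the patterns in reverse, as the script does with tuples.reverse().

```python
import fnmatch
import os
import posixpath


def matches(pattern, path):
    """Approximate find(1): -path if the pattern has a '/', else -name."""
    if "/" in pattern:
        # -path: glob against the whole path; '*' also matches '/'.
        return fnmatch.fnmatchcase(path, pattern)
    # -name: glob against the base name only.
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)


def assign_stanzas(patterns, paths):
    """Map each path to the last Files: pattern (in file order) matching it.

    Paths matched by no pattern are collected under the key None.
    """
    results = {pattern: [] for pattern in patterns}
    results[None] = []
    for path in paths:
        owner = None
        # Scan backwards so the first hit is the *last* matching stanza.
        for pattern in reversed(patterns):
            if matches(pattern, path):
                owner = pattern
                break
        results[owner].append(path)
    return results


def list_files(root="."):
    """Roughly `find . -type f`: regular files (not symlinks), './'-prefixed."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


if __name__ == "__main__":
    # Hypothetical patterns, in the order they appear in debian/copyright.
    patterns = ["*", "./debian/*"]
    for pattern, paths in assign_stanzas(patterns, sorted(list_files())).items():
        print("%s:" % (pattern if pattern is not None else "no match"))
        for path in paths:
            print("\tmatches %s" % path)
```

With the patterns `*` and `./debian/*` in that order, files under ./debian/ are assigned to the second stanza and everything else to the first. Paths matched by no pattern end up under None; the attached script collects these as 'no match' but never prints them.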
Index: dep5.mdwn
===================================================================
--- dep5.mdwn	(revision 105)
+++ dep5.mdwn	(working copy)
@@ -144,7 +144,7 @@
 
 Example 1 (tri-licensed files).
 
-    Files: src/js/editline/*
+    Files: ./src/js/editline/*
     Copyright: 1993, John Doe
                1993, Joe Average
     License: MPL-1.1 or GPL-2 or LGPL-2.1
@@ -161,12 +161,12 @@
 
 Example 2 (recurrent license).
 
-    Files: src/js/editline/*
+    Files: ./src/js/editline/*
     Copyright: 1993, John Doe
                1993, Joe Average
     License: MPL-1.1
 
-    Files: src/js/fdlibm/*
+    Files: ./src/js/fdlibm/*
     Copyright: 1993, J-Random Corporation
     License: MPL-1.1
 
@@ -365,7 +365,7 @@
      License can be found in the `/usr/share/common-licenses/GPL-2' file.
 
-    Files: debian/*
+    Files: ./debian/*
     Copyright: 1998, Jane Smith <jsm...@example.net>
     License:
      [LICENSE TEXT]
@@ -384,7 +384,7 @@
     License: PSF-2
      [LICENSE TEXT]
 
-    Files: debian/*
+    Files: ./debian/*
     Copyright: 2008, Dan Developer <d...@debian.example.com>
     License:
      Copying and distribution of this package, with or without
@@ -392,27 +392,27 @@
      provided the copyright notice and this notice are preserved.
 
-    Files: debian/patches/theme-diveintomark.patch
+    Files: ./debian/patches/theme-diveintomark.patch
     Copyright: 2008, Joe Hacker <h...@example.org>
     License: GPL-2+
      [LICENSE TEXT]
 
-    Files: planet/vendor/compat_logging/*
+    Files: ./planet/vendor/compat_logging/*
     Copyright: 2002, Mark Smith <msm...@example.org>
     License: MIT
      [LICENSE TEXT]
 
-    Files: planet/vendor/httplib2/*
+    Files: ./planet/vendor/httplib2/*
     Copyright: 2006, John Brown <br...@example.org>
     License: Unspecified MIT style license.
 
-    Files: planet/vendor/feedparser.py
+    Files: ./planet/vendor/feedparser.py
     Copyright: 2007, Mike Smith <m...@example.org>
     License: PSF-2
      [LICENSE TEXT]
 
-    Files: planet/vendor/htmltmpl.py
+    Files: ./planet/vendor/htmltmpl.py
     Copyright: 2004, Thomas Brown <co...@example.org>
     License: GPL-2+
      On Debian systems the full text of the GNU General Public