Some people raised a concern about false positives in my reports, and also tagged all the bugs with etch-ignore. I went through all bug reports manually yesterday (see earlier mail), but I also realized that it would be possible to do this automatically, to provide further assurance that the bugs indicate real and confirmed problems.

I've updated my script to do this; it is at the end of this page:

  http://wiki.debian.org/NonFreeIETFDocuments

The script runs md5sum on each RFC/I-D found in source packages and compares the result against a known-good repository (rsync'ed from ftp.rfc-editor.org).

The output of the script is very long, so I won't include it here. A URL to it is:

  http://josefsson.org/bcp78broken/debian-ietf-documents-diff.txt

To parse the output yourself, look for lines beginning with 'pkg'. Those denote the start of a new package with potential problems. After that there will be lines such as 'tar xfz...' and two MD5 sums. If the MD5 sums match, it prints MATCH. If they differ, it prints MISMATCH. If it can't find a known-good file to compare with, it prints FETCH-FAIL.

Some statistics:

   74 packages
  401 MATCH, i.e., the RFC in the source package is an authentic RFC
   79 MISMATCH, i.e., the RFC differs from the authentic RFC
    6 FETCH-FAIL

Note that this does _not_ mean that there were 79 false positives in my reports. Nothing I did today indicates that there are any more false positives except (possibly) draft-zebra-00.txt that I found manually yesterday.

The FETCH-FAILs are few and easy to analyze:

  FETCH-FAIL draft-davis-dasl-protocol-00.txt
  FETCH-FAIL spf-draft-20040209.txt
  FETCH-FAIL spf-draft-200405.txt
  FETCH-FAIL rfc.txt
  FETCH-FAIL rfc.txt
  FETCH-FAIL draft-zebra-00.txt

I can't find the first document anywhere on the Internet; possibly the filename is incorrect, although it looks like a submitted IETF document. The spf-* files were submitted through the IETF under other names. rfc.txt is a dummy file. draft-zebra-00.txt was the likely false positive I found manually yesterday.

The MISMATCHes are more interesting to analyze, and have a variety of causes. As can be seen in the file, just a few pages down, one reason is that the RFC in the source package differs from the authentic RFC! E.g., typos have been corrected.
Modifying the document is not permitted by the IETF license, so these files do not seem to be legally distributable at all, not even in non-free.

Several files differ trivially, such as removed/added initial/terminal newlines, or multiple newlines collapsed into one. At least one file differs due to RCS $Id$ tags.

In the DateTime-Format-Mail archive, the files differ substantially because the source package contains only a small excerpt from the RFC instead of the entire RFC.

Some files differ because I can't compare them to the real document: the IETF used to replace an expired document with a "RIP notice" under the same filename. The diff output for all of them suggests that these are real IETF documents, though.

/Simon
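
A minimal sketch of the comparison described above, for anyone who wants to check a single file by hand. This is an illustration in Python, not the script from the wiki page. The function names are hypothetical, and the blank-line normalisation is only an assumed way of spotting the "trivial" differences (added or removed newlines); it does not handle RCS $Id$ tags or excerpts.

```python
import hashlib
from typing import Optional


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest, the same value md5sum prints."""
    return hashlib.md5(data).hexdigest()


def classify(packaged: bytes, reference: Optional[bytes]) -> str:
    """Label a packaged document with one of the three report labels."""
    if reference is None:
        # No known-good copy was available to compare against.
        return "FETCH-FAIL"
    if md5_hex(packaged) == md5_hex(reference):
        return "MATCH"
    return "MISMATCH"


def squeeze_blank_lines(data: bytes) -> bytes:
    """Assumed normalisation: strip trailing whitespace (this includes
    form feeds) from every line, then drop lines that are empty."""
    lines = [line.rstrip() for line in data.splitlines()]
    return b"\n".join(line for line in lines if line)


def differs_only_in_blank_lines(packaged: bytes, reference: bytes) -> bool:
    """True if the files differ, but only in blank lines or trailing
    whitespace."""
    return (packaged != reference
            and squeeze_blank_lines(packaged) == squeeze_blank_lines(reference))


if __name__ == "__main__":
    original = b"Line one\n\n\nLine two\n"
    repacked = b"\nLine one\n\nLine two\n\n"
    print(classify(repacked, original))                     # MISMATCH
    print(differs_only_in_blank_lines(repacked, original))  # True
    print(classify(original, None))                         # FETCH-FAIL
```

A MISMATCH that passes differs_only_in_blank_lines() is probably one of the trivial cases; anything else still needs a look at the diff.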