On Thu, Dec 10, 2009 at 01:56:20AM +0000, Dmitrijs Ledkovs wrote:
>
> There isn't DEB-5 debian/copyright parser available. So this cannot be
> implemented in licensecheck yet.
Dear Dmitrijs,

Jon Dowland has published an example parser on this list
(http://lists.debian.org/msgid-search/20090913225846.gb16...@tchicaya.lan).
However, it is written in Python and is therefore of little help for
licensecheck, which is written in Perl.

On my side, I have started to work on a parser for the relaxed syntax that I
propose on my experimental git branch of the DEP
(http://git.debian.org/?p=users/plessy/license-summary.git;a=blob_plain;f=dep5.mdwn).
In that case, it is as simple as:

 - Process paragraphs, which are separated by an empty line, one by one.
 - Collapse each paragraph into a hash where keys are field names, ignoring
   paragraphs that do not contain fields.

This results in an array of hashes or, in YAML dialect, a sequence of
mappings.

    use strict;
    use warnings;

    my $input = do { local $/; <> };    # Slurp the whole input.
    $input = '' unless defined $input;

    my @parsed;
    foreach my $paragraph (split /\n{2,}/, $input) {    # Split on empty lines.
        my $collapsed = collapse($paragraph);    # Collapse each paragraph into a hash.
        push @parsed, $collapsed if $collapsed;
    }

    sub collapse {
        my $paragraph = shift;
        my %hash;
        my $current_field;    # Next line may still be part of the field content.
        foreach (split /\n/, $paragraph) {
            if ( /^([\w-]+)\s*:\s*(.*)$/ ) {    # New fields terminate the previous one.
                $current_field = $1;            # Field names may contain hyphens.
                $hash{$current_field} .= $2;
            } elsif ( /^\s(.*)$/ ) {
                $hash{$current_field} .= "\n$1" if defined $current_field;
            } else {
                undef $current_field;    # Lack of indentation also terminates the field.
            }
        }
        return \%hash if keys %hash;
        return;    # No fields found: the paragraph is ignored.
    }

The above script still has bugs, but I hope it summarises how easy it could
be to write a parser if the DEP is constructed with this as a goal.

I originally proposed a syntax that is not the same as Debian control files,
but currently I am still dissatisfied, even with my own proposal. Whichever
format is used, it is easy to break the syntax, in particular by forgetting
the white space for indentation, or the ‘space-dot’ escape sequence for empty
lines in the ‘Debian control’ syntax.
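To illustrate the ‘space-dot’ convention mentioned above, here is a minimal
sketch in Perl. The helper names escape_field_text and unescape_field_text
are hypothetical: they are not part of licensecheck, of the DEP, or of the
script above. The sketch assumes the ‘Debian control’ convention where every
continuation line starts with one space and an empty line is written as a
space followed by a single dot.

```perl
use strict;
use warnings;

# Hypothetical helper: prepare free text (for instance a license) so that it
# can be stored as the continuation lines of a multi-line field.
# Each line gets one leading space; empty lines become " ." so that they do
# not terminate the paragraph.
sub escape_field_text {
    my ($text) = @_;
    my @lines = split(/\n/, $text, -1);
    return join("\n", map { $_ eq '' ? ' .' : " $_" } @lines);
}

# Hypothetical inverse: strip the leading space and turn a lone "." back
# into an empty line. Note that a line that really consists of a single dot
# cannot be told apart from an escaped empty line.
sub unescape_field_text {
    my ($text) = @_;
    my @lines = split(/\n/, $text, -1);
    return join("\n", map { my $l = $_; $l =~ s/^\s//; $l eq '.' ? '' : $l } @lines);
}

my $license = "First paragraph of the license.\n\nSecond paragraph.";
print "License: Artistic-2.0\n", escape_field_text($license), "\n";
```

Without the escaping step, the empty line between the two paragraphs of the
license would be read as the end of the whole paragraph, and the second half
of the text would be lost or misparsed. This is the kind of mistake that is
easy to make when editing by hand.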
From my frustrating experience of adding by hand the contents of the Artistic
License v2.0 to the debian/copyright file of one of the packages I maintain,
I concluded that this fragility can significantly impair the adoption of
DEP-5. So, on this list or elsewhere, I think that there is still some
experimentation and consultation to do.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan