Hi there! While looking for possible sponsors for DebConf13[1], I wrote a quick&dirty shell script (attached, but read [2]) to extract some information, specifically who sent the request and for which organization.

[1] <http://lists.debian.org/87objissel....@gismo.pca.it>
[2] I know that a better solution would have been to understand&reuse
    the WML infrastructure (these files are already parsed to generate
    the correct index), but I did not have the time for that, sorry.

I thus discovered some discrepancies:

- the line containing the contact information is not "standard",
  i.e. not always "# From: NAME <EMAIL>".  Moreover, some names were
  not completely "standard" either, e.g. lowercase letters or extra
  quotes[3].

- some files contain HTML-encoded accented characters, while others do
  not, which seemed strange given the README[4], which states:

    Each file in these directories will create a link from the /users/
    page, showing the content of the <pagetitle> tag.

    BE CAREFUL - the <pagetitle> is added verbatim, that means it MUST
    NOT contain any 8bit characters (in the english tree) because these
    titles are put into the translated pages when there is no
    translation of the file itself and create wrong characters.

    AGAIN: DO NOT put any 8BIT CHARACTERS into the <pagetitle>.

  This was even stranger to me, since Debian has been UTF-8-aware for
  a while and the migration to UTF-8 for the website was completed[5].

[3] I know this could sound like nitpicking, but for automatic parsing
    (and consistency) I consider it a bug.
[4] <http://anonscm.debian.org/viewvc/webwml/webwml/english/users/README?revision=1.4&view=markup>
[5] <http://bugs.debian.org/567781>

Two examples:

--8<---------------cut here---------------start------------->8---
Index: com/alcove.wml
===================================================================
RCS file: /cvs/webwml/webwml/english/users/com/alcove.wml,v
retrieving revision 1.2
diff -u -r1.2 alcove.wml
--- com/alcove.wml 10 Sep 2007 07:38:07 -0000 1.2
+++ com/alcove.wml 19 Nov 2012 20:37:37 -0000
@@ -1,12 +1,12 @@
 # From: Yann Dirson <ydir...@fr.alcove.com>
-<define-tag pagetitle>Alcôve, France</define-tag>
+<define-tag pagetitle>Alcôve, France</define-tag>
 <define-tag webpage>http://www.alcove.com/</define-tag>
 #use wml::debian::users
 <p>
-  Here at Alcôve, we use Debian for all of our infrastructure and
+  Here at Alcôve, we use Debian for all of our infrastructure and
   development workstations, totalling over 30 machines. We also
   recommend Debian to our customers for most situations, although we
   also install other distributions if they so desire.
Index: edu/unieconomicspoznan.wml
===================================================================
RCS file: /cvs/webwml/webwml/english/users/edu/unieconomicspoznan.wml,v
retrieving revision 1.2
diff -u -r1.2 unieconomicspoznan.wml
--- edu/unieconomicspoznan.wml 26 May 2011 10:05:50 -0000 1.2
+++ edu/unieconomicspoznan.wml 19 Nov 2012 20:37:37 -0000
@@ -1,4 +1,4 @@
-# Maciej So³tysiak <maciej.soltys...@ae.poznan.pl>
+# From: Maciej Sołtysiak <maciej.soltys...@ae.poznan.pl>
 <define-tag pagetitle>University of Economics in Poznan, Poland</define-tag>
 <define-tag webpage>http://www.ae.poznan.pl/</define-tag>
--8<---------------cut here---------------end--------------->8---

Given that I have already corrected all the entries for the DebConf
sponsors table anyway, I was wondering whether we would like to apply
these changes, which also means that the README[4] needs to be
corrected.
Obviously, any error generated by such actions would be mine ;-)

NB, I have neither checked languages other than English nor tried to
rebuild the full website.  But given that the migration to UTF-8 is
completed[5], I would be surprised if the above changes generated any
error.

Comments?

Thx, bye,
Gismo / Luca

#!/bin/sh
#
# extract-debian-users.sh, extract information from webwml files used
# to build www.debian.org/users/ available at
# <http://anonscm.debian.org/viewvc/webwml/webwml/english/users/>
# Copyright (C) 2012 Luca Capello <l...@pca.it>

# Version:
# 2012-11-19: 0.1

set -e

if [ -z "$1" ]; then
    echo "Usage: $0 directory [committer]"
    exit 1
elif [ ! -d "$1" ]; then
    echo "$1 is not a directory"
    exit 2
else
    # remove trailing '/'
    DIRECTORY=$(echo "$1" | sed -e 's/\/$//')
fi

if [ -n "$2" ]; then
    COMMITTER="$2"
else
    COMMITTER="$USER"
fi

# description of the output
cat <<EOF
From <http://www.debian.org/users/$(basename "$DIRECTORY")>
======================================
EOF

DATE=$(date +%Y-%m-%d)

for I in "$DIRECTORY"/*.wml; do
    # '|| true': a file without a "# From:" line must not abort the
    # whole run under 'set -e'
    FROM=$(grep "^# From:" "$I" || true)
    # e-mail address: what is between '<' and '>'
    CONTACT=$(echo "$FROM" | sed -e 's/\(.*\)<//' -e 's/>\(.*\)//')
    # name: what is between ':' and '<', without surrounding spaces
    PERSON=$(echo "$FROM" | sed -e 's/\(.*\)://' -e 's/<\(.*\)//' -e 's/^ //' -e 's/ $//')
    # content of the <pagetitle> and <webpage> tags
    TITLE=$(grep "pagetitle>" "$I" | sed -e 's/\(.*\)pagetitle>//' -e 's/<\/define\(.*\)//')
    WEBSITE=$(grep "webpage>" "$I" | sed -e 's/\(.*\)webpage>//' -e 's/<\/define\(.*\)//')
    # path below users/, without the .wml extension
    LINK=$(echo "$I" | sed -e 's/\(.*\)users\///' -e 's/\.wml//')

    cat <<EOF
:$TITLE
$COMMITTER:
$DATE:
contact is $CONTACT
person is $PERSON
website is $WEBSITE
source is http://www.debian.org/users/$LINK
EOF
done
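
As a rough sketch, the two discrepancies described above (contact lines that are not "# From: NAME <EMAIL>", and HTML-encoded characters in <pagetitle>) could probably be listed with something like the following. The function names and the exact patterns are assumptions made for illustration only; in particular, the check looks only at the overall shape of the contact line, so it would not catch lowercase names or extra quotes.

```shell
#!/bin/sh
# Sketch only: the helper names and patterns below are illustrative
# assumptions, not part of the webwml tooling.

# Print every .wml file in directory $1 that has no line of the form
# "# From: NAME <EMAIL>".
list_nonstandard_from() {
    for f in "$1"/*.wml; do
        [ -e "$f" ] || continue    # the glob matched nothing
        grep -q '^# From: [^<>][^<>]* <[^<> ]*@[^<> ]*>$' "$f" || echo "$f"
    done
}

# Print every .wml file in directory $1 whose <pagetitle> line contains
# something that looks like an HTML entity (named or numeric).
list_entity_titles() {
    for f in "$1"/*.wml; do
        [ -e "$f" ] || continue
        if grep 'pagetitle>' "$f" | grep -q '&[#A-Za-z0-9][A-Za-z0-9]*;'; then
            echo "$f"
        fi
    done
}
```

For example, after sourcing these definitions, `list_nonstandard_from english/users/com` would print one path per file whose contact line needs a closer look, and `list_entity_titles english/users/com` would do the same for titles that still use HTML entities.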