Russ Allbery wrote:
This seems to imply that you no longer have a file named rfc3454.txt? You
want to strip all the text out of that file except for the table, but
leave the table in the tree still named rfc3454.txt.
This would imply understanding what needs to be extracted. For rfc3454.txt,
it appears that the tables are all that is required; presumably this means
going through the tables manually, deleting the many page headers that
appear within them, and hoping I haven't accidentally deleted a table row.
Unfortunately, rfc3492.txt looks hairier. At a quick glance, the code
appears to extract all of section 7.1 (in the real source the file name
passed to open() is not hard-coded):
=== cut ===
f = open("rfc3492.txt", 'r')
examples_h = generate.Header('punycode_examples.h')
examples_c = generate.Header('punycode_examples.c')
start = False
while True:
    l = f.readline()
    if not l:
        break
    if l[-2:] == "\\\n":
        l2 = f.readline()
        if not l2:
            raise Exception("EOF in backslash escape")
        l2 = re.sub('^ *', '', l2)
        l = l[:-2] + l2
    if start:
        if re.match('7\.2', l):
            start = False
        else:
            m = re.search('^ *\([A-Z]\) *(.*)$', l);
            if m:
                desc = m.group(1)
                codes = []
            else:
                m = re.search('^ *([uU]+.*) *$', l)
                if m:
                    codes.extend(string.split(m.group(1), ' '))
                else:
                    m = re.search('^ *Punycode: (.*) *$', l)
                    if m:
                        cases.append([codes, m.group(1), desc])
    else:
        if re.match('^7\.1', l):
            start = True
            cases = []
f.close()
=== cut ===
=== cut ===
7.1 Sample strings

   In the Punycode encodings below, the ACE prefix is not shown.
   Backslashes show where line breaks have been inserted in strings too
   long for one line.

   The first several examples are all translations of the sentence "Why
   can't they just speak in <language>?" (courtesy of Michael Kaplan's
   "provincial" page [PROVINCIAL]).  Word breaks and punctuation have
   been removed, as is often done in domain names.

   (A) Arabic (Egyptian):
       u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
       u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
       Punycode: egbpdaj6bu4bxfgehfvwxn

   (B) Chinese (simplified):
       u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
       Punycode: ihqwcrb4cv8a8dqg056pqjye

[...]
=== cut ===
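
To make concrete what gets pulled out of that section, here is a rough
Python 3 sketch of the same idea. It is not the Heimdal script: the names
SAMPLE and extract_cases are made up for illustration, the input is only
the two examples quoted above plus a "7.2" heading line added to mark the
end of the section, and the backslash line-continuation handling is left out.

```python
import re

# Hypothetical stand-in for rfc3492.txt: only examples (A) and (B) from
# section 7.1, followed by the next section heading as an end marker.
SAMPLE = """\
7.1 Sample strings

   (A) Arabic (Egyptian):
       u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
       u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
       Punycode: egbpdaj6bu4bxfgehfvwxn

   (B) Chinese (simplified):
       u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
       Punycode: ihqwcrb4cv8a8dqg056pqjye

7.2 Decoding traces
"""


def extract_cases(text):
    """Return a list of (description, code points, punycode) tuples."""
    cases = []
    desc, codes = None, []
    in_section = False
    for line in text.splitlines():
        if not in_section:
            # Skip everything until the "7.1" heading.
            in_section = line.startswith("7.1")
            continue
        if line.startswith("7.2"):
            break  # end of the sample strings
        m = re.match(r"\s*\([A-Z]\)\s*(.*)$", line)
        if m:
            # "(A) Arabic (Egyptian):" starts a new example.
            desc, codes = m.group(1), []
            continue
        m = re.match(r"\s*([uU]\+.*?)\s*$", line)
        if m:
            # A line of "u+XXXX" code points; an example may span several.
            codes.extend(m.group(1).split())
            continue
        m = re.match(r"\s*Punycode:\s*(.*?)\s*$", line)
        if m:
            # The expected encoding closes the example.
            cases.append((desc, codes, m.group(1)))
    return cases


cases = extract_cases(SAMPLE)
for desc, codes, puny in cases:
    print(desc, len(codes), puny)
```

So what ends up in the generated files is, per example, the description
line, the list of code points, and the expected Punycode string.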
Can I use this data?
(Apologies if I'm missing some additional subtlety. I haven't had a
chance to fully analyze what the Heimdal build system is doing.)
I tried to upload what I have so far to Debian experimental so that
others can look at it (I hate being the sole maintainer of such a
complicated package). I think this is within the scope of experimental,
which is for anything that is experimental and might be broken. In this
case the package breaks policy by not being DFSG-compliant. Unfortunately
the upload was rejected with a message saying that I should use non-free
instead.
I am wondering if the ftp-masters missed the point that this is an
existing package already in main and should not be moved to non-free.
Unfortunately this became an issue because the soname of one of the
shared libraries was also incremented, resulting in the package being
marked as new.
Brian May