On Thu, 23 Apr 2009, Manoj Srivastava wrote:
While I can't speak for the policy team (I have not been re-delegated yet), I suspect the answer might be to get a working implementation out in the wild (it does not have to be packages.d.o or anything official -- even a standalone software that takes the output from grep-dctrl or parses a Packages file will suffice). This will allow us to see what changes to policy might be needed, if any, for package descriptions.
Would you consider the tasks pages I announced yesterday [1] as such an implementation? I continued to work a bit on this and have two additions to the preprocessor:

 1. The inlist flag has to be unset not only if a line starts in the second column again but also if there is an empty line.
 2. You need to escape '#' signs if they appear as the first character of a line.

See the implementation at the end of this mail.
Once we have a working implementation, and a clear idea of what might need to be changed in package descriptions (for example, we already know that packages using 'o' as a bullet in unordered lists will have to be changed to use one of +, -, or *), we can scan the package descriptions to see how many packages would be affected, and then decide how to introduce that language into policy (the more packages affected, the greater the need for a transition plan).
I tried to detect some examples which need some changes. You might like to have a look at my "Debugging Blend":

  http://blends.debian.net/debug/tasks

Some issues are mentioned there. I intend to add better documentation if needed, but some of the issues are already clear from the page.
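A scan of the kind suggested above, counting the packages whose long description uses 'o' as a bullet, might look roughly like the following. This is only a sketch: the function names and the regular expression are my assumptions for illustration (they are not part of the tasks pages code), and it treats a lone 'o' as a bullet only when the line is indented beyond the single leading space that every description line carries.

```python
import re

# Assumption: an 'o' bullet is a line indented by at least two whitespace
# characters whose first token is a lone 'o' followed by text.
O_BULLET_RE = re.compile(r"^\s{2,}o\s+\S")

def iter_descriptions(packages_text):
    """Yield (package name, long description lines) per stanza."""
    for stanza in packages_text.strip().split("\n\n"):
        name = None
        desc = []
        in_desc = False
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                in_desc = False
            elif line.startswith("Description:"):
                in_desc = True
            elif in_desc and line.startswith((" ", "\t")):
                # continuation line of the long description
                desc.append(line)
            else:
                in_desc = False
        if name is not None:
            yield name, desc

def packages_with_o_bullets(packages_text):
    """Return the names of packages whose description contains an 'o' bullet."""
    return [name for name, desc in iter_descriptions(packages_text)
            if any(O_BULLET_RE.search(line) for line in desc)]
```

Fed with the contents of a Packages file (or grep-dctrl output that keeps the Package and Description fields), the length of the returned list gives a first estimate of how many packages would need a change.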
I do not see any reason this proposal should not become policy, eventually, since this deals with the core charter of the technical policy: standards that packages need to follow to allow for better integration.
After dealing with the issue, I would summarise as follows:

 1. The preprocessing you have to do for markdown is basically the same as what I did to turn description text into HTML programmatically myself. There is no real benefit if your main target is only HTML; however, other output formats might benefit from the preprocessing + markup step.

 2. Markdown is probably better at detecting second-level lists than I would have been programmatically, so here is a benefit. On the other hand, there are some strange false positives for second-level lists.

 3. If we really are doing preprocessing, it would be cheap to use 's/\so\s/ * /', so that even this marker would be detected as a list marker. This would be completely contrary to my initial suggestion, but consistent if you prefer preprocessing anyway. BTW, I even detected non-ASCII bullets in the burn package, and because it is QA maintained anyway I took the chance to change this while fixing bug #517793. I think we should catch things like this quite quickly, because even apt-cache show failed to display the description of burn correctly, so I thought fixing the problem myself was more reasonable than adding another bug to a QA-maintained package.

 4. I expect more needs for preprocessing that have not been detected yet.

 5. I expect the lintian checks for the markdown format to be rather complicated, because there is a lot more freedom in the format (which might be an advantage for the editors), and some valid markdown input might be rendered successfully but into something that conflicts with the author's intention. Compared to my suggestion of formatting the long descriptions according to stricter rules, this adds another level of complexity, while the lintian checks needed for my suggestion would have been really cheap. I'd consider this a disadvantage.
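The 'o' substitution from point 3 could be sketched in Python as below. Note that this is a variation and an assumption on my side, not the literal sed expression: the pattern is anchored to the start of an indented line so that an "o" standing alone in the middle of a sentence is left untouched, and the helper name is invented for illustration.

```python
import re

# Assumption: only an 'o' that is the first token of an indented line
# is meant as a bullet.
O_BULLET_RE = re.compile(r"^(\s+)o(\s+)")

def normalise_o_bullets(line):
    """Replace a leading 'o' bullet with '*', keeping the indentation."""
    return O_BULLET_RE.sub(r"\1*\2", line)
```

Such a step would have to run before the line is handed to markdown, so that the rewritten marker is recognised as a list start.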
I might note that I'm not happy that, for the pure and simple ASCII output of long descriptions as produced more or less by current tools, we will get a rendering which does not fit my taste at all. But I accept that I probably belong to a minority, and if markdown is widely accepted it leads to my initial goal (tasks pages) as well.

Kind regards

      Andreas.

[1] http://lists.debian.org/debian-devel/2009/04/msg00815.html

Python implementation:

import re

detect_list_start_re = re.compile(r"^\s+[-*+]\s+")
detect_code_start_re = re.compile(r"^\s")
detect_code_end_re   = re.compile(r"^[^\s]")
detect_url_re        = re.compile(r"[fh]t?tp://")

def PrepareMarkdownInput(lines):
    ret    = ''
    inlist = 0
    incode = 0
    for line in lines:
        # strip leading space from description as well as useless trailing whitespace
        line = re.sub('^ ', '', line.rstrip())

        # a '^\.$' marks in descriptions a new paragraph, markdown uses an empty line here
        line = re.sub(r'^\.$', '', line)

        if detect_code_start_re.search(line):
            if incode == 0: # If a list or verbatim mode starts MarkDown needs an empty line
                ret += "\n"
                incode = 1
                if detect_list_start_re.search(line):
                    inlist = 1
            if incode == 1 and inlist == 0:
                ret += "\t" # Add a leading tab if in verbatim but not in list mode

        # If there is an empty line or a not indented line the list or verbatim text ends
        # It is important to check for empty lines because some descriptions would insert
        # more lines than needed in verbose mode (see for instance glam2)
        if ( detect_code_end_re.search(line) or line == '' ) and incode == 1:
            inlist = 0 # list ends if indentation stops
            incode = 0 # verbatim mode ends if indentation stops

        # Mask # at first character in line which would lead to
        #   MARKDOWN-CRITICAL: "We've got a problem header!"
        # otherwise
        if line.startswith('#'):
            ret += '\\'

        if detect_url_re.search(line):
            # some descriptions put URLs in '<>' which is unneeded and might
            # confuse the parsing of '&' in URLs which is needed sometimes
            line = re.sub(r'<*([fh]t?tp://[-./\w?=~;&]+)>*', r'[\1](\1)', line)

        ret += line + "\n"
    return ret

-- 
http://fam-tille.de