Hi Branden,

On 4/30/23 02:05, G. Branden Robinson wrote:
> I should clarify a couple of points here since I was feeling grumpy when
> I wrote the following, and that made me forget things.
>
> At 2023-04-27T09:45:40-0500, G. Branden Robinson wrote:
>> We're re-covering some familiar ground here.
>>
>> I have a few points I'd like to make.
>>
>> 1. "Semantic newlines" is a terrible term.
>
> I should have said "_Warn on_ semantic newlines" is a terrible
> instruction/summary.

That's why I've been using the phrase "warn on S. N. violations" (at
least, I've tried to use it consistently lately).

>
> They are what we _don't_ want to warn about upon encountering them.
>
> If man-pages(7) or other people continue to call the practice of
> breaking *roff input lines after sentence-ending punctuation "semantic
> newlines", I have no complaint. It could also be called "Kernighan
> breaking", in honor of an early popularizer of the practice.

You could use that for the warning name ;)

>
>> 2. Bjarni's comment '"groff" is not the right tool for such things,
>>    but "grep" is.' is thoroughly wrong-headed and Ingo was right to
>>    reject it with great force. Here a few reasons why. I don't
>>    think any of B through D are relevant to mandoc(1) since it
>>    doesn't support the features in question (as far as I know).
>>
>>    A. The formatter decides where sentence boundaries are based on
>>       its input.
>>
>>    B. Use of the `cflags' request can change the characters that
>>       have sentence-ending semantics. grep(1) cannot know this.
>>
>>    C. Sentence-ending characters are subject to character
>>       translation (the `tr` request). grep(1) cannot know this.
>>
>>    D. The user/document could define a special character that is a
>>       sentence-ending character (with `char` and `cflags`). grep(1)
>>       cannot know this.
>
>    E. Because '.', '?', and '!' are valid characters in *roff
>       identifiers, grep(1) can be fooled by special character, register,
>       or string interpolations in the input if their identifiers use
>       those characters.
>
>       Example:
>
>         I can't believe \*(I. ate the whole thing.
>
> It is only valid to detect the end of a sentence here if the (recursive)
> _expansion_ of the `I.` string ends with a sentence-ending punctuation
> character.
>
> Further, since string interpolations can result in further string
> interpolations, a finite-state automaton will not suffice to analyze
> this input. You need a stack machine.
> (IIRC, a stack machine
> recognizes "recursively enumerable" languages.)
>
> This is categorically not what regular expressions can cope with,
> formally.

Well, formally, yes. And a regex can't find C function definitions in a
source tree either, at least not if someone tries to fool it by writing
the most horrible code in the universe. But I wrote a relatively small
script[1] that finds a lot of C code with pcre2grep(1), and it works
most of the time. It has limitations: some can be fixed by improving
the regexes (read: making them even more unreadable), while others are
likely impossible to fix with a regex. The biggest limitation I've run
into is K&R-style function definitions; I don't think a regex can cope
with them.

I believe a regex-based script can be good enough for some purposes,
even if it's not perfect.

Cheers,
Alex

[1]: <http://www.alejandro-colomar.es/src/alx/alx/grepc.git/tree/bin/grepc>

--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5