Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-30 Thread Matthew Mondor
On Fri, 5 Nov 2021 23:50:38 +0200 Lassi Kortela wrote: > * URL encoding (using a "%" character before the digits). I'm not sure if this was a possible consideration or if it would work for your purposes, but apropos/-k has a database created using makemandb(8) and it would theoretically be possi
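
For readers skimming the archive, here is a minimal sketch of the "%"-style escaping quoted above. The function name `escape_page_name` and the choice to escape every byte outside ASCII letters and digits are assumptions made for illustration; the thread does not settle on an exact rule.

```python
def escape_page_name(name: str) -> str:
    """Percent-escape every byte that is not an ASCII letter or digit.

    Hypothetical helper: the set of escaped bytes is an assumption,
    not a rule agreed on in the thread.
    """
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)               # keep ASCII letters and digits as-is
        else:
            out.append(f"%{byte:02X}")   # "%" followed by two hex digits
    return "".join(out)


print(escape_page_name("resolv.conf"))
print(escape_page_name("c++"))
```

Note that `urllib.parse.quote` is not used here because it always leaves ".", "-", "_" and "~" unescaped, whereas the proposal under discussion would also escape dots in the page name.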

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-11 Thread Kimmo Suominen
On Thu, Nov 11, 2021 at 04:22:33PM +0200, Lassi Kortela wrote: > In the present case, it's prudent to establish a clear rule by which > the filenames in man/cat directories can be taken apart to find the > page, section, and other suffixes. I don't think I've seen a use-case -- in this thread or o

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-11 Thread Mouse
> If all dots were escaped, these layers would be kept separate. Then > manpage filenames would reliably be of the form: > page "." section-extension ( "." other-extension )* > Such a filename can be correctly split at dots without knowing what > manual sections, compression tools, and othe
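
A rough sketch of the splitting rule quoted above, assuming dots inside the page name have been percent-escaped (one of the encodings discussed in the thread, not a settled convention). The name `split_manpage_filename` is hypothetical.

```python
from urllib.parse import unquote


def split_manpage_filename(filename: str):
    """Split: page "." section-extension ( "." other-extension )*

    Assumes every dot belonging to the page name is escaped, so each
    literal "." in the filename is a separator.
    """
    parts = filename.split(".")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"no page/section split in {filename!r}")
    page, section, *others = parts
    # Undo the (assumed) percent-escaping only in the page part.
    return unquote(page), section, others


print(split_manpage_filename("resolv%2Econf.5.gz"))
```

The point of the grammar is visible here: the function needs no table of known manual sections or compression suffixes to find the page name.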

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-09 Thread Kimmo Suominen
On Tue, Nov 09, 2021 at 03:10:44PM +0200, Lassi Kortela wrote: > Escaping "." in the stem part is good practice when the name of a manpage > contains a dot. It would be annoying to have to rename existing manual pages. > man -w resolv.conf /usr/share/man/man5/resolv.conf.5 Kind regards,

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-09 Thread Mouse
> The suffix is hopefully restricted to [0-9a-z.] in all cases, and > hence doesn't need to be escaped. "hopefully" is not a good basis for designing something like this. Or at least that's my opinion. Since you've said this is for man(1)'s purposes, surely you can find out what suffixes are pos

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-09 Thread Robert Elz
Date: Mon, 8 Nov 2021 18:14:23 -0500 (EST) From: Mouse Message-ID: <202111082314.saa13...@stone.rodents-montreal.org> | > <slash> is posix speak for '/' | | But is that "Unicode codepoint 47" or "ASCII codepoint 0x2f" or | "whatever the character set in use provides th

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-08 Thread Mouse
> 3.243 Pathname > [...] [...] > <slash> is posix speak for '/' But is that "Unicode codepoint 47" or "ASCII codepoint 0x2f" or "whatever the character set in use provides that is a line between upper right and lower left" or what? Does POSIX mandate an ASCII superset, for example? C99 demands that

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-08 Thread Robert Elz
Date: Mon, 8 Nov 2021 13:47:09 -0500 (EST) From: Mouse Message-ID: <202111081847.naa28...@stone.rodents-montreal.org> | What does POSIX say? From XBD (basic definitions) 3.243 Pathname A string that is used to identify a file. In the context of P

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-08 Thread Reinoud Zandijk
On Mon, Nov 08, 2021 at 03:30:14PM -0500, Mouse wrote: > >> What does POSIX say? > > [...] > > 2. Each byte in the UTF-8 encoding is interpreted as ASCII > > As soon as any of the input codepoints are non-ASCII, UTF-8 generates > octets which are outside the ASCII range and thus cannot be interpret

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-08 Thread Mouse
>> What does POSIX say? > [...] > 2. Each byte in the UTF-8 encoding is interpreted as ASCII As soon as any of the input codepoints are non-ASCII, UTF-8 generates octets which are outside the ASCII range and thus cannot be interpreted as ASCII (at least not without further processing). > 3. If the

Re: Encoding non-alphanumeric characters in manpage filenames

2021-11-08 Thread Mouse
> While most ASCII punctuation characters are legal in Unix filenames, I actually would warn against some thinking that could be (not "is") present here. UNIX filenames are not character strings. They are octet strings, which may be - often are - interpreted as encoding character strings. Two oc
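
To illustrate the octet-string point, here is a small sketch showing that one visible character can correspond to different byte sequences, which a filesystem that compares raw octets would treat as different names. The helper `utf8_octets` is hypothetical, and Unicode normalization forms are used only as one convenient way to produce the two encodings.

```python
import unicodedata


def utf8_octets(text: str, form: str) -> bytes:
    """Return the UTF-8 octets of text after Unicode normalization.

    form is "NFC" (composed) or "NFD" (decomposed).
    """
    return unicodedata.normalize(form, text).encode("utf-8")


composed = utf8_octets("\u00e9", "NFC")    # e with acute accent, one code point
decomposed = utf8_octets("\u00e9", "NFD")  # "e" followed by a combining accent

print(composed, decomposed, composed == decomposed)

# Every octet that encodes a non-ASCII code point is >= 0x80,
# so it cannot be read as an ASCII character on its own.
print(all(b >= 0x80 for b in composed))
```

Both sequences render as the same character, yet they differ as octet strings, so any escaping scheme for manpage filenames arguably has to be defined on bytes rather than on characters.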