On Jan 28 18:07:55, tomas.ri...@tuta.com wrote:
> Hi and thank to all of you responding.
> 
> My explanation and understanding:
> 
> myfile.txt example:
> Hello guys
>  <= one space here
> Anybody  from Europe?  <= two spaces between Anybody and from

No, that's not what's in the input.
This is actually in your input:

48        U+000048  H   LATIN CAPITAL LETTER H
65        U+000065  e   LATIN SMALL LETTER E
6c        U+00006c  l   LATIN SMALL LETTER L
6c        U+00006c  l   LATIN SMALL LETTER L
6f        U+00006f  o   LATIN SMALL LETTER O
20        U+000020      SPACE
67        U+000067  g   LATIN SMALL LETTER G
75        U+000075  u   LATIN SMALL LETTER U
79        U+000079  y   LATIN SMALL LETTER Y
73        U+000073  s   LATIN SMALL LETTER S
0a        U+00000a      LINE FEED (LF)
c2a0      U+0000a0      NON-BREAKING SPACE
0a        U+00000a      LINE FEED (LF)
41        U+000041  A   LATIN CAPITAL LETTER A
6e        U+00006e  n   LATIN SMALL LETTER N
79        U+000079  y   LATIN SMALL LETTER Y
62        U+000062  b   LATIN SMALL LETTER B
6f        U+00006f  o   LATIN SMALL LETTER O
64        U+000064  d   LATIN SMALL LETTER D
79        U+000079  y   LATIN SMALL LETTER Y
c2a0      U+0000a0      NON-BREAKING SPACE
20        U+000020      SPACE
66        U+000066  f   LATIN SMALL LETTER F
72        U+000072  r   LATIN SMALL LETTER R
6f        U+00006f  o   LATIN SMALL LETTER O
6d        U+00006d  m   LATIN SMALL LETTER M
20        U+000020      SPACE
45        U+000045  E   LATIN CAPITAL LETTER E
75        U+000075  u   LATIN SMALL LETTER U
72        U+000072  r   LATIN SMALL LETTER R
6f        U+00006f  o   LATIN SMALL LETTER O
70        U+000070  p   LATIN SMALL LETTER P
65        U+000065  e   LATIN SMALL LETTER E
3f        U+00003f  ?   QUESTION MARK
c2a0      U+0000a0      NON-BREAKING SPACE
0a        U+00000a      LINE FEED (LF)
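
If you want to check this yourself, here is a minimal sketch using only
od(1), tr(1) and wc(1). The sample() helper is my own name for illustration;
it rebuilds the input from the dump above with printf(1). On your machine
you would run od on myfile.txt directly.

```shell
# Rebuild the sample input byte for byte (hypothetical helper, not from
# the original post); \302\240 is the UTF-8 encoding of U+00A0.
sample() { printf 'Hello guys\n\302\240\nAnybody\302\240 from Europe?\302\240\n'; }

# Dump every byte as hex, then strip od's whitespace to get one string.
hex=$(sample | od -An -tx1 | tr -d '[:space:]')
echo "$hex"

# Keep only the 0xa0 bytes and count them; in this input each one is the
# second byte of a non-breaking space, not a plain 0x20 space.
nbsp=$(sample | LC_ALL=C tr -cd '\240' | wc -c | tr -d '[:space:]')
echo "non-breaking spaces: $nbsp"
```

Every c2a0 pair in the hex string is a character that looks like a space
on screen but is not one.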

> tr -c "[:alpha:]" "\n" < myfile.txt
> 
> myfile.txt is INPUT
> tr finds the complement to :alpha: and replaces them by \n,
> resulting in:

> START-OF-FILEmyfile
> txt

Nothing like that is in the input you show.
On the other hand, the "Hello guys" just disappeared?
You are _not_ showing the actual run and the actual result.

If you are new to unix, get familiar with script(1).
Run your example (cat myfile.txt, then the tr commands)
inside script(1) and post the resulting typescript.


> 
> Anybody
> 
> from
> Europe
> 
> EOF

This is what tr(1) does for me:


----------------
Hello
guys



Anybody


from
Europe



----------------

which is

48        U+000048  H   LATIN CAPITAL LETTER H
65        U+000065  e   LATIN SMALL LETTER E
6c        U+00006c  l   LATIN SMALL LETTER L
6c        U+00006c  l   LATIN SMALL LETTER L
6f        U+00006f  o   LATIN SMALL LETTER O
0a        U+00000a      LINE FEED (LF)
67        U+000067  g   LATIN SMALL LETTER G
75        U+000075  u   LATIN SMALL LETTER U
79        U+000079  y   LATIN SMALL LETTER Y
73        U+000073  s   LATIN SMALL LETTER S
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
41        U+000041  A   LATIN CAPITAL LETTER A
6e        U+00006e  n   LATIN SMALL LETTER N
79        U+000079  y   LATIN SMALL LETTER Y
62        U+000062  b   LATIN SMALL LETTER B
6f        U+00006f  o   LATIN SMALL LETTER O
64        U+000064  d   LATIN SMALL LETTER D
79        U+000079  y   LATIN SMALL LETTER Y
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
66        U+000066  f   LATIN SMALL LETTER F
72        U+000072  r   LATIN SMALL LETTER R
6f        U+00006f  o   LATIN SMALL LETTER O
6d        U+00006d  m   LATIN SMALL LETTER M
0a        U+00000a      LINE FEED (LF)
45        U+000045  E   LATIN CAPITAL LETTER E
75        U+000075  u   LATIN SMALL LETTER U
72        U+000072  r   LATIN SMALL LETTER R
6f        U+00006f  o   LATIN SMALL LETTER O
70        U+000070  p   LATIN SMALL LETTER P
65        U+000065  e   LATIN SMALL LETTER E
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)
0a        U+00000a      LINE FEED (LF)

You are not showing the actual result.

> Should I have included -s, it would have removed multiple occurences
> of \n from the OUTPUT and the result would have been:
> 
> myfile
> txt
> Anybody
> from
> Europe
> EOF

The result of 

        tr -cs "[:alpha:]" "\n" < myfile.txt

on your actual input is

------------
Hello
guys
Anybody
from
Europe
------------


> As the tr(1) states: This (the squeeze) occurs AFTER all deletion and
> translation is completed.

Yes: tr(1) replaces all the non-[:alpha:]s with a newline,
and then squeezes the multiple consecutive occurrences of newlines
(such as the four newlines after "guys") into one newline.

Here it is once again,
with pipes instead of newlines for readability:

$ tr -c "[:alpha:]" "|" < myfile.txt  
Hello|guys||||Anybody|||from|Europe||||
$ tr -cs "[:alpha:]" "|" < myfile.txt
Hello|guys|Anybody|from|Europe|
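
To reproduce the two runs above without the original file, the following
sketch rebuilds the input with printf(1). The sample() helper is my own
name for illustration, and I force LC_ALL=C on the assumption that tr(1)
then works byte by byte, so each two-byte non-breaking space becomes two
pipes. A tr(1) that treats U+00A0 as one character in a UTF-8 locale may
print fewer pipes.

```shell
# Rebuild the sample input byte for byte (hypothetical helper, not from
# the original post); \302\240 is the UTF-8 encoding of U+00A0.
sample() { printf 'Hello guys\n\302\240\nAnybody\302\240 from Europe?\302\240\n'; }

# Translate only: every byte that is not a letter becomes a pipe.
plain=$(sample | LC_ALL=C tr -c '[:alpha:]' '|')

# Translate, then squeeze: each run of pipes in the output becomes one pipe.
squeezed=$(sample | LC_ALL=C tr -cs '[:alpha:]' '|')

printf '%s\n' "$plain" "$squeezed"
```

The squeezed runs of pipes exist only after the translation, never in the
input itself.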

> So I still believe there should be OUTPUT in the -s description.

Being a non-native speaker myself, I can sympathize with
fighting the ambiguity. Is the following wording better?

It is still not _absolutely_ clear, as it talks about
"_the_ character" - but which one, if string2 is longer than one character?

$ tr -c  "[:alpha:]" "XY" < myfile.txt
$ tr -cs "[:alpha:]" "XY" < myfile.txt

        Jan



--- tr.1.orig   Tue Jan 28 19:08:31 2025
+++ tr.1        Tue Jan 28 19:12:08 2025
@@ -80,15 +80,15 @@ The
 .Fl d
 option causes characters to be deleted from the input.
 .It Fl s
-The
+After all deletion and translation is completed,
+the
 .Fl s
-option squeezes multiple occurrences of the characters listed in the last
-operand (either
+option squeezes multiple consecutive occurrences of the characters
+listed in the last operand (either
 .Ar string1
 or
 .Ar string2 )
-in the input into a single instance of the character.
-This occurs after all deletion and translation is completed.
+into a single instance of the character.
 .El
 .Pp
 In the first synopsis form, the characters in
