On 04.11.2024 at 05:56, Backwoods BC via Cygwin wrote:
On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin
<cygwin@cygwin.com> wrote:
Continuing my monologue, with due consideration of comments posted, ...
On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
Replying to myself, I continue...
On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
It appears that 'rev' is choking on any character \x80 or higher, but
is OK with those \x1f or smaller. It doesn't give an error or ignore
the character; it just stops.
I don't have access to a Linux box so I can't see if this happens
there and nothing in the documentation suggests that this is the
correct functionality.
Test case:
printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
This is for "rev from util-linux 2.33.1"
I don't have the current version of 'rev' on my system due to not
having updated in a while. I accidentally screwed up my installation
and have been reluctant to wipe it and start over.
So, is this the expected behaviour for the current version of 'rev'
under Cygwin and/or Linux?
The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
way. It looks like line-ending char(s) are not being handled
correctly. Don't know yet if it's rev itself or fgetws() being used
by rev that's busted. I'll investigate further. Thanks for the report!
This is a locale issue. In the default Cygwin locale, rev mishandles
the \x80 byte and instead of stopping with an error message it enters an
infinite loop. I'll probably report this upstream instead of working
out a local fix.
Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
message when the OP's testcase is tried. I'm testing the full 2.40.2
and plan a Cygwin release before too long.
There is a work-around: change to the "C" locale just to run rev.
LC_ALL=C rev zzz
where zzz is a file containing your four lines. You can also run your
original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
Implicit in that suggestion is that the OP seemed to be uninterested in
any form of multi-byte characters, just straightforward operation on
bytes, even if they have the high bit set.
That said, I appreciate the follow-up comments that dealt with the
general problem.
Thanks all,
..mark
Sorry for dropping out of the thread. I lost interest in pursuing the
issue once I learned that 'rev' would balk at any character it didn't
like instead of just passing it through, and found a workaround for my
case. What I really wanted is something that would do a byte-by-byte
reversal working backwards from a LF character.
My use for 'rev' is to allow sorting based on field position from the
*end* of the line. 'sort' won't do this itself, as far as I can tell.
My method follows:
printf -v mySep '\xff'
cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'
This particular pipe is to sort fileOfFullPathNames by file extension.
As mentioned, this stops abruptly when it encounters my inserted field
separator of \xff. I found that it would do what I wanted if I used
\x1f as mySep instead.
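
For reference, the \x1f variant of that pipe could be wrapped up roughly as below. This is only a sketch: the function name sort_by_ext is made up for illustration, the separator is written as octal \037 so that it also works with a plain POSIX printf, and LC_ALL=C on sort is an added assumption to make the ordering predictable rather than part of the original pipe.

```shell
# Hypothetical helper: sort path names read from stdin by file extension.
# Runs in a subshell (parentheses) so that "sep" does not leak out.
sort_by_ext() (
    # \037 is octal for \x1f (ASCII unit separator); plain ASCII, so
    # 'rev' handles it in any locale.
    sep=$(printf '\037')
    # 1. rev: reverse each line so the last '.' becomes the first one
    # 2. sed: replace that first '.' with the temporary separator
    # 3. rev: restore the original character order
    # 4. sort: order by everything after the separator (the extension)
    # 5. tr: turn the separator back into '.'
    rev | sed -e "s/\./$sep/" | rev \
        | LC_ALL=C sort -t "$sep" -k 2.1 \
        | tr "$sep" '.'
)
```

Usage would be something like "sort_by_ext < fileOfFullPathNames". Lines without any '.' have an empty sort key and therefore sort first.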
To be honest, in far too many years of using *nix as a user (not a
developer), doing this kind of thing is the only use I've ever had for
'rev'. I probably used a different separator before (likely \x09)
which is why I haven't encountered an issue.
What I appear to really need is "rev --binary" that just reverses
everything regardless of what it is until it finds a LF. I may get
motivated to write it for myself if I run into situations where I
can't work around the restrictions in 'rev'.
As noted before in this thread, "rev --binary" is "LC_ALL=C rev".