On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin
<cygwin@cygwin.com> wrote:
>
> Continuing my monologue, with due consideration of comments posted, ...
>
> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> > Replying to myself, I continue...
> >
> > On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
> >> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
> >>> It appears that 'rev' is choking on any character \x80 or higher, but
> >>> is OK with those \x1f or smaller. It doesn't give an error or ignore
> >>> it, it just stops.
> >>>
> >>> I don't have access to a Linux box so I can't see if this happens
> >>> there and nothing in the documentation suggests that this is the
> >>> correct functionality.
> >>>
> >>> Test case:
> >>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80<
> >>> here\nLine 4\n'|rev|rev
> >>>
> >>> This is for "rev from util-linux 2.33.1"
> >>>
> >>> I don't have the current version of 'rev' on my system due to not
> >>> having updated in a while. I accidentally screwed up my installation
> >>> and have been reluctant to wipe it and start over.
> >>>
> >>> So, is this the expected behaviour for the current version of 'rev'
> >>> under Cygwin and/or Linux?
> >>
> >> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
> >> way.  It looks like line-ending char(s) are not being handled
> >> correctly.   Don't know yet if it's rev itself or fgetws() being used
> >> by rev that's busted.  I'll investigate further.  Thanks for the report!
> >
> > This is a locale issue.  In the default Cygwin locale, rev mishandles
> > the \x80 byte and instead of stopping with an error message it enters an
> > infinite loop.  I'll probably report this upstream instead of working
> > out a local fix.
>
> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
> message when the OP's testcase is tried.  I'm testing the full 2.40.2
> for Cygwin release before too long.
>
> > There is a work-around: change to the "C" locale just to run rev.
> >      LC_ALL=C rev zzz
> > where zzz is a file containing your four lines.  You can also run your
> > original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
>
> Implicit in that suggestion is that the OP seemed to be uninterested in
> any form of multi-byte characters.. just straightforward operation on
> bytes, even if they have the high bit set.
>
> That said, I appreciate the follow-up comments that dealt with the
> general problem.
> Thanks all,
>
> ..mark

Sorry for dropping out of the thread. I lost interest in pursuing the
issue once I learned that 'rev' would balk at any character it didn't
like instead of just passing it through, and found a workaround for my
case. What I really wanted was a byte-by-byte reversal of each line,
working backwards from the LF character.

My use for 'rev' is to allow sorting based on field position from the
*end* of the line. 'sort' won't do this itself, as far as I can tell.
My method follows:
    printf -v mySep '\xff'
    cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev |
        sort -t "$mySep" --key=2.1 | tr "$mySep" '.'

This particular pipe is to sort fileOfFullPathNames by file extension.
As mentioned, this stops abruptly when it encounters my inserted field
separator of \xff. I found that it would do what I wanted if I used
\x1f as mySep instead.

To be honest, in far too many years of using *nix as a user (not a
developer), doing this kind of thing is the only use I've ever had for
'rev'. I probably used a different separator before (likely \x09),
which is why I hadn't run into this issue until now.

What I appear to really need is "rev --binary" that just reverses
everything regardless of what it is until it finds a LF. I may get
motivated to write it for myself if I run into situations where I
can't work around the restrictions in 'rev'.
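
For illustration only, here is a minimal sketch of what such a byte-wise
reversal might look like, using awk rather than a real "rev --binary"
(no such option exists in rev as far as this thread shows). The names
brev and sort_by_ext are hypothetical. It assumes that awk treats each
byte as one character when run with LC_ALL=C, which should hold for the
common awk implementations but is worth checking on your system. The
separator is the same \xff byte as in the pipeline earlier in this
message, written in octal as \377 so that a plain POSIX printf accepts
it, and -k 2.1 is the short spelling of --key=2.1.

```shell
# brev: hypothetical byte-wise stand-in for rev. Reverses each LF-terminated
# line of stdin one byte at a time. LC_ALL=C is assumed to make awk treat
# every byte, including those with the high bit set, as a single character.
brev() {
    LC_ALL=C awk '{
        out = ""
        for (i = length($0); i > 0; i--)
            out = out substr($0, i, 1)
        print out
    }'
}

# sort_by_ext: hypothetical wrapper around the pipeline from this message,
# with brev in place of rev and the whole pipe run in the C locale.
sort_by_ext() {
    (
        LC_ALL=C; export LC_ALL
        sep=$(printf '\377')    # the \xff separator, written in octal
        brev | sed -e "s/\./$sep/" | brev |
            sort -t "$sep" -k 2.1 | tr "$sep" '.'
    )
}
```

Usage would be "sort_by_ext < fileOfFullPathNames". Two limitations of
the sketch: awk adds a trailing LF to a final line that lacks one, and
NUL bytes in the input are not handled reliably by every awk.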

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
