On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin <cygwin@cygwin.com> wrote:
>
> Continuing my monologue, with due consideration of comments posted, ...
>
> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> > Replying to myself, I continue...
> >
> > On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
> >> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
> >>> It appears that 'rev' is choking on any character \x80 or higher, but
> >>> is OK with those \x1f or smaller. It doesn't give an error or ignore
> >>> it, it just stops.
> >>>
> >>> I don't have access to a Linux box so I can't see if this happens
> >>> there and nothing in the documentation suggests that this is the
> >>> correct functionality.
> >>>
> >>> Test case:
> >>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80<
> >>> here\nLine 4\n'|rev|rev
> >>>
> >>> This is for "rev from util-linux 2.33.1"
> >>>
> >>> I don't have the current version of 'rev' on my system due to not
> >>> having updated in a while. I accidentally screwed up my installation
> >>> and have been reluctant to wipe it and start over.
> >>>
> >>> So, is this the expected behaviour for the current version of 'rev'
> >>> under Cygwin and/or Linux?
> >>
> >> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
> >> way. It looks like line-ending char(s) are not being handled
> >> correctly. Don't know yet if it's rev itself or fgetws() being used
> >> by rev that's busted. I'll investigate further. Thanks for the report!
> >
> > This is a locale issue. In the default Cygwin locale, rev mishandles
> > the \x80 byte and instead of stopping with an error message it enters an
> > infinite loop. I'll probably report this upstream instead of working
> > out a local fix.
>
> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
> message when the OP's testcase is tried. I'm testing the full 2.40.2
> for Cygwin release before too long.
>
> > There is a work-around: change to the "C" locale just to run rev.
> >     LC_ALL=C rev zzz
> > where zzz is a file containing your four lines. You can also run your
> > original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
>
> Implicit in that suggestion is that the OP seemed to be uninterested in
> any form of multi-byte characters.. just straightforward operation on
> bytes, even if they have the high bit set.
>
> That said, I appreciate the follow-up comments that dealt with the
> general problem.
> Thanks all,
>
> ..mark

Sorry for dropping out of the thread. I lost interest in pursuing the
issue once I learned that 'rev' would balk at any character it didn't
like instead of just passing it through, and found a workaround for my
case.

What I really wanted was something that would do a byte-by-byte reversal
working backwards from a LF character.

My use for 'rev' is to allow sorting based on field position from the
*end* of the line. 'sort' won't do this itself, as far as I can tell.
My method follows:

    printf -v mySep '\xff'
    cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'

This particular pipe is to sort fileOfFullPathNames by file extension.
As mentioned, this stops abruptly when it encounters my inserted field
separator of \xff. I found that it would do what I wanted if I used
\x1f as mySep instead.

To be honest, in far too many years of using *nix as a user (not a
developer), doing this kind of thing is the only use I've ever had for
'rev'. I probably used a different separator before (likely \x09), which
is why I haven't encountered an issue.

What I appear to really need is a "rev --binary" that just reverses
everything, regardless of what it is, until it finds a LF. I may get
motivated to write it for myself if I run into situations where I can't
work around the restrictions in 'rev'.
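
For what it's worth, the byte-by-byte reversal described above could
probably be sketched in plain shell along the following lines. This is
only a rough sketch under some assumptions, not an existing tool:
"revbytes" is a made-up name (there is no "rev --binary" option that I
know of), the input must not contain NUL bytes, and it will be slow on
very long lines because it peels off one byte at a time.

```shell
#!/bin/sh
# revbytes: hypothetical stand-in for the wished-for "rev --binary".
# Reverses each input line byte by byte; no multi-byte decoding is attempted.
# Assumptions: the input has no NUL bytes (shell variables cannot hold them),
# and a final line lacking a newline gets one added on output.
revbytes() (
    # The body runs in a subshell (note the parentheses), so the locale
    # change does not leak into the caller. In the C locale the "?" pattern
    # used below matches exactly one byte.
    LC_ALL=C
    export LC_ALL
    while IFS= read -r line || [ -n "$line" ]; do
        out=
        while [ -n "$line" ]; do
            rest=${line#?}            # everything after the first byte
            out=${line%"$rest"}$out   # prepend that first byte to the result
            line=$rest
        done
        printf '%s\n' "$out"
    done
)
```

With that function defined, it would take the place of 'rev' in both
spots of the pipe above, and something like

    printf 'hex ff >\377< here\n' | revbytes | revbytes

should hand the line back unchanged, high bit and all.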