On 2024-10-23 23:01, Mark Geisert via Cygwin wrote:
On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
It appears that 'rev' is choking on any character \x80 or higher, but
is OK with those \x1f or smaller. It doesn't give an error or skip the character;
it just stops.
I don't have access to a Linux box, so I can't check whether this happens
there, and nothing in the documentation suggests that this is the
intended behaviour.
Test case:
printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
This is for "rev from util-linux 2.33.1"
I don't have the current version of 'rev' on my system because I haven't
updated in a while. I accidentally screwed up my installation
and have been reluctant to wipe it and start over.
So, is this the expected behaviour for the current version of 'rev'
under Cygwin and/or Linux?
The current Cygwin util-linux 2.39.3-2 rev behaves the same broken way.
It looks like line-ending characters are not being handled correctly. I don't
know yet whether the fault is in rev itself or in the fgetws() call rev uses. I'll
investigate further. Thanks for the report!
This is a locale issue. In the default Cygwin locale, rev mishandles the \x80
byte: instead of stopping with an error message, it enters an infinite loop.
I'll probably report this upstream instead of working out a local fix.
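A minimal sketch of why that byte matters, assuming the session uses a UTF-8 locale (Cygwin's default is C.UTF-8): on its own, 0x80 is a UTF-8 continuation byte with no lead byte, so it cannot be decoded as a character. Here `iconv` stands in for the multibyte decoding rev does through fgetws(); the helper name `is_valid_utf8` is hypothetical. The last command runs rev under GNU `timeout`, so a hang shows up as exit status 124 rather than a stuck terminal.

```shell
# Hypothetical helper: succeeds only if stdin decodes as UTF-8.
# iconv stands in for the multibyte decoding rev performs via fgetws().
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-32LE >/dev/null 2>&1
}

# \200 is the octal spelling of \x80; octal escapes work in every POSIX
# printf, whereas \x escapes are a bash/GNU extension.
if printf 'hex 80 >\200< here\n' | is_valid_utf8; then
    echo "line decodes as UTF-8"
else
    echo "line does NOT decode as UTF-8 (stray 0x80 continuation byte)"
fi

# Bound the run with GNU timeout: exit status 124 means rev was still
# running after 5 seconds, i.e. it probably hung.
status=0
printf 'hex 80 >\200< here\n' | timeout 5 rev || status=$?
echo "rev exit status: $status"
```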
There is a workaround: switch to the "C" locale just to run rev.
LC_ALL=C rev zzz
where zzz is a file containing your four lines. You can also run your original
testcase with "rev" replaced by "LC_ALL=C rev" in both places.
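The workaround spelled out as commands, as a sketch: it assumes `zzz` is created from the original test case, with `\001` and `\200` as the portable octal spellings of `\x01` and `\x80`. According to this thread the C-locale run behaves correctly on Cygwin; how rev treats bytes above 0x7F in the C locale on other systems may differ.

```shell
# Recreate the four-line test file; \001 and \200 are the octal
# spellings of \x01 and \x80.
printf 'no non-ASCII characters\nhex 01 >\001< here\nhex 80 >\200< here\nLine 4\n' > zzz

# Workaround: reverse the file with rev running in the "C" locale only.
LC_ALL=C rev zzz || echo "rev exited with status $?" >&2

# The original test case's double reversal, with LC_ALL=C on both rev
# invocations; reversing twice should reproduce the input byte for byte.
if LC_ALL=C rev < zzz | LC_ALL=C rev | cmp -s - zzz; then
    echo "round trip OK"
else
    echo "round trip differs from zzz"
fi
```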
I run with a UTF-8 locale and have not noticed any issues, as my files are UTF-8.
The man page for rev(1) says it works on wide characters, and `cygcheck rev`
shows it is built with gettext-devel libintl/libiconv.
I could see an issue if the shell and file locales mismatch, or possibly if the
file contains SMP (non-BMP) characters, which Cygwin's 16-bit wchar_t stores as
UTF-16 surrogate pairs.
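To illustrate the surrogate concern: Cygwin's wchar_t is 16 bits wide, so a character outside the BMP, such as U+1F600, occupies two wchar_t units. This sketch uses `iconv` and `od` rather than rev itself. It shows the pair, then what happens if the two units are swapped, as they would be if a program reversed a line one wchar_t at a time. That is an assumption about rev's internals, not something verified here.

```shell
# U+1F600 (a non-BMP character) in UTF-8 is F0 9F 98 80, written here
# in octal so any POSIX printf can produce it.
# Re-encoded as UTF-16LE it becomes a surrogate pair, high D83D then
# low DE00, each stored low byte first (prints 3d d8 00 de).
printf '\360\237\230\200' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1

# Swapping the two 16-bit units (low surrogate first), as a reversal done
# one wchar_t at a time would, no longer forms a valid character.
if printf '\000\336\075\330' | iconv -f UTF-16LE -t UTF-8 >/dev/null 2>&1; then
    echo "swapped pair accepted"
else
    echo "swapped pair rejected as invalid UTF-16"
fi
```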
The correct approach is to match the execution locale to the file's encoding,
for example `LC_ALL=...UTF-8 rev ...`, which should produce the expected results.
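One way to apply that advice automatically, as a hypothetical helper: check whether the file is valid UTF-8 and, if so, run rev in a UTF-8 locale; otherwise fall back to the "C" locale workaround. `pick_locale` and `rev_matched` are illustrative names, and `C.UTF-8` stands in for whatever UTF-8 locale your system provides.

```shell
# Hypothetical helper: print the locale to use for file "$1".
# `iconv -f UTF-8 -t UTF-8` fails on any byte sequence that is not
# valid UTF-8, which makes it a convenient validity check.
pick_locale() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo C.UTF-8   # assumption: substitute your own UTF-8 locale
    else
        echo C         # fall back to the "C" locale workaround
    fi
}

# Hypothetical wrapper: run rev with the locale chosen for the file.
rev_matched() {
    LC_ALL=$(pick_locale "$1") rev "$1"
}
```

Usage: `rev_matched zzz`.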
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
Perfection is achieved not when there is no more to add
but when there is no more to cut
-- Antoine de Saint-Exupéry