Re: case-insensitive grep with accented letters

Niels Müller Larsen Sat, 31 May 2025 04:18:40 -0700

A shortcut to doing it the hard way might be to put an
export SPECCHARS=<your special chars>
in your .profile or .kshrc, and then use that whenever
/Niels
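
As a minimal sketch of that idea, assuming an sh-compatible shell such as ksh: the variable name Z_CARON is hypothetical (the message above only names SPECCHARS), and it holds both case forms of one letter as an ERE alternation rather than a bracket expression, because an alternation of literal strings still matches correctly if grep compares bytes instead of UTF-8 characters.

```shell
# Hypothetical variable name; these two lines are what would go
# in .profile or .kshrc.
Z_CARON='(ž|Ž)'
export Z_CARON

# The two sample lines from the thread plus one unaccented control line.
# Only the first two lines contain ž or Ž, so only those are printed.
printf 'křížala\nkŘíŽala\nkrizala\n' | grep -E "$Z_CARON"
```

The pattern needs grep -E (or egrep), since plain grep treats ( | ) as ordinary characters.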



On Sat, May 31, 2025 at 12:47, Stuart Henderson <stu.li...@spacehopper.org> wrote:

> On 2025-05-31, rsyk...@disroot.org <rsyk...@disroot.org> wrote:
>> Dear list,
>>
>>
>> I was surprised to learn that 'grep -i' does not
>> really work for accented letters
>>
>> odin:~$ cat a
>> křížala
>> kŘíŽala
>> odin:~$ grep -i ž a
>> křížala
>> odin:~$ grep -i Ž a
>> kŘíŽala
>>
>> As I had LC_COLLATE="C", I tried also with this
>> set to en_US.UTF-8, but to no avail.
>>
>> Does grep -i only work for ascii letters?
>
> yes, that's expected.
>
> OpenBSD base doesn't support LC_COLLATE.
>
> $ man -k ANY=LC_COLLATE
> locale(1) - character encoding and localization conventions
> glob, globfree(3) - generate pathnames matching a pattern
> setlocale(3) - select character encoding
> strcoll, strcoll_l(3) - compare strings according to current collation
> strxfrm, strxfrm_l(3) - transform a string under locale
> wcscoll, wcscoll_l(3) - compare wide strings according to the current collation
> wcsxfrm, wcsxfrm_l(3) - transform a wide string under locale
> $ man locale
> LOCALE(1) General Commands Manual LOCALE(1)
>
> NAME
> locale – character encoding and localization conventions
>
> SYNOPSIS
> locale [-a | -m | charmap]
> [...]
>
> A locale is a set of environment variables telling programs which
> character encoding, language and cultural conventions the user
> prefers. Programs in the OpenBSD base system ignore the locale except
> for the character encoding, and it is not recommended to use any of
> these variables except that the following non-default setting is
> supported as an option:
>
> export LC_CTYPE=en_US.UTF-8
>
> Programs installed from packages(7) may or may not change behavior
> according to the locale. Many programs use the X/Open System
> Interfaces naming scheme for the contents of the variables listed
> below, which is language[_TERRITORY][.encoding][@modifier]
> [...]
>
>> Is there a general way to achieve 'true' case
>> insensitive match (other than listing all possibly
>> present accented letters in both forms, i.e.,
>> as [žŽ] in my case)?
>
> ggrep does in this instance, but I don't know how reliable that is.
>
> --
> Please keep replies on the mailing list.
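
Listing both forms by hand can be automated to a degree. The following is only a sketch: ci_pattern is a hypothetical helper, the two letter pairs are just the ones from the example above, and every other accented letter would need its own pair of sed expressions.

```shell
# Hypothetical helper: rewrite a word so that each listed accented
# letter becomes an ERE alternation of its lower and upper case forms.
# Each letter is first folded to lower case, then expanded, so an
# upper case letter in the input is not expanded twice.
ci_pattern() {
    printf '%s\n' "$1" | sed \
        -e 's/Ž/ž/g' -e 's/ž/(ž|Ž)/g' \
        -e 's/Ř/ř/g' -e 's/ř/(ř|Ř)/g'
}

# -i covers the ASCII letters, the generated alternations cover ž and ř.
# The letter í is left alone here because it is lower case in both lines.
printf 'křížala\nkŘíŽala\n' | grep -iE "$(ci_pattern 'křížala')"
```

With the file from the thread, the same call would be grep -iE "$(ci_pattern 'křížala')" a.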
