On 5/8/22 23:38, Benson Muite wrote:
When using

grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$"

to extract 4-letter Igbo words

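The {4} means "4 characters", not "4 letters", and a combining character counts as a character of its own. So a letter typed as a base letter followed by a combining accent uses up two of the four.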
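To make this concrete, here is a minimal Python sketch. Python's `re` module stands in for grep here because it also counts code points, not letters. The test string is arbitrary (not necessarily an Igbo word), chosen only so that one of its four letters is "o" plus two combining marks, and the simplified class `[a-z\u0300-\u036f]` is an assumption for illustration, not the pattern from the question.

```python
import re
import unicodedata

# Arbitrary test string (hypothetical, not necessarily a real word): four
# visible letters, but the second is "o" + U+0323 COMBINING DOT BELOW
# + U+0301 COMBINING ACUTE ACCENT, i.e. three code points.
word = "mo\u0323\u0301na"


def count_letters(s: str) -> int:
    """Rough letter count: count only code points that are not combining marks."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


print(len(word))            # 6 code points: this is what {4} counts
print(count_letters(word))  # 4 visible letters

# A bracket-expression {4}, as in the question, needs exactly four code
# points after the whitespace, so it rejects this four-letter word.
per_codepoint = re.compile(r"\s[a-z\u0300-\u036f]{4}$")
print(bool(per_codepoint.search(" " + word)))  # False

# Possible workaround (an assumption, not from the original reply): group each
# base letter with the combining marks that follow it, so {4} counts letters.
per_letter = re.compile(r"\s(?:[a-z][\u0300-\u036f]*){4}$")
print(bool(per_letter.search(" " + word)))     # True
```

The grouped pattern assumes every combining mark directly follows a base letter from the class; a stray mark with no base letter would not be counted as a letter.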

It might be nice for 'grep' to have ways to perform Unicode normalization before matching. In the meantime perhaps you can get what you want by normalizing the text before running it through 'grep'.
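A minimal sketch of such a normalizing filter, assuming Python 3.7 or later and UTF-8 text; the script name nfc.py used below is hypothetical. It rewrites each line in Unicode Normalization Form C (NFC), which composes a base letter and its accents into a single code point wherever Unicode defines a precomposed character.

```python
import sys
import unicodedata


def nfc_lines(lines):
    """Yield each input line converted to Normalization Form C (composed)."""
    for line in lines:
        yield unicodedata.normalize("NFC", line)


if __name__ == "__main__":
    # Assumes the input is UTF-8 regardless of the current locale.
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stdout.writelines(nfc_lines(sys.stdin))
```

It would be used in front of grep, with the bracket expression in the pattern also written using precomposed characters:

python3 nfc.py < words.txt | grep -E "PATTERN"

NFC only helps where a precomposed character exists. "e" followed by U+0301 becomes the single code point "é", but "o" followed by U+0323 and U+0301 becomes "ọ" followed by a combining acute accent, which {4} still counts as two characters.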
