On 5/8/22 23:38, Benson Muite wrote:
When using grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$" to extract 4 letter Igbo words
The {4} means "4 characters", not "4 letters", and a combining character counts as a character.
It might be nice for 'grep' to have ways to perform Unicode normalization before matching. In the meantime perhaps you can get what you want by normalizing the text before running it through 'grep'.
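
As a rough illustration of the normalization idea, here is a minimal Python sketch using the standard unicodedata module. The helper names nfc and letter_count are made up for this example and are not part of grep. It shows that a decomposed accented letter counts as two characters, and that NFC normalization reduces it to one when a precomposed form exists. NFC cannot help for sequences that have no precomposed code point, so counting non-combining characters may be the more robust check for those.

```python
import unicodedata

def nfc(text: str) -> str:
    """Compose base letters with combining marks where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)

def letter_count(word: str) -> int:
    """Count characters that are not combining marks (a rough proxy for letters)."""
    return sum(1 for ch in word if unicodedata.combining(ch) == 0)

# Illustrative input: 'a' + COMBINING ACUTE ACCENT + 'l' + 'a'
decomposed = "a\u0301la"

print(len(decomposed))           # 4 characters, although a reader sees 3 letters
print(len(nfc(decomposed)))      # 3 once the accent is composed into U+00E1
print(letter_count(decomposed))  # 3, without needing a precomposed form
```

Text normalized this way could then be written out and passed to 'grep', with the caveat above about letters that stay decomposed even after NFC.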