Thomas Munro wrote:

> Looking around a bit, it might be interesting to check if the
> icu_character_boundaries() function in Daniel Vérité's icu_ext treats
> IVSs as single grapheme clusters.

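It does. Each of the two strings below consists of two code points
(the first is an IVS, the second a kana followed by a combining mark),
and each is counted as a single grapheme: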

with strings(s) as (
 values (U&'\+0066FE' || U&'\+0E0103'),
        (U&'\+00304B' || U&'\+00309A')
)
select s,
  octet_length(s),
  char_length(s),
  (select count(*) from icu_character_boundaries(s,'en')) as graphemes
from strings;


  s  | octet_length | char_length | graphemes 
-----+--------------+-------------+-----------
 曾󠄃 |           7 |           2 |         1
 か゚  |           6 |           2 |         1
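
As a cross-check outside the database, here is a minimal Python sketch
that reproduces the byte and code point counts for the same two strings.
The describe() helper and the sample labels are made up for illustration.
The grapheme figure is only a rough approximation (it counts code points
that are not nonspacing or enclosing marks); it is not the UAX #29
segmentation that ICU implements, it just happens to agree for these
two inputs.

```python
import unicodedata

# The same two strings as in the SQL query, written as code point escapes.
samples = {
    "ideograph + variation selector": "\u66FE\U000E0103",
    "kana + combining mark": "\u304B\u309A",
}

def describe(s):
    """Return (octet_length, char_length, approx_graphemes) for s."""
    octets = len(s.encode("utf-8"))   # bytes in UTF-8, like octet_length()
    chars = len(s)                    # code points, like char_length()
    # Rough approximation, NOT UAX #29: skip nonspacing (Mn) and
    # enclosing (Me) marks, count everything else as a cluster start.
    approx = sum(1 for c in s if unicodedata.category(c) not in ("Mn", "Me"))
    return octets, chars, approx

for label, s in samples.items():
    print(label, describe(s))
```

The 7 versus 6 bytes come from U+E0103 lying outside the BMP (4 bytes in
UTF-8), whereas the other three code points take 3 bytes each.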



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

