On Sun, 27 Jan 2019 12:38:39 -0500 "Mark E. Shoulson via Unicode" <unicode@unicode.org> wrote:
> On 1/27/19 11:08 AM, Michael Everson via Unicode wrote:
> > It is a letter. In “can’t” the apostrophe isn’t a letter. It’s a
> > mark of elision. I can double-click on the three words in this
> > paragraph which have the apostrophe in them, and they are all
> > whole-word selected.
>
> That doesn't work when I try it: I double-click on the "a" in "can’t"
> and get only the "can" selected.
>
> This does not necessarily prove anything; my software (Thunderbird)
> is arguably doing it wrong.

Except that Unicode-compliant processes aren't required to follow the
scheme of TR29 Unicode Text Segmentation. Even under that scheme,
selecting the whole word is required only because the U+2019 is
followed by a letter. TR29 prescribes different behaviour for "dogs'"
with U+2019 (interpret as two 'words') and with U+02BC (interpret as
one word).

The GTK-based email client I'm using shows that difference, but it also
fails with "don't" unless one uses U+02BC. LibreOffice, however, treats
"don't" as a single word for U+0027, U+02BC and U+2019, but "dogs'" as
a single word only for U+02BC. This complies with TR29. I'm not
surprised, as LibreOffice uses, or has used, ICU.

Richard.
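
P.S. The behaviour described above can be illustrated with a minimal Python sketch. It is not the Unicode Text Segmentation algorithm itself, only a hand-rolled approximation of the one rule at issue: an apostrophe-like character stays inside a word only when it has a letter on both sides. The function name `split_words`, the use of `str.isalpha()` as a stand-in for the letter class, and the two-character `MID_CHARS` set are all assumptions made for illustration; the real property tables are considerably larger.

```python
# Simplified stand-in for the characters that may join two letters.
# Assumption: only U+0027 and U+2019 are modelled here.
MID_CHARS = {"\u0027", "\u2019"}


def is_letter(ch: str) -> bool:
    # Assumption: str.isalpha() approximates the letter class.
    # U+02BC (MODIFIER LETTER APOSTROPHE) is category Lm, so it counts
    # as a letter; U+0027 and U+2019 are punctuation, so they do not.
    return ch.isalpha()


def split_words(text: str) -> list[str]:
    """Split text into segments, keeping an apostrophe inside a word
    only when it sits between two letters."""
    segments = []
    current = ""
    for i, ch in enumerate(text):
        if is_letter(ch):
            current += ch
        elif (
            ch in MID_CHARS
            and current
            and i + 1 < len(text)
            and is_letter(text[i + 1])
        ):
            # Letter on both sides: no break around the apostrophe.
            current += ch
        else:
            # Anything else ends the current word and, as a
            # simplification, forms a segment of its own.
            if current:
                segments.append(current)
                current = ""
            segments.append(ch)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for sample in ("don't", "don\u2019t", "don\u02bct",
                   "dogs\u2019", "dogs\u02bc"):
        print(repr(sample), split_words(sample))
```

Under these assumptions "don't" comes out as one segment for all three apostrophe characters, while "dogs" followed by U+2019 splits in two and "dogs" followed by U+02BC stays whole.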