Public bug reported:

Although the contractions are present in the `.dic` file, `hunspell` splits 
some of them at ' (ASCII apostrophe) or ’ (Unicode apostrophe) and rejects the 
resulting non-words as misspellings.
```ShellSession
luism@lmm-notebook:~$ lsb_release -rd
Description:    Ubuntu 18.10
Release:        18.10
luism@lmm-notebook:~$ apt list --installed hunspell hunspell-en-us
Listing... Done
hunspell-en-us/cosmic,now 1:2018.04.16-1 all [installed]
hunspell/cosmic,now 1.6.2-1build1 amd64 [installed]
luism@lmm-notebook:~$ hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/home/luism/.openoffice.org/3/user/wordbook:/home/luism/.openoffice.org2/user/wordbook:/home/luism/.openoffice.org2.0/user/wordbook:/home/luism/Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
LOADED DICTIONARY:
/usr/share/hunspell/en_US.aff
/usr/share/hunspell/en_US.dic
Hunspell 1.6.2
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do sed -ne /${i}n\'t/'{p;q}' /usr/share/hunspell/en_US.dic
> done
aren't
couldn't
didn't
isn't
mustn't
shouldn't
wasn't
weren't
wouldn't
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n't
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*

Hunspell 1.6.2
& couldn 2 0: could, could n
*

Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*

Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*

Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*

Hunspell 1.6.2
& shouldn 2 0: should, should n
*

Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*

Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*

Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*

luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n’t
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*

Hunspell 1.6.2
& couldn 2 0: could, could n
*

Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*

Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*

Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*

Hunspell 1.6.2
& shouldn 2 0: should, should n
*

Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*

Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*

Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*

```
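
In the output format shown above, a line beginning with `&` names a rejected token and lists suggestions, while `*` marks an accepted token; the related `#` form reports a rejected token with no suggestions. As a rough sketch, the rejected tokens can be pulled out of such output with `awk`. The function name `rejected_tokens` is made up for this illustration and is not part of hunspell.

```shell
# Hypothetical helper: print each token that hunspell rejected,
# reading hunspell's "&" / "#" / "*" result lines on standard input.
rejected_tokens() {
    # "& token count offset: suggestions" = rejected, with suggestions
    # "# token offset"                    = rejected, no suggestions
    awk '$1 == "&" || $1 == "#" { print $2 }'
}

# Example with two result lines copied from the session above.
printf '%s\n' '& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n' '*' | rejected_tokens
```

Piping a real run through the helper (for example `hunspell < words.txt | rejected_tokens`, where `words.txt` is a placeholder file) would list only the fragments, such as `isn`, that hunspell flagged.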

According to the [`hunspell` 
changelog](https://github.com/hunspell/hunspell/blob/master/ChangeLog), with 
appropriate dictionaries, hunspell should accept ' inside words.
> 2014-05-28 Németh László <nemeth at numbertext dot org>:

…
>       * better apostrophe usage:
>       - WORDCHARS only with one of the Unicode or ASCII apostrophe
>         results extended word tokenization: both of them will be part of
>         the words (if they are inside: eg. word's, but not words').
>       - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries
>         (eg. English dictionaries), or for UTF-8 dictionaries only
>         with ASCII apostrophe supports (eg. French dictionaries).
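
The first changelog item depends on whether the affix file's `WORDCHARS` line lists an apostrophe. A quick way to check a given `.aff` file is sketched below. `aff_wordchars_has_apostrophe` is a hypothetical helper name, and the check looks only at the `WORDCHARS` line, so it says nothing about other directives that may also affect tokenization.

```shell
# Hypothetical helper: succeed if the WORDCHARS line of an affix file
# contains an ASCII (') or Unicode (’) apostrophe.
aff_wordchars_has_apostrophe() {
    # LC_ALL=C makes grep compare raw bytes, so non-UTF-8 bytes elsewhere
    # on the line do not break the match; -s silences "no such file" errors.
    LC_ALL=C grep -Eqs "^WORDCHARS[[:space:]].*('|’)" "$1"
}

# Example, using the path reported by `hunspell -D` in this report.
if aff_wordchars_has_apostrophe /usr/share/hunspell/en_US.aff; then
    echo "WORDCHARS includes an apostrophe"
else
    echo "WORDCHARS has no apostrophe (or the file is missing)"
fi
```

If the helper reports no apostrophe, that would be consistent with the tokenization seen in the session, though it does not by itself prove the cause.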

I am therefore raising the issue here, since the dictionary's affix rules 
don't appear to support this hunspell feature.
The en_US dictionary (and others) should allow hunspell to process words 
containing ' without breaking them.

** Affects: scowl (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to scowl in Ubuntu.
https://bugs.launchpad.net/bugs/1807103

Title:
  en_US dictionary misses n't contractions

Status in scowl package in Ubuntu:
  New
