Public bug reported:

Despite their presence in the `.dic` file, `hunspell` breaks some contractions at ' (ASCII apostrophe) or ’ (Unicode apostrophe) and rejects the resulting non-words as misspellings.

```ShellSession
luism@lmm-notebook:~$ lsb_release -rd
Description: Ubuntu 18.10
Release: 18.10
luism@lmm-notebook:~$ apt list --installed hunspell hunspell-en-us
Listing... Done
hunspell-en-us/cosmic,now 1:2018.04.16-1 all [installed]
hunspell/cosmic,now 1.6.2-1build1 amd64 [installed]
luism@lmm-notebook:~$ hunspell -D
SEARCH PATH:
.::/usr/share/hunspell:/usr/share/myspell:/usr/share/myspell/dicts:/Library/Spelling:/home/luism/.openoffice.org/3/user/wordbook:/home/luism/.openoffice.org2/user/wordbook:/home/luism/.openoffice.org2.0/user/wordbook:/home/luism/Library/Spelling:/opt/openoffice.org/basis3.0/share/dict/ooo:/usr/lib/openoffice.org/basis3.0/share/dict/ooo:/opt/openoffice.org2.4/share/dict/ooo:/usr/lib/openoffice.org2.4/share/dict/ooo:/opt/openoffice.org2.3/share/dict/ooo:/usr/lib/openoffice.org2.3/share/dict/ooo:/opt/openoffice.org2.2/share/dict/ooo:/usr/lib/openoffice.org2.2/share/dict/ooo:/opt/openoffice.org2.1/share/dict/ooo:/usr/lib/openoffice.org2.1/share/dict/ooo:/opt/openoffice.org2.0/share/dict/ooo:/usr/lib/openoffice.org2.0/share/dict/ooo
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/hunspell/en_US
LOADED DICTIONARY:
/usr/share/hunspell/en_US.aff
/usr/share/hunspell/en_US.dic
Hunspell 1.6.2
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do sed -ne /${i}n\'t/'{p;q}' /usr/share/hunspell/en_US.dic
> done
aren't
couldn't
didn't
isn't
mustn't
shouldn't
wasn't
weren't
wouldn't
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n't
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*
Hunspell 1.6.2
& couldn 2 0: could, could n
*
Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*
Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*
Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*
Hunspell 1.6.2
& shouldn 2 0: should, should n
*
Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*
Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*
Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*
luism@lmm-notebook:~$ for i in are could did is must should was were would
> do hunspell <<EOF
> ${i}n’t
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*
Hunspell 1.6.2
& couldn 2 0: could, could n
*
Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*
Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*
Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*
Hunspell 1.6.2
& shouldn 2 0: should, should n
*
Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*
Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*
Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*
```

According to the [`hunspell` changelog](https://github.com/hunspell/hunspell/blob/master/ChangeLog), with appropriate dictionaries, hunspell should accept ' inside words.

> 2014-05-28 Németh László <nemeth at numbertext dot org>:
> …
> * better apostrophe usage:
>   - WORDCHARS only with one of the Unicode or ASCII apostrophe
>     results extended word tokenization: both of them will be part of
>     the words (if they are inside: eg. word's, but not words').
>   - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries
>     (eg. English dictionaries), or for UTF-8 dictionaries only
>     with ASCII apostrophe supports (eg. French dictionaries).

Therefore, I raise the issue here, since the dictionary's affix rules don't appear to support the hunspell feature.
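
One way to check this might be to look for an apostrophe in the affix file's `WORDCHARS` line, since the changelog entry quoted above ties the extended tokenization to it. The following is only a sketch: the function name `has_apostrophe_wordchars` is made up for illustration, and it assumes the directive sits on a line beginning with `WORDCHARS`. It inspects the file's bytes only and says nothing about how hunspell itself interprets them.

```sh
# Hypothetical helper (not part of hunspell): succeed if the given .aff file
# has a WORDCHARS line containing an ASCII (') or Unicode (’) apostrophe.
has_apostrophe_wordchars() {
    aff_file=$1
    # LC_ALL=C and -a make grep compare raw bytes, so the check does not
    # depend on the encoding declared in the .aff file or on the locale.
    LC_ALL=C grep -a '^WORDCHARS' "$aff_file" |
        LC_ALL=C grep -q -e "'" -e '’'
}
```

For example, `has_apostrophe_wordchars /usr/share/hunspell/en_US.aff && echo present || echo absent` would report whether the installed affix file declares one.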
The en_US dictionary (and others) should allow hunspell to process words containing ' without breaking them.

** Affects: scowl (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1807103

Title:
  en_US dictionary misses n't contractions

Status in scowl package in Ubuntu:
  New