Public bug reported:

The problems reported here all relate to the file th_TH.aff, installed
from Version 1:3.2.0-3ubuntu3.1 of package myspell-th onto the Ubuntu
system described as:

Description:    Ubuntu 10.04.3 LTS
Release:        10.04

1. The line 'SET TIS620-2533' needs to read 'SET TIS620-2533' for
Hunspell to work with iconv. At present Hunspell issues the error
message 'error - iconv_open: UTF-8 -> TIS620-2533' (or 'error -
iconv_open: tis620 -> TIS620-2533' if the input encoding is specified
as -i tis620). When the correction is made, the command

$ (echo '\!'; echo '-'; echo สะกัด อไร หณา) | /usr/bin/hunspell -d ~/spell/th_TH | more

reasonably generates the output

Hunspell 1.2.8


& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด
& อไร 4 6: อุไร, อะไร, ขอบไร, ฤร้
& หณา 4 10: อาณา, อุณา, สกุณา, ยฆษณา
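
For reading this kind of pipe-mode output, a small filter can help.
The sketch below assumes the ispell-style format seen here: a line
starting with '&' gives the word, the number of suggestions and an
offset, followed by the suggestions, while a line starting with '#'
gives a word for which nothing was suggested.  The function name
summarise_hunspell is made up for this illustration.

```shell
# Summarise Hunspell pipe-mode output read from standard input.
# Assumed line formats (ispell -a style):
#   & WORD COUNT OFFSET: suggestion, suggestion, ...
#   # WORD OFFSET
summarise_hunspell() {
    awk '
        $1 == "&" { printf "%s: %d suggestion(s)\n", $2, $3 }
        $1 == "#" { printf "%s: no suggestions\n", $2 }
    '
}
```

It could be placed at the end of the pipeline in place of 'more',
i.e. '... | /usr/bin/hunspell -d ~/spell/th_TH | summarise_hunspell'.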

Without the change, it generates the output (shown shorn of error messages):
Hunspell 1.2.8


# สะกัด 0
# อไร 16
# หณา 26
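
Whether a given encoding name is usable can be checked against the
local iconv before editing the SET line.  The following is only a
sketch: the helper name iconv_knows is hypothetical, and the names
tried are the ones that appear in this report.  The result depends on
the iconv implementation installed, so no output is shown.

```shell
# Succeed if the local iconv accepts NAME as a target encoding.
# A one-byte ASCII input is used so that only the name lookup matters.
iconv_knows() {
    printf 'a' | iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# Try the encoding names mentioned in this report.
for name in TIS620-2533 TIS-620 tis620; do
    if iconv_knows "$name"; then
        echo "$name: accepted by iconv"
    else
        echo "$name: rejected by iconv"
    fi
done
```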

2. The affix file lacks a TRY line listing the characters that might
have been omitted or utterly mistyped, so all the suggestions above
are generated by the n-gram algorithm.  The acceptance criteria for
n-gram outputs are tightened by Version 1.3.2 of Hunspell, and so
mistyping อไร for อะไร will not be corrected.  A suitable addition would be:

# The try list excludes the 5 letters and marks ฅ๎ฃฺฦ, which are not used in
# the orthography of Central Thai.
# The consonants are ordered lower case native, upper case native,
# common Indic, other extras
TRY ก่ข้ค๊ง๋จ็ช์ซะดัตาถิทีนึบืปุผูฝเพแฟโมใยไรลวสหอฉญฮธณภฆฌฎฏฐฑฒฤๅษศฬ

(Note that the file th_TH.aff is in the TIS-620 encoding.)
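
Because of the encoding, a line typed in a UTF-8 terminal has to be
converted before it is added to the file.  A minimal sketch, assuming
that the local iconv accepts the name TIS-620 and that the line is
supplied in UTF-8; the function name append_tis620_line is invented
for this example.

```shell
# Append LINE (given in UTF-8) to FILE after converting it to TIS-620.
# The converted text goes to a temporary file first, so FILE is left
# untouched if iconv rejects the encoding name or a character.
append_tis620_line() {
    atl_file=$1
    atl_line=$2
    atl_tmp=$(mktemp) || return 1
    if printf '%s\n' "$atl_line" | iconv -f UTF-8 -t TIS-620 > "$atl_tmp"; then
        cat "$atl_tmp" >> "$atl_file"
        atl_status=$?
    else
        atl_status=1
    fi
    rm -f "$atl_tmp"
    return "$atl_status"
}
```

For example, 'append_tis620_line th_TH.aff "TRY ..."' with the full
TRY line given above; working on a copy of the file is advisable.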

3. There are further faults, some of which have already been fixed in
Hunspell Version 1.3.2, but they are all internal to Hunspell.  Unfixed
faults are recorded against Hunspell at:

http://sourceforge.net/tracker/?func=detail&aid=3178449&group_id=143754&atid=756395
https://sourceforge.net/tracker/?func=detail&aid=3468022&group_id=143754&atid=756395

Coding to correct them (which may need to be applied more widely) is
recorded in
https://sourceforge.net/tracker/?func=detail&aid=3468039&group_id=143754&atid=756395

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: myspell-th 1:3.2.0-3ubuntu3.1 [modified: usr/share/hunspell/th_TH.aff]
ProcVersionSignature: Ubuntu 2.6.32-37.81-generic 2.6.32.49+drm33.21
Uname: Linux 2.6.32-37-generic i686
Architecture: i386
Date: Sat Dec 31 17:58:55 2011
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release i386 (20100816.1)
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en_GB:en
 PATH=(custom, user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
SourcePackage: openoffice.org-dictionaries

** Affects: openoffice.org-dictionaries (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-bug i386 lucid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/910447

Title:
  th_TH Affix File Inadequate for Hunspell

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447/+subscriptions
