Hi,

One of the data files for libexttextcat in this directory:

https://github.com/giuliopaci/libexttextcat/tree/master/langclass/ShortTexts

is a garbled Hungarian version. It is almost in ISO-8859-1, but some
characters are destroyed because that encoding doesn't contain all
Hungarian characters.

It is easy to pick up a good UTF-8 version from

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=hng

and see the difference.

It's not clear whether this prevents libexttextcat from classifying
Hungarian text correctly, but it may stop Hungarian detection working for
UTF-8 input, because most of the other files are in UTF-8.
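
For anyone who wants to check a file quickly, here is a rough Python
sketch of the kind of test I have in mind. It is only a heuristic, not a
real encoding detector, and the function name and the file name in the
usage note are made up for illustration. It relies on the fact that
ISO-8859-1 has no code points for the double-acute letters ő and ű, so a
file that went through that encoding cannot contain them.

```python
# Hungarian letters that ISO-8859-1 cannot represent
# (they do exist in ISO-8859-2 and in Unicode).
DOUBLE_ACUTE = "őűŐŰ"


def describe_encoding(data: bytes) -> str:
    """Roughly classify the bytes of a training file (heuristic only)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Accented letters stored as single bytes are not valid UTF-8.
        return "not valid UTF-8 (probably a legacy 8-bit encoding)"
    if any(ch in DOUBLE_ACUTE for ch in text):
        return "UTF-8 with double-acute letters"
    return "UTF-8 without double-acute letters"


def fits_in_latin1(text: str) -> bool:
    """True if every character of text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True
```

To try it on a real file, read the file in binary mode, for example
describe_encoding(open("hungarian.txt", "rb").read()), where
"hungarian.txt" is a placeholder for whichever file you are checking.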

Cheers

Mark
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
