Yiguang,
Have you tried opening your imp file with a text editor that
understands UTF-8? vi (vim) should do fine. If the data in the imp
file looks OK (and is indeed UTF-8), then you should be good to go for
sword. Actually, a web browser might be easiest. Just open your imp
file with Firefox, manually select the UTF-8 encoding, and see if it
shows up OK.
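If you would rather check programmatically, here is a minimal Java sketch of a strict UTF-8 check. The class name Utf8Check and the method isValidUtf8 are made up for illustration; the decoder is set to report malformed input instead of silently replacing it, so a file that is really GB2312 or BIG5 will almost always fail the check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of sword: strict UTF-8 validation.
class Utf8Check {
    // Returns true only if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the imp file to check.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

Run it with the path to the imp file as the only argument. Note that a pure ASCII file also passes, since ASCII is a subset of UTF-8.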
Hope we can get things working well for you,
-Troy.
Yiguang Hu wrote:
Thanks Chris. I used UTF-8 in the .conf file. It
didn't work.
By trying different encodings, I mean I tried to use
Word and a text editor to read the generated LD database
(the *.dat file) by selecting different encodings (GB2312,
BIG5, UTF-8, CN2202, etc.); none of them worked.
I am not familiar with the C language (I assume imp2ld is
a C program) since I have not coded in it for several
years, so I don't know whether a hidden conversion
took place. Java does have some hidden
conversions if encodings are not specified correctly.
The heart of that problem is:
String str=new String(byte[],ENCODING)/ str=new
String(byte[]);
and byte[] bt=str.getBytes(ENCODING)/bt=str.getBytes().
If ENCODING is not specified, the default encoding is
picked up from the JVM environment, and it will
corrupt data if the default encoding is ASCII (for
example, in an en_US locale) while the data are actually
DBCS or MBCS characters such as Chinese, no
matter which encoding they use. These conversions are very
common in Java and can cause problems, for example
when converting stream bytes into a string or writing
a string to a file through a stream.
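To make the point concrete, here is a small Java sketch of the explicit-charset forms next to a deliberately wrong charset. The class name ExplicitCharsetDemo and its two helper methods are invented for this example; the sample text is 修道院, written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: always name the charset when crossing
// the boundary between bytes and Strings.
class ExplicitCharsetDemo {
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4fee\u9053\u9662"; // the three characters of "Abbey"
        byte[] utf8 = encodeUtf8(original);

        // Same charset on both sides: the text survives the round trip.
        System.out.println(original.equals(decodeUtf8(utf8)));

        // Decoding the same bytes as ASCII loses the text.
        String asAscii = new String(utf8, StandardCharsets.US_ASCII);
        System.out.println(original.equals(asAscii));

        // Encoding to ASCII is lossy too: unmappable characters are replaced.
        byte[] lossy = original.getBytes(StandardCharsets.US_ASCII);
        System.out.println(lossy.length + " bytes instead of " + utf8.length);
    }
}
```

The same rule applies to readers and writers: pass a charset to InputStreamReader and OutputStreamWriter rather than relying on the platform default.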
Could there be a similar issue in C/C++?
Thanks
Yiguang
--- Chris Little <[EMAIL PROTECTED]> wrote:
imp2ld faithfully converts an IMP file to an LD database. There is no
text encoding transformation of the data involved, so what you put in
your file is exactly what will be placed in the module and is exactly
what you will get back (from a front-end or mod2imp).
The invalid character warning can be ignored. The only character
transformations that imp2ld performs relate to sorting the dictionary
keys, so the worst case would involve entries in the wrong order.
(Correct me if I'm wrong about this, Troy.)
I'm not sure what you meant about trying different encodings. Which
values did you try? The .conf file for your module should include a line
that says "Encoding=UTF-8" if you have UTF-8 input.
--Chris
Yiguang Hu wrote:
I ran into an encoding problem when I tried to use imp2ld
to convert a Chinese theology terms/encyclopedia file into
a module that sword can use. The input text file is
UTF-8 encoded, with the format:
$$$English KeyWord Chinese Translation
The meaning of the term
$$$....
For example:
$$$Abbess 女修道院長
 為女修道院之女領袖,其職任不如男修道院長設立之早,其權亦不如男修道院長之大。有時亦管理男修道院。
$$$Abbey 修道院
 又稱*Monastery。原為一修道士團之名稱,由一位院長管理。以後他們所居住之屋宇、禮拜堂等,概稱為修道院。
$$$Abbot 修道院長
 為修道院領袖之稱,意即父也。修道院長原係平信徒,從第七世紀起,教會定為聖職。通常為其本院弟兄所選舉,其職任乃終身。
$$$Abbot, George
阿波特(1562-1633)
 英國教宗;坎特布里大主教;*聖經欽定本的合編者。
$$$Abelard, Peter or Abailard
亞比拉(1079-1142)
I used imp2ld to generate the module. There were many
errors about invalid characters, but it nevertheless
generated the module. The problem is that the module's
characters appear to be saved in the wrong encoding. I tried
different encodings to read it, and none of them made the
characters readable, as shown below:
Abbey 修�院
 å�ˆç¨±*Monastery。原為一修é�“士團之å��稱,由一ä½�院長管ç�†ã€‚以後他們所居ä½�之屋宇ã€�禮拜å
‚ç‰ï¼Œæ¦‚稱為修é�“院。
Abbot 修�院長
 為修é�“院é
˜è¢–之稱,æ„�å�³çˆ¶ä¹Ÿã€‚ä¿®é�“院長原係平信徒,從第七世紀起,教會定為è�–è�·ã€‚通常為其本院弟兄所é�¸èˆ‰ï¼Œå…¶è�·ä»»ä¹ƒçµ‚身。
Abbot, George 阿波特(1562-1633)
Has anyone experienced this, and does anyone know how to solve
this problem?
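For what it is worth, the garbled sample above has the shape you get when valid UTF-8 bytes are displayed in a single-byte Western encoding: 修 is E4 BF AE in UTF-8, and those three bytes read as Latin-1 or Windows-1252 come out as "ä¿®". Below is a minimal Java sketch of that effect and how to reverse it. The class and method names are made up for illustration, and it uses ISO-8859-1 because that mapping is lossless; under Windows-1252 a few byte values have no mapping, which would account for the "�" marks in the sample.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: reproduce and undo "UTF-8 shown as Latin-1" mojibake.
class MojibakeDemo {
    // Encode as UTF-8, then misread those bytes as ISO-8859-1.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo garble(): recover the raw bytes and decode them as UTF-8.
    static String restore(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4fee\u9053\u9662"; // the Chinese for "Abbey"
        String garbled = garble(original);
        // Three characters become nine, one per UTF-8 byte.
        System.out.println(original.length() + " -> " + garbled.length());
        // The round trip through ISO-8859-1 is lossless.
        System.out.println(restore(garbled).equals(original));
    }
}
```

If the module data behaves like this, the bytes themselves are probably still correct UTF-8 and only the viewer is decoding them with the wrong charset.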
BTW, I have a couple of short Java programs that
generate dictionary files and Bible text in the above
format, so you can use imp2vs and imp2ld to convert
them into sword modules. I will be glad to put the
code somewhere to share if anyone is interested in
it.
Thanks
Yiguang
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page