Yiguang,
Have you tried opening your imp file with a text editor that understands UTF-8? vi (vim) should do fine. If the data in the imp file looks OK (and is indeed UTF-8), then you should be good to go for sword. Actually, a web browser might be easiest: just open your imp file with Firefox, manually select the UTF-8 encoding, and see if it shows up OK.
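If you would rather check programmatically than by eye, here is a minimal Java sketch (Java only because that is what you said you use). The class name Utf8Check and the command-line usage are made up for illustration; the decoder calls are the standard java.nio.charset ones. It only tells you whether the bytes are well-formed UTF-8, not whether the text is what you intended, and a pure-ASCII file will also pass.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of sword.
class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check.java <file.imp>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```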
        Hope we can get things working well for you,
                -Troy.



Yiguang Hu wrote:
Thanks, Chris. I used UTF-8 in the .conf file. It
didn't work.
By trying different encodings, I mean I tried to use
Word and a text editor to read the generated LD database
(the *.dat file) by selecting different encodings (GB2312,
BIG5, UTF-8, CN2202, etc.); none of them worked.

I am not familiar with the C language (I assume imp2ld is
a C program) since I have not coded in it for several
years, so I don't know whether a hidden conversion
took place. Java does have hidden
conversions if encodings are not specified correctly.
The heart of that problem is:
String str = new String(bytes, ENCODING) vs. str = new String(bytes);
and byte[] bt = str.getBytes(ENCODING) vs. bt = str.getBytes().
If ENCODING is not specified, the default encoding is
picked up from the JVM environment, and it will
corrupt data if the default encoding is ASCII (for
example, in an en_US locale) while the data are actually
DBCS or MBCS characters such as Chinese, no
matter which encoding they are in. These conversions are very
common in Java and can cause problems, for example
when converting stream bytes into a string or writing a
string to a file using a stream.
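
To make the Java pitfall concrete, here is a small sketch. The class and method names are made up for illustration, and US-ASCII is passed explicitly to stand in for an ASCII default encoding, so the result does not depend on the machine it runs on.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: simulates what happens when the platform default is ASCII.
class ExplicitCharsetDemo {
    // Encode and decode with the same explicitly named charset: lossless.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for an ASCII default: getBytes replaces every character that
    // ASCII cannot represent with '?', so the original text cannot be recovered.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String term = "Abbey \u4FEE\u9053\u9662"; // "Abbey" plus three Chinese characters
        System.out.println(roundTripUtf8(term).equals(term)); // true
        System.out.println(roundTripAscii(term));             // Abbey ???
    }
}
```

Naming the charset on both getBytes and new String is what keeps the round trip independent of the JVM environment.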

Could there be similar issue in C/C++ ?

Thanks
Yiguang


--- Chris Little <[EMAIL PROTECTED]> wrote:


imp2ld faithfully converts an IMP file to an LD
database. There is no text encoding transformation of the data involved, so what you put in your file is exactly what will be placed in the module and is exactly what you will get back (from a front-end or
mod2imp).

The invalid character warning can be ignored. The
only character transformations that imp2ld performs relate to sorting the dictionary keys, so the worst case would involve entries in the wrong order. (Correct me if I'm wrong about this, Troy.)

I'm not sure what you meant about trying different
encodings. Which values did you try? The .conf file for your module should include a line that says "Encoding=UTF-8" if you have UTF-8 input.
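
As a rough illustration of where that line sits, a module .conf might look something like the sketch below. Only the Encoding=UTF-8 line comes from this thread; the module name ChiTheoDict, the DataPath, and the other values are placeholders assumed for a raw lexicon/dictionary module and would need to match your own install.

```
[ChiTheoDict]
DataPath=./modules/lexdict/rawld/chitheodict/chitheodict
ModDrv=RawLD
Encoding=UTF-8
Lang=zh
```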

--Chris


Yiguang Hu wrote:

I ran into an encoding problem when I tried to use imp2ld
to convert a Chinese theology terms/encyclopedia into a module
that sword can use. The input text file is UTF-8
encoded, with the format:
$$$English KeyWord Chinese Translation
The meaning of the term
$$$....
For example:
$$$Abbess 女修道院長

　為女修道院之女領袖，其職任不如男修道院長設立之早，其權亦不如男修道院長之大。有時亦管理男修道院。

$$$Abbey 修道院

　又稱*Monastery。原為一修道士團之名稱，由一位院長管理。以後他們所居住之屋宇、禮拜堂等，概稱為修道院。

$$$Abbot 修道院長

　為修道院領袖之稱，意即父也。修道院長原係平信徒，從第七世紀起，教會定為聖職。通常為其本院弟兄所選舉，其職任乃終身。

$$$Abbot, George 阿波特（1562-1633）

　英國教宗；坎特布里大主教；*聖經欽定本的合編者。

$$$Abelard, Peter or Abailard 亞比拉（1079-1142）

I used imp2ld to generate the module. There were many
errors about invalid characters, but it nevertheless
generated the module. The problem is that the module's
characters are saved in the wrong encoding. I tried
different encodings to read them, and none of them makes the
characters readable, as shown below:
Abbey 修�院



 å�ˆç¨±*Monastery。原為一修é�“士團之å��稱,由一ä½�院長管ç�†ã€‚以後他們所居ä½�之屋宇ã€�禮拜å

‚等,概稱為修é�“院。

Abbot 修�院長

 為修é�“院é

˜è¢–之稱,æ„�å�³çˆ¶ä¹Ÿã€‚ä¿®é�“院長原係平信徒,從第七世紀起,教會定為è�–è�·ã€‚通常為其本院弟兄所é�¸èˆ‰ï¼Œå…¶è�·ä»»ä¹ƒçµ‚身。

Abbot, George 阿波特(1562-1633)
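
For what it is worth, garbling of this shape is what UTF-8 bytes look like when each byte is displayed as a separate single-byte Western character. The fragment "ä¿®" in the sample above is exactly the three UTF-8 bytes of the character 修 (E4 BF AE) read as Latin-1. Here is a small Java sketch of that effect; the class and method names are made up, and ISO-8859-1 is an assumption standing in for whatever single-byte encoding the viewer actually applied.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: reproduces and reverses UTF-8-read-as-Latin-1 garbling.
class MojibakeDemo {
    // Decode UTF-8 bytes as if each byte were one ISO-8859-1 character.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the misreading: recover the original bytes, then decode as UTF-8.
    // This works only if no byte was dropped or replaced along the way.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4FEE"; // the character 修
        String garbled = misreadAsLatin1(original);
        // The garbled form is the three characters U+00E4 U+00BF U+00AE.
        System.out.println(garbled.equals("\u00E4\u00BF\u00AE")); // true
        System.out.println(repair(garbled).equals(original));     // true
    }
}
```

If this is what is happening, the bytes in the module may well be fine and only the viewer's choice of encoding is wrong.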

Has anyone experienced this, and does anyone know how to solve
this problem?

BTW, I have a couple of short Java programs that
generate dictionary files and Bible text in the above format,
so you can use imp2vs and imp2ld to convert
them into sword modules. I will be glad to put the
code somewhere to share if someone is interested in it.

Thanks
Yiguang





                
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
