Robin,
There are a few other issues involved in getting this to work.
Unicode allows a decorated character to be represented either as a
single code point, called a precomposed character, or as multiple code
points: the base letter followed by its combining marks.
Text normalized to the first representation is in NFC, and to the
second in NFD. There are two other normalization forms, NFKC and NFKD,
which additionally apply compatibility mappings. For a good
description see: http://unicode.org/reports/tr15/ and http://unicode.org/faq/normalization.html
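To make the difference concrete, here is a small sketch using Python's standard unicodedata module. Python is only my choice for illustration; it is not part of the Sword tools. It shows an accented alpha in both forms and converts between them.

```python
import unicodedata

# One code point: GREEK SMALL LETTER ALPHA WITH TONOS (precomposed).
composed = "\u03ac"
# Two code points: alpha followed by COMBINING ACUTE ACCENT.
decomposed = "\u03b1\u0301"

# The two strings look the same on screen but are not equal as data.
same_before = composed == decomposed

# Normalizing brings them to a common form.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", composed)     # decompose

print(same_before, len(nfc), len(nfd))
```

A front end that only has a glyph for the precomposed character may show a box for the decomposed sequence, which is one reason to pick a single form.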
At CrossWire, we have settled on NFC. This appears to be the
recommendation of the W3C. See: http://www.crosswire.org/pipermail/sword-devel/2007-September/025896.html
At this time it is the module encoder's responsibility to encode the
module correctly. Later osis2mod (and perhaps some of the other
filters) will be changed to force the text to NFC.
Basically, you need to first run the text through a filter that
does Canonical Decomposition and then through one that does Canonical
Composition. (The Sword filter utf8nfc.cpp does this.)
Once that is done, make the module as you have always done.
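If you want to normalize the source file yourself before running imp2ld, something like the following might do. This is a sketch, not what utf8nfc.cpp does internally: it relies on Python's unicodedata, whose NFC normalization performs canonical decomposition followed by canonical composition. The function names are made up for this example, and it assumes the input file is already valid UTF-8.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in NFC (canonical decomposition, then composition)."""
    return unicodedata.normalize("NFC", text)


def normalize_file(src: str, dst: str) -> bool:
    """Read src as UTF-8, write its NFC form to dst.

    Returns True if normalization changed anything.
    newline="" keeps the original line endings untouched.
    """
    with open(src, encoding="utf-8", newline="") as f:
        original = f.read()
    normalized = to_nfc(original)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(normalized)
    return normalized != original
```

For example, normalize_file("XXX.imp", "XXX.nfc.imp") would write a normalized copy (the output name is just a placeholder), which you would then feed to imp2ld as usual.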
The next step is to ensure that you have a font that can handle the
text. On Windows, I believe Arial should work. However, SIL has a
bunch of open source fonts which are excellent. See: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=silfontlist
If it doesn't work in BibleCS, try BibleDesktop, Firefox and IE. (You
should be able to open the .dat file if the module is not compressed.)
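When a box shows up, it can help to see exactly which code points are in the text, so you can tell a font problem from an encoding problem. Here is a small sketch in Python for that; the describe function is a made-up helper, not a Sword tool.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in text with its Unicode character name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# A precomposed accented alpha, and the same letter as two code points.
for line in describe("\u03ac") + describe("\u03b1\u0301"):
    print(line)
```

If the entry in the .dat file lists the code points you expect and it still shows a box, the font is the more likely culprit.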
Hope that helps.
In His Service,
DM
On Jan 19, 2008, at 11:47 PM, [EMAIL PROTECTED] wrote:
DM, Sebastien,
Thanks for your references on encoding. I have read about encodings
and now I need a practical example of making it thru the module
process. Let's say I have an alpha with an accent (I use PSPad to
get the codes in right) and I have this in XXX.imp. When I bring
this into NotePad with UTF-8 as encoding type (Format is also Greek
script) it looks just fine. Then I run it thru "imp2ld XXX.imp XXX
2" to get XXX.dat and XXX.idx. No errors, no problems. The
XXX.conf file has Encoding=UTF-8. But when I fire up BibleCS and
look at XXX in the LD pane I see a box where I am expecting an
accented alpha. Unfortunately I know of no accented Greek text that
I can reverse engineer to see where I am going wrong. Without clear
answers at this point I have resigned myself to including only
unaccented Greek text. If there are better tools out there to ensure
I am on the right track please let me know.
In His Grace,
Robin
>On Jan 18, 2008, at 5:24 AM, Sebastien Koechlin wrote:
>> On Thu, Jan 17, 2008 at 11:58:10PM -0500, [EMAIL PROTECTED] wrote:
>>>>> I'm trying to display Unicode Greek in RawLD ThML with 1.5.9
>>>>> BibleCS.
>>>>> Does anyone know what the .conf file should look like?
>>>>> "Encoding=Unicode" or "Encoding=UNICODE" does NOT work. I just
>>>>> get open squares where the letters should have accents.
>>>
>>>> Should be UTF-8, "unicode" is usually for internal representation
>>>> only and "unicode" in itself is ambiguous.
>>
>> Unicode is not an encoding.
>>
>> As encoding is a common source of problems, I tried to write a small
>> text about it. As English is not my native language, someone should
>> probably correct it.
>>
>> http://www.crosswire.org/wiki/index.php/Encoding
>
>I've added links to your excellent page from
>http://www.crosswire.org/wiki/index.php/DevTools:Modules
>both in the section on Encoding and in Related Links.
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page