Thanks for this feedback, Troy!

Based on this I will check what’s needed to implement a StringMgr class that 
communicates with the node.js side to utilize the built-in JavaScript 
toUpperCase method.

Enjoy your Sunday!

Tobias

From: Troy A. Griffitts
Sent: Sunday, 7 February 2021 18:02
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] Sword Locales / German Umlaut Issues /AndroidBuild

Hi Tobias,
I don't believe processing the locale files directly will resolve your issues 
with German umlauts.  The issue boils down to a few things:
First, there are other uses of upperUTF8 in the engine.  You show the cpp files 
in your grep, but not the .h files.
Second, VerseKey will not work correctly for any locale without a proper 
upperUTF8 implementation which supports that locale.
The issue is that verse references, freehand from outside or roundtripped from 
SWORD itself, still need to map to the uppercase representation of the book 
name.  If you look at all the locale files, the verse parsing table uses 
uppercase for all book abbreviations, so VerseKey's parser immediately 
uppercases the input string before it looks up the book in the table.
This is also true for LD module key lookups.  They are stored in uppercase.  
Technically it could work if both the module import tool and the display 
application were using the same StringMgr-- the default StringMgr would 
uppercase the string incorrectly, but consistently with the display 
application.  But this is not the case.  Our import tools use a proper 
StringMgr to create the modules with a proper uppercase key and thus the 
display application must also use a proper StringMgr or things will not work 
correctly.
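
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use the SWORD API: upperBytewise and lookupBook are hypothetical names, and the byte-wise uppercasing is only an assumption about how a StringMgr without Unicode support behaves. The table stands in for keys that were written with proper uppercasing.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Byte-wise uppercasing (assumed behavior of a non-Unicode StringMgr).
// Bytes >= 0x80, i.e. every byte of a multi-byte UTF-8 sequence, stay untouched.
std::string upperBytewise(const std::string &utf8) {
    std::string out = utf8;
    for (char &c : out) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) c = static_cast<char>(std::toupper(b));
    }
    return out;
}

// Hypothetical lookup: the table keys were stored in proper uppercase,
// but the input is uppercased byte-wise before the lookup.
bool lookupBook(const std::set<std::string> &table, const std::string &input) {
    assert(!table.empty());
    return table.count(upperBytewise(input)) > 0;
}
```

With a table holding "JOHN" and "R\xC3\x96MER" (RÖMER in UTF-8), lookupBook finds "john" but misses "R\xC3\xB6mer" (Römer), because the two bytes of "ö" are never turned into "Ö".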
I am afraid getting the data from the locales.d/ folders in JavaScript will not 
help fix the problem.
Troy

On 2/7/21 6:59 AM, Tobias Klein wrote:
Hi Troy,
Thanks once more for all the details! I appreciate it.

I just grepped quickly in the SWORD source code (grep -r "upperUTF8" .  | grep 
-v ".svn") and the method upperUTF8 appears to be only used in the following 
places:

./src/keys/versekey.cpp:    stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));
./src/keys/versekey.cpp:    stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));
./utilities/imp2gbs.cpp:    StringMgr::getSystemStringMgr()->upperUTF8(keyBuffer.getRawData(), size-2);

I think neither of those is currently used in Ezra Project, though. At the 
moment I do not have the use case to parse verse keys based on any special 
Unicode inputs. I am only using the standard English abbreviations for verse 
keys and that only happens internally. So, in this case I may just process the 
locales.d files directly in node.js / JavaScript.

Regarding node-sword-interface and the build process for mobile platforms ... 
currently I have only tried Android, which works fine. iOS should technically 
work as well, but I have not tried that yet. The boilerplate work to make all 
that happen smoothly is provided by the nodejs-mobile cordova plugin. That 
plugin contains build scripts that seamlessly compile any native node.js addons 
like node-sword-interface or also the sqlite3 module that I am using.

And since I am now using an API compatible runtime environment both for 
Electron/nodejs and Cordova/nodejs-mobile I did not have to add any additional 
glue code. One risk I see with this approach is that the guys who provide 
nodejs-mobile discontinue their work for some reason. It's essentially a 
completely separately maintained fork of nodejs (it has nothing to do with V8 
actually). Originally it is based on the ChakraCore JavaScript engine of the 
Microsoft Edge browser. But the nodejs-mobile guys ported it to Android and iOS 
...

Regarding the StringMgr native callback possibility ... yes technically this is 
possible with a node native addon like node-sword-interface.
I am using such a functionality for the InstallMgr and search progress 
feedbacks already.
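
A rough sketch of what such a callback-backed string manager could look like. This is not the real SWORD class: CallbackStringMgr is a hypothetical stand-in that does not derive from StringMgr, the meaning of max (buffer capacity including the terminating NUL) is an assumption, and the std::function is only a placeholder for whatever mechanism the addon would use to reach JavaScript's toUpperCase.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in; a real binding would subclass SWORD's StringMgr instead.
class CallbackStringMgr {
public:
    using UpperFn = std::function<std::string(const std::string &)>;

    explicit CallbackStringMgr(UpperFn fn) : toUpper(std::move(fn)) {}

    // Uppercases `text` in place via the callback.
    // `max` is assumed to be the buffer capacity in bytes, including the NUL.
    char *upperUTF8(char *text, unsigned int max) const {
        assert(text != nullptr);
        const std::string upper = toUpper(text);
        // The uppercase form can be longer than the input; if it does not
        // fit, leave the buffer unchanged rather than overflow it.
        if (max == 0 || upper.size() + 1 > max) return text;
        std::memcpy(text, upper.c_str(), upper.size() + 1);
        return text;
    }

private:
    UpperFn toUpper; // placeholder for the call into the JavaScript side
};
```

The size check is there because uppercasing can change the byte length of a UTF-8 string, which is presumably why the callers in the grep output pass a buffer twice the length of the input.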

So, long story short ... if in the future a use case comes up to parse 
Unicode-based VerseKeys, I will implement a special StringMgr binding as you 
suggested. But for now I'll focus on handling the locales.d content directly in 
JavaScript / node.js.

I will keep you posted.

Best regards,
Tobias
On 2/6/21 11:59 PM, Troy A. Griffitts wrote:
The data is pulled from the locales.d/ files, but the toUpper logic is 
necessary in a number of places in the engine. Two come to mind immediately:
- parsing verse references not sensitive to case
- parsing LD module keys not sensitive to case
To be able to get an uppercase representation of any Unicode character, it 
takes a pretty hefty dataset of all known human languages-- that's why we leave 
it up to an external library.  And yeah, because ICU is so large, that's why I 
don't compile it into my binaries in Bishop.  Bishop is about 13MB total, which 
includes ~8MB of default module data (KJV, SME, StrongsGreek, StrongsHebrew).  
That's about 5MB for the app.  If I included ICU, it would greatly increase the 
size.  And both iOS and Android (Swift and Java) already have facilities for 
getting the toUpper of a string.
I hope you can steal the few lines from Bishop's native SWORD code which tells 
SWORD to call either Java or Swift when toUpperUTF is called.
I am sorry that this might break the nice ability to have exactly the same code 
on both iOS and Android (I am surprised that absolutely no changes were 
required for you to interface to a native library on both iOS and Android!  
cordova required me to provide: Android: Java-jni layer; iOS: Swift layer.  I 
am jealous.)
If you can think of an alternative, I am happy to listen.  We could provide a 
better StringMgr default (I think we simply have a Latin-1 single-byte 
transformation table for basically ASCII characters), one which includes an 
SW_u32 hash covering German characters, but that's going to limit the languages 
we support to only the ones we add to our toUpper hash, and that's not really a 
dataset I want to maintain.
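
For illustration only, a table-based fallback along those lines might look like the sketch below. Nothing here is SWORD code: kUpperTable and upperWithTable are hypothetical names, the table holds just three German umlauts, and only 1- and 2-byte UTF-8 sequences are handled. It shows the limitation directly: any character missing from the table passes through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical hand-maintained table: lowercase code point -> uppercase code point.
static const std::unordered_map<std::uint32_t, std::uint32_t> kUpperTable = {
    {0x00E4, 0x00C4}, // a-umlaut
    {0x00F6, 0x00D6}, // o-umlaut
    {0x00FC, 0x00DC}, // u-umlaut
};

// Uppercases ASCII letters and the code points listed in kUpperTable.
// Longer UTF-8 sequences and malformed bytes are copied through unchanged.
std::string upperWithTable(const std::string &utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        const unsigned char b1 =
            i + 1 < utf8.size() ? static_cast<unsigned char>(utf8[i + 1]) : 0;
        if (b0 < 0x80) { // ASCII
            const bool lower = b0 >= 'a' && b0 <= 'z';
            out += static_cast<char>(lower ? b0 - 0x20 : b0);
        } else if ((b0 & 0xE0) == 0xC0 && (b1 & 0xC0) == 0x80) { // 2-byte sequence
            std::uint32_t cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            const auto it = kUpperTable.find(cp);
            if (it != kUpperTable.end()) cp = it->second;
            out += static_cast<char>(0xC0 | (cp >> 6));   // re-encode lead byte
            out += static_cast<char>(0x80 | (cp & 0x3F)); // re-encode continuation
            ++i; // the continuation byte has been consumed
        } else {
            out += static_cast<char>(b0); // part of a longer or malformed sequence
        }
    }
    return out;
}
```

With this table "Römer" becomes "RÖMER", while the "ñ" in a Spanish name stays lowercase because nobody added it. Characters whose uppercase form is longer, such as "ß", would need extra handling that this sketch leaves out.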
Open to suggestions,
Troy

On 2/6/21 2:56 PM, Tobias Klein wrote:
Dear Troy,
 
Thank you for these explanations! I appreciate it!
 
For Ezra Project on Android, I am at this point simply compiling 
node-sword-interface with the Android cross compilers and it works. However, as 
I wrote, I have issues for the German Bible book names now.
 
Is the StringMgr functionality only used to handle the locales.d files? Or also 
for some content inside any SWORD modules?


If it is only used for handling the locales.d files then I would consider 
handling the Sword locales.d files directly from JavaScript / node.js, which 
already supports Unicode.
 
I also checked whether I can cross-compile the ICU library and that worked, but 
this is a huge binary (I think 20-30 MB) and I would rather keep the APK size 
as small as possible.

Best regards,
Tobias
 
From: Troy A. Griffitts
Sent: Sunday, 31 January 2021 18:20
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] Sword Locales / German Umlaut Issues / AndroidBuild
 
Dear Tobias,
My apologies for taking so long to respond to this, but I wanted to give a 
thorough answer.  See the summary at the end if you don't care about the 
details.
So, SWORD has a class StringMgr, which manages strings within SWORD, and by 
default SWORD includes a very basic implementation, which doesn't necessarily 
know about or support anything beyond what the basic C string methods support.
I am sure this evokes a sense of horror in you at first, so let me explain a 
bit how we properly handle character sets.  First, short background: since we 
existed well before the Unicode world, we have multiple locale files for each 
language, which you will still see in the locales.d/ folder, each specifying 
their character encoding, and most of the time SWORD doesn't need to manipulate 
characters, so simply holding data, and passing that data to a display 
frontend, and specifying a font which will handle that encoding was enough in 
the old world.  IMPORTANT: the one place we do need to manipulate character 
data is to perform case-insensitive comparisons.  We did this in the past by 
converting a string to uppercase before comparison.  You'll notice this in the 
section for Bible book abbreviation in each locale-- the partial match key must 
be in a toupper state.
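
As a small illustration of uppercase-then-compare with partial matching (a sketch only, not the actual VerseKey parser): findBook and upperAscii are hypothetical names, and the table entries are made-up placeholders rather than the contents of a real locale file.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Uppercases ASCII letters; in the default C locale other bytes are unchanged.
std::string upperAscii(std::string s) {
    for (char &c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// Returns the value of the first key that starts with the uppercased input,
// or an empty string if no key matches. Keys must be stored in uppercase.
std::string findBook(const std::map<std::string, std::string> &abbrevs,
                     const std::string &input) {
    const std::string key = upperAscii(input);
    if (key.empty()) return "";
    const auto it = abbrevs.lower_bound(key); // first key >= uppercased input
    if (it != abbrevs.end() && it->first.compare(0, key.size(), key) == 0) {
        return it->second;
    }
    return "";
}
```

Because the keys are stored in uppercase, the lookup only works when the input is uppercased the same way the keys were, which is the point the paragraph above makes.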
Today, everything in SWORD prefers Unicode and specifically, encoded as UTF-8.  
To support this:
First, we have utility functions within SWORD for working with Unicode encoded 
strings, see:
http://crosswire.org/svn/sword/trunk/include/utilstr.h
Specifically:
SWBuf assureValidUTF8(const char *buf);
SW_u32 getUniCharFromUTF8(const unsigned char **buf, bool skipValidation = false);
SWBuf *getUTF8FromUniChar(SW_u32 uchar, SWBuf *appendTo);
SWBuf utf8ToWChar(const char *buf);
SWBuf wcharToUTF8(const wchar_t *buf);
 
 
To wrap this up, by subclassing StringMgr, SWORD supports implementing 
character encoding by linking to other libraries, e.g., ICU, Qt, etc. to handle 
full Unicode support.  And while the StringMgr interface allows implementation 
of many string functions, upperUTF8 is the only real method the SWORD engine 
needs to work completely.  Some utilities use the other methods in there, but 
the engine only needs this method.
 
In summary, on Android, you are likely not linking to ICU when you build the 
native SWORD binary-- which I don't do either for Bishop.  The Cordova SWORD 
plugin uses the SWORD java-jni bindings, which use the Java VM to implement 
StringMgr:
https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp 
Search for: AndroidStringMgr
And on iOS the Cordova plugin uses the Swift libraries to do the same.  This is 
done by using the SWORD flatapi call to 
org_crosswire_sword_StringMgr_setToUpper to provide a Swift implementation to 
uppercase a string. 
http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift
I hope this gives you the information you need to get things working for you.  
Please don't hesitate to ask if you need help,
Troy
 
On 1/17/21 11:59 AM, Tobias Klein wrote:
Dear Troy, 

I'm playing with an Android Build of Sword and I get issues with the German 
Umlauts. 

So I have issues with Bible book names like Römer, Könige, etc. 

The Umlauts are shown as ?. 

I'm configuring the SWORD build with CMake like below (without ICU!) 

I remember having similar issues on Linux when building without ICU. 

How do you build SWORD for Bishop? Any suggestions? 

Best regards, 
Tobias 

-- Check for working CXX compiler: /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
-- Check for working CXX compiler: /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
-- Check for working C compiler: /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring your system to build libsword.
-- SWORD Version 1008900000
 


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page