Aloha, James:

Great, thanks for this... it seems we are each reinventing the wheel here.

Your code is useful though.

I see you are still having to deal with the pesky "—", but only in your TOC XML processing routines.

What I don't understand (which makes it hard to make good strategic decisions moving forward) is that these ePub companies (Atritex in Chennai for us) are taking text from books (InDesign) and then outputting strings that are (to my eyes) mixed. Take for example this file from the front matter of a book (see below).

If I take this raw text and push to a field using the code suggested by Mark

     put uniDecode(uniEncode(x,"UTF8")) into x
     set the htmlText of fld 2 to x

and *if* I make very sure to set the field to a font like Arial Unicode MS or Helvetica Neue... it works "brilliantly" and I can even cut and paste to email or InDesign or Pages or MS Word and all characters are rendered. "Wunderbar! Marvelous!"

But it makes me nervous when I look at the raw code, because we see Unicode characters output as decimal entities:

Tamil Letter U
HTML Entity (decimal)   &#2953;

UTF-16 (hex)    0x0B89 (0b89)



(Line 1 here) &#2953; is Unicode expressed as a decimal entity. We don't see the script here...

(Line 4 here) ???????? # which is obviously "pure" Unicode for the Tamil language. I don't know what encoding it is, because I see the actual script in the raw text...

mixed with curly quotes and mdashes, which I assume are ANSI characters, because if I open the file in BBEdit those characters are not encoded... they just appear as ' and ---

So we seem to have three different encodings.
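One way to look at the "three encodings": a numeric character reference such as &#2953; or &#x15a; is a markup-level escape for a Unicode code point, not a separate text encoding, so the decimal form, the hex form and the literal character all name the same thing once the markup is parsed. The sketch below is only an illustration in Python (not LiveCode), and it assumes the file itself is stored as UTF-8, which the post does not confirm.

```python
import html

# Three spellings that can sit side by side in one XHTML file.
decimal_entity = "&#2953;"    # Tamil Letter U as a decimal reference
hex_entity = "&#x15a;iva"     # S with acute as a hex reference
literal = "\u0b89"            # the same Tamil letter stored directly

# Resolving the references gives ordinary Unicode characters.
decoded_decimal = html.unescape(decimal_entity)
decoded_hex = html.unescape(hex_entity)

print(decoded_decimal == literal)    # True: same code point, U+0B89
print(hex(ord(decoded_decimal)))     # 0xb89
print(decoded_hex)

# Assumption: the file is UTF-8, so the literal letter is these bytes on disk.
print(literal.encode("utf-8"))
```

If that reading is right, the mix in the ePub is cosmetic: any HTML-aware renderer resolves the references to the same characters as the literal text.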

I paste here a rendering of the text below, which I copied out of my LC field to InDesign, then from InDesign to this email:

I wonder if you are seeing the Unicode Tamil, all the diacritical marks, the mdash and the curly quotes?

Or do you see garbage when you open this post?

-------
????????
The thirty-six elements dance. Sadāśiva dances. Consciousness dances. Śiva-Śakti dances. The animate and inanimate dance. All these and the Vedas dance when the Supreme dances His dance of bliss. The seven worlds as His golden abode, the five chakras as His pedestal, the central kuṇḍalinī śakti as His divine stage, thus in rapture He dances, He who is Transcendent Light. He dances with the celestials. He dances in the golden hall. He dances with the three Gods. He dances with the assembly of silent sages. He dances in song. He dances in ultimate energy. He dances in souls---He who is the Lord of Dances. Tat Astu.
-----------

Everything appears to work, which is amazing... and means LC 7 is a huge step forward for us.

That is, if this really will hold all the way through a JSON-encoded POST to MySQL and back out again to a desktop client or mobile app without anything breaking.
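The JSON leg of that round trip can be checked in isolation. The sketch below is again Python rather than LiveCode. The sample string and the "body" key are made up for illustration. It does not touch MySQL: on that side the column and connection character sets would need to be a Unicode one such as utf8mb4, which is an assumption about a setup the post does not describe.

```python
import json

# Sample text with diacritics, a Tamil letter, an mdash and curly quotes.
text = "ku\u1e47\u1e0dalin\u012b \u015bakti \u0b89 \u2014 \u201cTat Astu\u201d"

# Default JSON output is pure ASCII: non-ASCII becomes \uXXXX escapes.
escaped = json.dumps({"body": text})

# ensure_ascii=False keeps the characters; the payload is then sent as UTF-8.
raw = json.dumps({"body": text}, ensure_ascii=False)
wire_bytes = raw.encode("utf-8")

# Both forms decode back to the identical string.
round_trip_escaped = json.loads(escaped)["body"]
round_trip_raw = json.loads(wire_bytes.decode("utf-8"))["body"]

print(escaped.isascii())             # True
print(round_trip_escaped == text)    # True
print(round_trip_raw == text)        # True
```

Either JSON form is lossless. The places where characters usually get damaged are the byte-level hops (HTTP charset, database connection charset), not the JSON layer itself.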

We *can* dumb this all down to 0-127 (I used to do that years ago and have a whole stack dedicated to stripping all diacriticals, replacing ANSI chars etc according to our spelling/lexicon conventions... )
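For comparison, the "dumb it down to 0-127" fallback can be sketched with Unicode decomposition. This is Python, not the stack mentioned above, and the PUNCTUATION table and the to_ascii name are hypothetical stand-ins for the actual spelling/lexicon conventions.

```python
import unicodedata

# Hypothetical replacement table; a real lexicon would define its own.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2014": "---", "\u2013": "-",  # mdash, ndash
}

def to_ascii(text):
    """Fold text to 7-bit ASCII, losing anything with no ASCII base letter."""
    for src, dst in PUNCTUATION.items():
        text = text.replace(src, dst)
    # NFD splits a letter like U+015B into "s" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Whatever is still non-ASCII (Tamil script, for example) becomes "?".
    return stripped.encode("ascii", "replace").decode("ascii")

print(to_ascii("ku\u1e47\u1e0dalin\u012b \u015bakti"))  # kundalini sakti
print(to_ascii("souls\u2014He"))                         # souls---He
```

The fold is one-way: Tamil script has no ASCII fallback at all, which is the practical argument for keeping the text as Unicode end to end.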

But if LC 7 can actually provide us a way to display everything, and I can actually put this on a web page also... it will be a quantum leap forward for us.

Here is what is in the ePub... Maybe I really shouldn't worry about the different encodings at all, and just assume this will retain "integrity" through all processes, as long as the rendering context is using a Unicode font?

<h4 class="h4g"><samp><small>&#2953; </small></samp></h4>
<h3 class="h3d"><samp><span class="cmbold"><samp>Dedication</samp></span></samp></h3>
<h4 class="h4g"><samp><em>Samarpa&#7751;am</em></samp></h4>
<h4 class="h4gg"><samp>????????</samp></h4>
<p class="noindent"> <span class="smallcapr"><samp>GA&#7750;E&#x15a;A, THE LORD OF CATEGORIES, WHO REMOVED ALL BARRIERS TO THE MANIFESTATION OF THIS CONTEMPORARY HINDU CATECHISM, TO HIM WE OFFER OUR REVERENT OBEISANCE. THIS TEXT IS DEDICATED TO MY <span class="cmitalic"><samp>SATGURU, </samp></span>SAGE YOGASWAMI</samp></span> of Columbuthurai, Sri Lanka, perfect <span class="cmitalic"><samp>siddha yog&#299; </samp></span>and illumined master who knew the Unknowable and held Truth in the palm of his hand. As monarch of the Nandin&#x101;tha Samprad&#x101;ya's Kail&#x101;sa Parampar&#x101;, this obedient disciple of Satguru Chellappaswami infused in me all that you will find herein. Yogaswami commanded all to seek within, to know the Self and see God &#x15a;iva everywhere and in everyone. Among his great sayings: "Know thy Self by thyself. &#x15a;iva is doing it all. All is &#x15a;iva. Be still." Well over 2,000 years ago Rishi Tirumular, of our lineage, aptly conveyed the spirit of <span class="cmitalic"><samp>Dancing with &#x15a;iva:</samp></span></samp></p> <p class="quote"><samp>The thirty-six elements dance. Sad&#x101;&#x15b;iva dances. Consciousness dances. &#x15a;iva-&#x15a;akti dances. The animate and inanimate dance. All these and the <span class="cmitalic"><samp>Vedas</samp></span> dance when the Supreme dances His dance of bliss. The seven worlds as His golden abode, the five chakras as His pedestal, the central <span class="cmitalic"><samp>ku&#x1e47;&#x1e0d;alin&#299; &#x15b;akti</samp></span> as His divine stage, thus in rapture He dances, He who is Transcendent Light. He dances with the celestials. He dances in the golden hall. He dances with the three Gods. He dances with the assembly of silent sages. He dances in song. He dances in ultimate energy. He dances in souls---He who is the Lord of Dances. Tat Astu. </samp></p>


--
Swasti Astu, Be Well!
Brahmanathaswami

Kauai's Hindu Monastery
www.HimalayanAcademy.com



James Hale wrote:
Hi Brahmanathaswami,

I wrote a sample stack that opens and displays ePubs, if that is of any use.

You can find it here...

http://livecodeshare.runrev.com/stack/761/Epub-Opener

If it is of help, let me know:-)

James