On 15/05/2019 11:46, David Haslam wrote:
> Interim progress report.
>
> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the
> contents to Mat_utf8-odt
>
> I opened the .odt file using 7-Zip from the Windows Explorer context menu,
> and extracted the file contents.xml
>
> I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it
> as contents.pp.xml
> This is simply a layout change that's easier to read.
>
> I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text
> was (mostly) Myanmar Unicode.
>
> I used a TextPipe filter to remove all XML tags, blanks from SOL & EOL and
> all blank lines.
> The output file is now contents.pp.txt
>
> This is now something that's readable content in Myanmar Unicode, with some
> English text such as "The Gospel according Matthew" near the start.
>
> The file is best viewed using BabelPad with the option Display Colours |
> Colour Code by Script.
> This shows Myanmar characters in light green, and non-Myanmar characters in
> other colours.
>
> Observations:
> 1. The font conversion to Unicode left a few scattered characters
> unconverted. :(
>
> 0000C8 È 18 LATIN CAPITAL LETTER E WITH GRAVE
> 0000D8 Ø 20 LATIN CAPITAL LETTER O WITH STROKE
> 0000F2 ò 3 LATIN SMALL LETTER O WITH GRAVE

Yes, but this can easily be changed. I can ask my friends which characters
to replace them with (or have a look in the PDF).

> The complete character frequency analysis is attached.
>
> 2. A few verse numbers? are still present here and there.
> 3. The content contains section headings and parallel passage headings as
> well as verse text.
>
> I have just uploaded the file contents.pp.zip to a new folder in my Box
> account and added Cyrille & Michael as viewers.

My question is: can you do something with the txt file to add the verse
numbers?

>
> Best regards,
>
> David
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricai...@gmail.com> wrote:
>
>> Hello,
>> I recently received a modern Myanmar translation of the NT, Psalms and
>> Proverbs, with permission to create a new module.
>> But the problems are many... First, getting the text.
>> I tried different ways, but it was made with PageMaker!
>> I can get the text, but the problem is that I don't have the verse numbers,
>> because they are in a parallel column next to it, and when I copy it I get
>> only the biblical text.
>> I also have a PDF, but when I convert it to text (with pdftotext) the
>> columns are mixed.
>> Can someone help me with any ideas?
>> The next problem is Unicode... The text is not typed in Unicode but uses
>> a special font.
>> I can send everything you need or push it to git.crosswire.
>>
>> Thanks for the help.
>>
>> sword-devel mailing list: sword-devel@crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
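For anyone who wants to repeat the extraction and the character check without 7-Zip, TextPipe and BabelPad, the steps David describes could probably be scripted along the following lines. This is only a rough sketch, not the procedure that was actually used. The function names are made up for illustration, the ODF member name content.xml is assumed to be the file referred to above as contents.xml, and only the three Myanmar blocks listed in the code are treated as "Myanmar", which is a simplification.

```python
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

# Assumption: these three Unicode blocks are treated as "Myanmar script".
MYANMAR_RANGES = (
    (0x1000, 0x109F),  # Myanmar
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
    (0xA9E0, 0xA9FF),  # Myanmar Extended-B
)


def xml_text_lines(xml_source):
    """Drop all XML tags; return the text chunks, stripped, without blanks."""
    root = ET.fromstring(xml_source)
    stripped = (chunk.strip() for chunk in root.itertext())
    return [line for line in stripped if line]


def odt_text_lines(odt_path, member="content.xml"):
    """Read one member of an .odt (zip) archive and return its text lines."""
    with zipfile.ZipFile(odt_path) as archive:
        return xml_text_lines(archive.read(member))


def is_myanmar(ch):
    """True if the character falls in one of the assumed Myanmar blocks."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in MYANMAR_RANGES)


def leftover_characters(lines):
    """Count non-ASCII, non-Myanmar characters, most frequent first.

    Each row is (hex code point, character, count, Unicode name),
    the same columns as in the frequency listing quoted above.
    """
    counts = Counter(
        ch
        for line in lines
        for ch in line
        if ord(ch) > 0x7F and not is_myanmar(ch)
    )
    return [
        (f"{ord(ch):06X}", ch, n, unicodedata.name(ch, "UNKNOWN"))
        for ch, n in counts.most_common()
    ]
```

Usage would be something like `leftover_characters(odt_text_lines("path/to/file.odt"))`, where the path is a placeholder. The result lists the characters the font conversion missed, so they can be checked against the PDF before deciding on replacements. It does nothing for the missing verse numbers.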