I haven't understood everything yet... but I trust you. If you are willing to explain it to me, I would like to learn :) What I don't understand is how you can find the marker for each verse and chapter in the UTF-8 text. What is the marker in question?
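For reference, Michael's description quoted below can be read this way: there is no marker inside the UTF-8 text itself. The Book/chapter/verse references come from a separately generated list, which is lined up with the verse text purely by order and then concatenated. Here is a minimal Python sketch of that reading. It assumes the text has already been reduced to one verse per line; the function names, the "Book C:V" reference format and the sample data are all hypothetical, not something given in the thread.

```python
def bcv_list(book, verses_per_chapter):
    """Return every reference of `book` in order, e.g. 'Matt 1:1'.

    `verses_per_chapter` is a list of verse counts, one per chapter,
    taken from whatever versification the translation follows.
    """
    return [
        f"{book} {chapter}:{verse}"
        for chapter, count in enumerate(verses_per_chapter, start=1)
        for verse in range(1, count + 1)
    ]


def merge_columns(refs, verse_lines):
    """Join reference and text, like `=A1 & " " & B1` in a spreadsheet.

    Refuses to merge when the two columns differ in length, because
    that means they are out of alignment somewhere and need manual
    checking (for example 3 John having 15 verses instead of 14).
    """
    if len(refs) != len(verse_lines):
        raise ValueError(
            f"columns do not align: {len(refs)} references, "
            f"{len(verse_lines)} text lines"
        )
    return [f"{ref} {text}" for ref, text in zip(refs, verse_lines)]


if __name__ == "__main__":
    # Toy data only: a made-up book with two chapters of 2 and 1 verses.
    refs = bcv_list("Toy", [2, 1])
    text = ["first verse", "second verse", "third verse"]
    for line in merge_columns(refs, text):
        print(line)
```

A length check only catches a missing or extra line. A verse that was split or merged in the wrong place still has to be found by reading the two columns side by side, as Michael says. Section titles are not handled here at all.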
On 15/05/2019 19:03, David Haslam wrote:
> Michael’s description matches how I imagined the method
> during my waking moments this morning. :)
>
> David
>
> Sent from ProtonMail Mobile
>
>
> On Wed, May 15, 2019 at 17:33, Michael H <cma...@gmail.com> wrote:
>> I've been working long hours and emailing in my break time. David
>> has the basics of converting to VPL.
>>
>> I would then make the entire work a column in a spreadsheet.
>>
>> Then in other columns insert a list of Book/chapter/verse in order.
>>
>> The BCV and verse text columns should align and can be verified, and
>> adjusted where things don't match perfectly, like maybe 3 John has 15
>> instead of 14 verses.
>>
>> Once the columns align, you can merge them into another column via
>> concatenation operations (&). This last column becomes your output.
>>
>> The output needs to consider that section titles and section ranges
>> belong in front of the verse marker. That is a bit more complex
>> search and replace, but can be done successfully.
>>
>>
>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfh...@protonmail.com> wrote:
>>
>>     The attachment contains a counted list of Myanmar words
>>     containing a font conversion error.
>>     /NB. We need to match these words with what they are in the
>>     legacy font./
>>
>>     This issue should be discussed with the current maintainer of the
>>     SIL *TECkit* converter, whoever that may be.
>>
>>     It may be worthwhile asking our friends at the SIL *Writing
>>     Systems Technology* team. See
>>     https://scripts.sil.org/default
>>
>>     /Aside: My friend Martin Hosken of SIL knew the late Keith
>>     Stribley - the former webmaster of ThanLwinSoft./
>>
>>     Best regards,
>>
>>     David
>>
>>     Sent with ProtonMail <https://protonmail.com> Secure Email.
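A counted list like the one David mentions above could be produced along these lines. This is only a guess at the idea, not the tool he actually used. It assumes that a "word" is a whitespace-delimited token (Myanmar text often has no spaces between words, so tokens may be whole phrases), and that the suspect characters are the three unconverted Latin letters reported elsewhere in this thread. The function name and defaults are hypothetical.

```python
from collections import Counter

# The three characters reported in this thread as left over after the
# font conversion: U+00C8 (È), U+00D8 (Ø), U+00F2 (ò).
DEFAULT_SUSPECTS = ("\u00c8", "\u00d8", "\u00f2")


def count_suspect_tokens(text, suspects=DEFAULT_SUSPECTS):
    """Count whitespace-delimited tokens containing any suspect string.

    Returns a Counter mapping each affected token to how many times
    it occurs in `text`.
    """
    return Counter(
        token
        for token in text.split()
        if any(s in token for s in suspects)
    )


if __name__ == "__main__":
    # Placeholder sample, not real text from the translation.
    sample = "ab\u00c8c ab\u00c8c xyz q\u00d8"
    for token, n in count_suspect_tokens(sample).most_common():
        print(n, token)
```

The same function can be pointed at other suspect strings from the thread by passing, for example, suspects=("m;", "Kd;").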
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Wednesday, May 15, 2019 4:41 PM, David Haslam
>> <dfh...@protonmail.com> wrote:
>>
>>> _*Observations:* (continued)_
>>>
>>> 5. The string "*Kd;*" also looks anomalous. It's found only once, in
>>>    ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>
>>> 6. It's evident from the PDF file that the text is paragraphed
>>>    with indented first lines. See
>>>
>>>    https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>
>>>    My hunch is that these leading paragraph indents may have been
>>>    coded within contents.xml as the self-closing
>>>    element *<text:tab/>*. There are 372 matches to this.
>>>
>>>    So not only do we need to provide chapter and verse tags (plus
>>>    section headings & parallel passage titles, etc), we also need
>>>    to reconstruct all the paragraph tags.
>>>
>>>    /NB. All structural XML indents were removed by the filter
>>>    "Remove blanks at SOL" in the file *contents.pp.txt* that
>>>    was output by my simple TextPipe filter. So that's quite a
>>>    different matter./
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> Sent with ProtonMail <https://protonmail.com> Secure Email.
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Wednesday, May 15, 2019 2:22 PM, David Haslam
>>> <dfh...@protonmail.com> wrote:
>>>
>>>> _*Observations:* (continued)_
>>>>
>>>> 4. In addition to the reported instances of the anomalous 3
>>>>    characters (*È,Ø,ò*) found after the font conversion,
>>>>    there are 6 instances of the string "*m;*" that are
>>>>    also probably due to bugs in the converter.
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> Sent with ProtonMail <https://protonmail.com> Secure Email.
>>>>
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>> On Wednesday, May 15, 2019 12:41 PM, David Haslam
>>>> <dfh...@protonmail.com> wrote:
>>>>
>>>>> Yep - sure - later I can do that.
>>>>>
>>>>> David
>>>>>
>>>>> Sent from ProtonMail Mobile
>>>>>
>>>>>
>>>>> On Wed, May 15, 2019 at 11:26, Cyrille <lafricai...@gmail.com> wrote:
>>>>>> David, I have no Box account, and I don't want to create one.
>>>>>> Can you upload it to https://framadrop.org/ instead? It's totally
>>>>>> free and secure (and private).
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> On 15/05/2019 11:46, David Haslam wrote:
>>>>>>> Interim progress report.
>>>>>>>
>>>>>>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped
>>>>>>> the contents to Mat_utf8-odt
>>>>>>>
>>>>>>> I opened the .odt file using 7-Zip from the Windows Explorer
>>>>>>> context menu, and extracted the file contents.xml
>>>>>>>
>>>>>>> I used Notepad++ plug-in XMLTools to pretty print the XML file and
>>>>>>> saved it as contents.pp.xml
>>>>>>> This is simply a layout change that's easier to read.
>>>>>>>
>>>>>>> I viewed the .pp.xml file in BabelPad, which confirmed that the
>>>>>>> non-XML text was (mostly) Myanmar Unicode.
>>>>>>>
>>>>>>> I used a TextPipe filter to remove all XML tags, blanks from SOL &
>>>>>>> EOL and all blank lines.
>>>>>>> The output file is now contents.pp.txt
>>>>>>>
>>>>>>> This is now something that's readable content in Myanmar Unicode,
>>>>>>> with some English text such as "The Gospel according Matthew" near
>>>>>>> the start.
>>>>>>>
>>>>>>> The file is best viewed using BabelPad with the option Display
>>>>>>> Colours | Colour Code by Script.
>>>>>>> This shows Myanmar characters in light green, and non-Myanmar
>>>>>>> characters in other colours.
>>>>>>>
>>>>>>> Observations:
>>>>>>> 1. The font conversion to Unicode left a few scattered characters
>>>>>>>    unconverted. :(
>>>>>>>
>>>>>>>    0000C8  È  18  LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>>    0000D8  Ø  20  LATIN CAPITAL LETTER O WITH STROKE
>>>>>>>    0000F2  ò   3  LATIN SMALL LETTER O WITH GRAVE
>>>>>>>
>>>>>>>    The complete character frequency analysis is attached.
>>>>>>>
>>>>>>> 2. A few verse numbers(?) are still present here and there.
>>>>>>> 3. The content contains section headings and parallel passage
>>>>>>>    headings as well as verse text.
>>>>>>>
>>>>>>> I have just uploaded the file contents.pp.zip to a new folder in my
>>>>>>> Box account and added Cyrille & Michael as viewers.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Sent with ProtonMail Secure Email.
>>>>>>>
>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricai...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I recently received a modern Myanmar translation of the NT,
>>>>>>>> Psalms and Proverbs, with permission to create a new module.
>>>>>>>> But the problems are many... First, to get the text.
>>>>>>>> I tested different ways, but it was produced with PageMaker!
>>>>>>>> I can get the text, but the problem is that I don't have the
>>>>>>>> verse numbers, because they sit in a parallel column, and when
>>>>>>>> I copy it I get only the biblical text.
>>>>>>>> I also have a PDF, but when I convert it to text (with
>>>>>>>> pdftotext) the columns are mixed.
>>>>>>>> Can someone help me with any ideas?
>>>>>>>> The next problem is Unicode... The text is not typed in Unicode
>>>>>>>> but uses a special font.
>>>>>>>> I can send everything you need or push it to git.crosswire.
>>>>>>>>
>>>>>>>> Thanks for your help.
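The steps in David's interim report above (open the .odt as a ZIP archive, drop the XML tags and blank lines, then list the characters that are not Myanmar) can be approximated in Python. This is a rough sketch under several assumptions, not a replacement for the 7-Zip, TextPipe and BabelPad workflow he describes. It assumes the document body sits in the archive member content.xml, which is the name the OpenDocument format uses (the thread writes contents.xml). It strips tags with a crude regular expression. It treats the Myanmar, Myanmar Extended-A and Myanmar Extended-B Unicode blocks as "Myanmar". All function names are hypothetical.

```python
import html
import re
import unicodedata
import zipfile
from collections import Counter

# Myanmar, Myanmar Extended-B and Myanmar Extended-A Unicode blocks.
MYANMAR_RANGES = ((0x1000, 0x109F), (0xA9E0, 0xA9FF), (0xAA60, 0xAA7F))


def read_content_xml(odt_path):
    """An .odt file is a ZIP archive; return its content.xml as text."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read("content.xml").decode("utf-8")


def strip_tags(xml):
    """Return the non-blank text lines of `xml`, one per paragraph/heading.

    Closing paragraph and heading tags become line breaks, every other
    tag is dropped, and leading/trailing blanks are removed.
    """
    xml = re.sub(r"</text:(?:p|h)>", "\n", xml)
    text = html.unescape(re.sub(r"<[^>]+>", "", xml))
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_myanmar(ch):
    """True if `ch` falls in one of the Myanmar Unicode blocks."""
    return any(lo <= ord(ch) <= hi for lo, hi in MYANMAR_RANGES)


def non_myanmar_counts(lines):
    """Count every non-whitespace character that is not Myanmar."""
    return Counter(
        ch
        for line in lines
        for ch in line
        if not ch.isspace() and not is_myanmar(ch)
    )


def format_report(counts):
    """One 'code point, character, count, name' line per character."""
    return [
        f"{ord(ch):06X} {ch} {n} {unicodedata.name(ch, 'UNNAMED')}"
        for ch, n in sorted(counts.items())
    ]
```

Usage would be something like format_report(non_myanmar_counts(strip_tags(read_content_xml(path)))), where path points at the .odt file. The report will also list digits, punctuation and any English text, so the suspicious entries still have to be picked out by eye. Counting the paragraph indents David mentions could be as simple as xml.count("<text:tab/>") on the raw XML before the tags are stripped.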
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page