Several questions re XSLT:

How can I select on features of the content of the actual text node, e.g. the text being capitalised?
How can I print out an attribute?
How can I avoid the text content being printed while still working on the child nodes?

I think I have made decent progress on this text, but there were long periods of inactivity until I understood the next steps.

FWIW, my process: I have received PDF files; no better source exists. I used pdf2xml to create an XML representation of the underlying PostScript. I then fairly painstakingly analysed the font size and other characteristics to decide which bit represents which structure. A Perl script now produces an XML file based on that analysis. This XML file is still ordered by page, as a print layout, without any deeper hierarchy, so it has no actual textual structure. But at least the structure becomes perceivable in my naming of tags. I then use an XSLT sheet to create USFM from the text; USFM is closer to this structureless text than OSIS. Finally, I need another Perl script (not yet written) to clean things up a bit, but that will be straightforward.

Thanks
Peter

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
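
For reference, the three XSLT questions above could be approached roughly as in the sketch below. It is only an illustration under assumptions: the element names `page` and `line` and the attribute `font` are hypothetical stand-ins for whatever tags the Perl script actually emits, and the choice of USFM markers (`\mt`, `\rem`, `\p`) is arbitrary. It uses XSLT 1.0 only, and the capitalisation test covers ASCII letters only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical input shape (names are assumptions, not the real tags):
       <page>
         <line font="12">GENESIS</line>
         <line font="9">In the beginning</line>
       </page> -->

  <!-- Question 3: apply templates to child ELEMENTS only (select="*"),
       so text sitting directly inside <page> is never copied to the output. -->
  <xsl:template match="page">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Question 1: select on a feature of the text content.
       First test: deleting all lowercase letters changes nothing,
       i.e. the text contains no lowercase ASCII letters.
       Second test: deleting all uppercase letters does change it,
       i.e. the text contains at least one uppercase ASCII letter.
       The predicate gives this template a higher default priority
       than the plain "line" template below. -->
  <xsl:template match="line[translate(., 'abcdefghijklmnopqrstuvwxyz', '') = .
                        and translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '') != .]">
    <xsl:text>\mt </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Question 2: print an attribute with value-of and the @ axis. -->
  <xsl:template match="line">
    <xsl:text>\rem font=</xsl:text>
    <xsl:value-of select="@font"/>
    <xsl:text>&#10;\p </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Question 3, alternative: override the built-in rule that copies
       text nodes, so any text reached by apply-templates is dropped. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Either of the two approaches to question 3 is enough on its own; the empty `text()` template is the broader one, since it also silences text reached through the built-in rules for elements that have no template of their own. The `xsl:value-of` calls are unaffected by it, because they read the string value of the node directly rather than applying templates.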