On Sun, 17 Mar 2019 at 19:57, Ross Moore <ross.mo...@mq.edu.au> wrote:
>
> Hi Andrew,
>
> On 18/03/2019, at 0:18, "Andrew Cunningham" <lang.supp...@gmail.com> wrote:
>
> > Ross,
> >
> > It is also dependent on the fonts themselves and the scripts the language is
> > written in.
>
> Absolutely.
>
> > Depending on the language and script the only way to ensure accessibility is
> > to include the ActualText attributes for each relevant tag.
>
> Indeed, provided you have supplied tagging at all, as of course should be
> done.
>
> > Considering how complex OpenType fonts can become for some scripts the
> > simplistic ToUnicode mappings in a PDF can be insufficient.
>
> Yes, but it is better for the CMaps to at least be appropriate, rather than
> inaccurate or missing altogether, as can be the case. Different software
> tools get information from different places, so ideally one needs to provide
> the best values for all those possible places.
>

No, CMaps help for simple scripts only. Imagine a personal name written
বৌমিক in the Bengali script and transliterated as Bowmik. OW is a two-part
matra (dependent vowel): it looks like an e-matra preceding the consonant
plus an o-matra following the consonant. The i-matra always precedes its
consonant visually. Using a CMap only, the word would therefore come out as
eboimak, with two spelling errors. An editor will complain about an e-matra
at the beginning of a word and about an i-matra following an o-matra, and it
will indicate missing consonants.

Similarly, the Hindi word स्थापित (sthaapit) would be extracted as
sthaaipat, which is wrong because an i-matra must not follow an aa-matra.
If I had time, I could give you several thousand examples where CMaps fail.
In the past I did many tests with Devanagari, and without ActualText the
problem cannot be solved. This is the very reason why
\XeTeXgenerateactualtext was implemented. It is not just a problem of
saving as text/rtf/doc; search does not work either.
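The reordering described above can be illustrated with a small Python sketch. It is only a simulation under stated assumptions, not a PDF extractor: the hypothetical helper `visual_order` pretends that glyphs are stored in the order they are drawn and that a per-glyph CMap maps each one back to a single code point. The tables `PRE_BASE` and `SPLIT` are illustrative and cover only the vowel signs needed for the two words; the split of the Bengali AU sign into U+09C7 plus U+09D7 follows its canonical Unicode decomposition.

```python
# Simulation only: what per-glyph mapping returns when glyphs are stored in
# visual order. Not a real extractor; the tables cover just these examples.

# Vowel signs drawn to the LEFT of their consonant (illustrative subset).
PRE_BASE = {
    "\u093F",  # Devanagari vowel sign I
    "\u09BF",  # Bengali vowel sign I
    "\u09C7",  # Bengali vowel sign E
}

# Two-part vowel signs: (part drawn before the consonant, part drawn after).
SPLIT = {
    "\u09CB": ("\u09C7", "\u09BE"),  # Bengali O  = E part + AA part
    "\u09CC": ("\u09C7", "\u09D7"),  # Bengali AU = E part + AU length mark
}


def visual_order(text: str) -> str:
    """Return the code points in drawing order (simplified: a pre-base
    sign is moved in front of the single preceding character only)."""
    out: list[str] = []
    for ch in text:
        if ch in SPLIT and out:
            left, right = SPLIT[ch]
            base = out.pop()
            out += [left, base, right]
        elif ch in PRE_BASE and out:
            base = out.pop()
            out += [ch, base]
        else:
            out.append(ch)
    return "".join(out)


# Logical (correct) spellings, written with escapes to avoid editor issues.
hindi = "\u0938\u094D\u0925\u093E\u092A\u093F\u0924"  # sthaapit
bengali = "\u09AC\u09CC\u09AE\u09BF\u0995"            # Bowmik

extracted_hindi = visual_order(hindi)
extracted_bengali = visual_order(bengali)

print(extracted_hindi == hindi)            # False: the i-matra moved
print(extracted_bengali == bengali)        # False: AU split and i-matra moved
print("\u092A\u093F" in hindi)             # True: "pi" is found in the original
print("\u092A\u093F" in extracted_hindi)   # False: search for "pi" fails
```

An ActualText entry sidesteps this because it stores the logical string for the whole span, so no reordering has to be guessed from the glyph stream.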
> > And text in a PDF may by WCAG definition be non-textual content.
>
> Presumably you mean, adding descriptive text to graphics that convey
> meaningful information; e.g. a company logo, and most illustrations.
> Of course this should be done too. But this can only be useful if the
> alternate descriptive text can be found via the structure tagging; hence the
> need for fully tagged PDF, navigable via that tagging.
>
> And Zdenek's comment emphasises how what might work well in one language
> setting can be quite insufficient for others. We need to be able to
> accommodate all things that are helpful.
> That is surely what the U (for Universal) means in PDF/UA.
>
> Cheers,
>
> Ross
>

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz

> On Sunday, 17 March 2019, Ross Moore <ross.mo...@mq.edu.au> wrote:
>>
>> Hi Karljürgen,
>>
>> On 17/03/2019, at 1:42, "Karljürgen Feuerherm" <kfeuerh...@kfeuerherm.ca>
>> wrote:
>>
>> > Ross,
>> >
>> > Your reply caught my eye, and I am now looking at the pdfx package
>> > documentation.
>> >
>> > May I ask, if accessibility is a concern, why a-2b/-2u rather than -ua-1,
>> > which seems directly targeted at this?
>>
>> PDF/UA and PDF/A-1a,2a,3a require a fully tagged PDF.
>> This is a highly non-trivial task, which requires adding much extra to the
>> document, done almost entirely through \special commands. The pdfx package
>> does not provide this, but is useful for meeting the Metadata and other
>> requirements of these formats.
>>
>> Abstractly, accessibility is about having sufficient information stored in
>> the PDF for software tools to be able to build and present a description of
>> the content and structure, other than the visual one. The same can be said
>> of software for converting into a different format.
>>
>> A significant part of this is being able to correctly identify each
>> character in the fonts used within the TeX-produced PDF. Even this is a
>> non-trivial problem, due to TeX's non-standard font encodings, and virtual
>> font technique.
>>
>> >
>> > Many thanks,
>> >
>> > K
>> >
>> >> You should use the pdfx package and prepare for PDF/A-2b or -2u.
>> >> This fixes many of these things that affect conversions, as well as
>> >> Accessibility and Archivability.
>> >>
>> >> It's not fully tagged PDF, but handles many other technical issues.
>> >>
>>
>> Hope this helps.
>>
>> Ross
>
> --
> Andrew Cunningham
> lang.supp...@gmail.com