> On 22 Oct 2015, at 3:31 , Sabine Manaa <manaa.sab...@gmail.com> wrote:
> 
> I do also use two different implementations for artefact/pdf and html:
> 
> artefact:
> 128 asCharacter asString
> 
> html:
> '€'
> 
> same would be great

https://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf 
Section 4.9, "character":
"numeric code representing an abstract symbol according to some 
defined character encoding rule
NOTE 1 There are three manifestations of characters in PDF, depending on 
context:
• A PDF file is represented as a sequence of 8-bit bytes, some of which are 
interpreted as character codes in the ASCII character set and some of which are 
treated as arbitrary binary data depending upon the context.
• The contents (data) of a string or stream object in some contexts are 
interpreted as character codes in the PDFDocEncoding or UTF-16 character set.
• The contents of a string within a PDF content stream in some situations are 
interpreted as character codes that select glyphs to be drawn on the page 
according to a character encoding that is associated with the text font. "

I don't know exactly what those contexts are, but the three cases need to be 
handled differently:
- For bullet 1, there's nothing to do.
- For bullet 2, there needs to be an encoding layer which converts the strings 
to the proper format when writing the PDF; see section 7.9.2.
It seems to me that writing the file would be simpler if one ignored 
PDFDocEncoding altogether and either wrote ASCII or converted to BOM-marked 
UTF-16 (in the same way we write ASCII or BOM-marked UTF-8 for chunk files).
- For bullet 3, one would need to convert to the font's character set.
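
A rough sketch of what bullets 2 and 3 could look like, written in Python 
rather than Pharo only to keep it short and checkable. The function names are 
made up for illustration. It assumes the font in bullet 3 uses WinAnsiEncoding 
and that Python's cp1252 codec is close enough to stand in for it. Escaping of 
parentheses and backslashes inside literal strings is left out.

```python
import codecs


def pdf_text_string(text: str) -> bytes:
    """Bullet 2: encode a text string, skipping PDFDocEncoding entirely.

    Plain ASCII is written as-is; anything else becomes UTF-16BE
    prefixed with the byte order mark FE FF.
    """
    if text.isascii():
        return text.encode("ascii")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def font_codes(text: str) -> bytes:
    """Bullet 3: map characters to the codes of the font's encoding.

    Assumption: the font uses WinAnsiEncoding, approximated here by
    Python's cp1252 codec. Other font encodings need their own table.
    """
    return text.encode("cp1252")


print(pdf_text_string("EUR").hex())  # ASCII bytes, no BOM
print(pdf_text_string("€").hex())    # BOM followed by UTF-16BE
print(list(font_codes("€")))         # single-byte code in the font encoding
```

Under those assumptions the euro sign comes out as code 128 for the font, 
which is presumably why 128 asCharacter asString works in the artefact case, 
while the same character in a text string would be written as FE FF 20 AC.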

Cheers
Henry
