Get the code from
https://gist.github.com/cdokolas/8845724f8f4c0335dadfbc6f0c6afe0b
There is also a resulting PDF whose "ToUnicode" object contains the
"different" codepoints, but I don't know how to send it to you.
Note that the font I'm using is Noto Sans CJK ("NotoSansCJKsc-Regular"),
loaded from a resource.
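For the unit tests, a small comparison helper could make such conversions visible instead of failing on two strings that look identical. What follows is only a sketch: CodepointDiff and its methods are hypothetical names, not part of PDFBox, and it assumes the string that was written and the string extracted by PDFTextStripper are already in hand. It reports every position where the Unicode code points differ, along with each code point's Unicode name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical test helper (not part of PDFBox): compares the text written
 * to a PDF with the text extracted back from it and lists every code point
 * position where the two differ.
 */
public final class CodepointDiff {

    private CodepointDiff() {}

    /** Formats a code point as U+XXXX followed by its Unicode name, if any. */
    static String describe(int cp) {
        String name = Character.getName(cp); // null for unassigned code points
        return String.format("U+%04X (%s)", cp, name == null ? "unnamed" : name);
    }

    /** Returns one message per differing position; empty if the strings match. */
    public static List<String> differences(String expected, String actual) {
        int[] exp = expected.codePoints().toArray();
        int[] act = actual.codePoints().toArray();
        List<String> out = new ArrayList<>();
        int common = Math.min(exp.length, act.length);
        for (int i = 0; i < common; i++) {
            if (exp[i] != act[i]) {
                out.add("index " + i + ": expected " + describe(exp[i])
                        + ", got " + describe(act[i]));
            }
        }
        if (exp.length != act.length) {
            out.add("length differs: expected " + exp.length
                    + " code points, got " + act.length);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated round trip: the hyphen-minus comes back as a non-breaking hyphen.
        String written = "ABC-123";
        String extracted = "ABC\u2011123";
        differences(written, extracted).forEach(System.out::println);
    }
}
```

Running main prints "index 3: expected U+002D (HYPHEN-MINUS), got U+2011 (NON-BREAKING HYPHEN)". In a real test the extracted string would come from PDFTextStripper.getText(document). The stripper inserts line separators between lines, so trimming or normalizing whitespace before comparing is probably needed.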

Thanks in advance,
Constantine

--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman


On Wed, Nov 18, 2020 at 3:58 PM [email protected] <[email protected]> wrote:

>
> On Wednesday, 18.11.2020 at 13:58 +0200, Constantine Dokolas wrote:
> > I noticed that when I write some codepoints to a PDF and then read back
> > the text from the generated PDF (via PDFTextStripper), some conversions
> > happen. For example, the simple hyphen character (0x2D, "HYPHEN-MINUS")
> > gets converted to a non-breaking hyphen (0x2011, "NON-BREAKING HYPHEN").
> >
> > Since I'm writing unit tests to verify that everything gets written
> > correctly to the PDF from my end (PDF generation), I need to know why,
> > when and how these conversions take place (I first noticed them while
> > writing some CJK codepoints). Any suggestions/pointers?
> >
>
> Could you share a code snippet showing how you are writing/retrieving the data?
>
> BR
> Maruan
>
> > Constantine
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
