Here's an excerpt of the CMap table of that font, to be found at
Root/Pages/Kids/[0]/Resources/Font/F480/ToUnicode :
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo 3 dict dup begin
/Registry (Adobe) def
/Ordering (UCS) def
/Supplement 0 def
end def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
1 beginbfchar
<0000> <ffff>
endbfchar
2 beginbfrange
<0001> <005f> <f020>
<0060> <00d0> <f080>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
This means that character codes in the content stream whose value is between
0001 and 00d0 are converted to Unicode: 0001-005f map to f020 onwards and
0060-00d0 map to f080 onwards (see beginbfrange; search for this word in the
PDF 32000 specification).
https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
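To make the bfchar/bfrange lookup concrete, here is a minimal Python sketch. The two tables are copied from the CMap excerpt above; the function name to_unicode and the table names are made up for illustration, and this is not how PDFBox implements it.

```python
# Tables copied from the ToUnicode CMap excerpt; names are hypothetical.
BFCHAR = {0x0000: 0xFFFF}            # <0000> <ffff>
BFRANGE = [
    (0x0001, 0x005F, 0xF020),        # <0001> <005f> <f020>
    (0x0060, 0x00D0, 0xF080),        # <0060> <00d0> <f080>
]


def to_unicode(code):
    """Return the Unicode code point for a 2-byte character code, or None."""
    # Single-code mappings (beginbfchar) are checked first.
    if code in BFCHAR:
        return BFCHAR[code]
    # Range mappings (beginbfrange): offset into the range is added
    # to the destination start value.
    for low, high, dest_start in BFRANGE:
        if low <= code <= high:
            return dest_start + (code - low)
    return None


for code in (0x0000, 0x0001, 0x005F, 0x0060, 0x00D0):
    print(f"<{code:04x}> -> U+{to_unicode(code):04X}")
```

All mapped values except the one for 0000 fall in the f020-f0f0 area, which is in the Unicode Private Use Area; only 0000 ends up at ffff.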
But the content stream also has
[ (\000\000) ] TJ
16 times. This is rendered as a square by Adobe and PDFBox. In the
beginbfchar section, 0000 is converted to Unicode ffff, which
is a Unicode noncharacter (U+FFFF). This becomes EF BF BF in UTF-8.
http://www.fileformat.info/info/unicode/char/ffff/index.htm
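The byte sequence can be checked quickly in Python. The sketch below also shows one possible way to drop noncharacters from extracted text; the helper is_noncharacter and the sample string are assumptions for illustration, not part of PDFBox or of the original file.

```python
# U+FFFF is a noncharacter, but it still has a well-formed UTF-8 encoding.
encoded = "\uffff".encode("utf-8")
print(encoded.hex(" ").upper())  # EF BF BF


def is_noncharacter(ch):
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Hypothetical extracted text: 16 copies of U+FFFF between two letters.
extracted = "A" + "\uffff" * 16 + "B"
cleaned = "".join(ch for ch in extracted if not is_noncharacter(ch))
print(len(extracted), len(cleaned))
```

Whether to strip these characters or replace them with something visible depends on what the extracted text is used for.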
QED
Tilman
On 23.06.2016 at 10:33, OYEBISI, Daniel wrote:
You can get the PDF file through this url
http://www.pdf-archive.com/2016/06/23/modele-tableau-wingdings-3/
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Wednesday, 22 June 2016 20:03
To: [email protected]
Subject: Re: Empty glyphs
From what I see, the "whitespace" is EF BF BF, which is not a valid
UTF-8 character. Please upload the PDF file somewhere.
Tilman
On 22.06.2016 at 18:39, OYEBISI, Daniel wrote:
The problem is with some of the whitespace, which appears empty in Notepad but
really is not.
Please try opening the text file with other text editors.
Thanks
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Wednesday, 22 June 2016 17:54
To: [email protected]
Subject: Re: Empty glyphs
Your PDF didn't get through (security), but this sounds like a Notepad++ problem.
I could display your txt file with the normal Notepad by changing the font to
Wingdings.
Tilman
On 22.06.2016 at 16:58, OYEBISI, Daniel wrote:
Hello,
I came across an issue while trying to extract the text using
PDFTextStripper from the PDF file attached to this email.
When I open the generated txt document in Notepad, it appears
normal, but when I open it with Notepad++ it gives an interesting
result.
Please can you have a look at this?
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]