Public bug reported:

In Ubuntu 10.04 (GPL Ghostscript 8.71) cups-pdf produces searchable PDFs 
whose text can be copied and pasted cleanly.
    
In Ubuntu 11.10 (GPL Ghostscript 9.04) this feature is gone. PDFs cannot be 
searched, and copying and pasting produces garbage text.
    
I experimented by capturing the PostScript with a very simple CUPS backend 
that just streams stdin to a PostScript file. I installed and shared this 
printer on 11.10 and also printed to it from 10.04.
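
A minimal sketch of what such a capture backend could look like, written in 
Python. This is not the script actually used for this report: the backend 
name `pscapture`, the output directory and the file naming are assumptions 
made for illustration. It relies only on the standard CUPS backend calling 
convention (job-id, user, title, copies, options, optional file name).

```python
#!/usr/bin/env python3
"""Hypothetical capture backend: saves the job data CUPS sends to a file."""
import os
import shutil
import sys

# Assumed output directory; not taken from the original report.
CAPTURE_DIR = "/tmp/ps-capture"


def capture(argv, stdin, out_dir=CAPTURE_DIR):
    """Copy the job data to <out_dir>/job-<id>.ps and return that path."""
    # CUPS calls a backend as: backend job-id user title copies options [file]
    job_id = argv[1]
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, "job-%s.ps" % job_id)
    # With a 7th argument the data is in that file; otherwise it is on stdin.
    src = open(argv[6], "rb") if len(argv) > 6 else stdin
    try:
        with open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    finally:
        if src is not stdin:
            src.close()
    return out_path


def main(argv):
    if len(argv) == 1:
        # Device discovery: CUPS runs the backend with no arguments.
        print('file pscapture:/ "Unknown" "PostScript capture (sketch)"')
        return 0
    if len(argv) < 6:
        sys.stderr.write("ERROR: pscapture job-id user title copies options [file]\n")
        return 1
    capture(argv, sys.stdin.buffer)
    return 0
```

An installed backend would end with `sys.exit(main(sys.argv))` and live in 
the CUPS backend directory; that line is left out here so the functions can 
be imported and tried without side effects.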
  
I used `ps2pdf` on 11.10 to generate the PDF files (just like cups-pdf does).
    
I merged the sample PDFs from the two Ubuntu versions into one PDF file and 
ran `pdffonts` to see which fonts were present. This is the result: 
     
    name                                 type              emb sub uni object ID
    ------------------------------------ ----------------- --- --- --- ---------
    AGTLMZ+Webdings                      TrueType          yes yes yes     71  0
    GYTYEM+DejaVuSerif                   TrueType          yes yes yes     72  0
    ZXOPYM+Verdana                       TrueType          yes yes yes     73  0
    PMFCNH+Verdana-Bold                  TrueType          yes yes yes     74  0
    KPEIOE+WenQuanYiZenHei               TrueType          yes yes yes     90  0
    KIIHFC+DejaVuSerif                   TrueType          yes yes yes     91  0
    BPBDFC+UnBatang-Identity-H           CID TrueType      yes yes no      93  0
    YTZIYS+Ume-P-Gothic-C4-Identity-H    CID TrueType      yes yes no      94  0
    HYRHDD+DejaVuSansMono-Identity-H     CID TrueType      yes yes no      95  0

The searchable part of the PDF (originating from the 10.04 PostScript) relies 
on the embedded 'TrueType' fonts, which have Unicode encoding ('uni' is 'yes').
    
The Ubuntu 11.10 PostScript, on the other hand, is responsible for the 'CID 
TrueType' font embedding. The absence of any Unicode encoding there ('uni' is 
'no') seems to be the reason this part is not searchable.
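
To make that check repeatable, here is a rough Python sketch that reads a 
font listing in the format shown above (it matches the output of poppler's 
`pdffonts`) and reports the fonts whose 'uni' column is 'no'. The function 
name is made up for this example, and the parsing assumes font names contain 
no spaces, which holds for the listing in this report but may not hold in 
general.

```python
def fonts_without_unicode_map(listing):
    """Return the names of fonts whose 'uni' column is 'no'."""
    missing = []
    in_body = False
    for line in listing.splitlines():
        if line.strip().startswith("---"):
            # The dashed rule separates the header from the font rows.
            in_body = True
            continue
        fields = line.split()
        if not in_body or len(fields) < 7:
            continue
        # The last five fields are: emb, sub, uni, object number, generation.
        # The type ("TrueType", "CID TrueType") sits between name and emb.
        name = fields[0]
        uni = fields[-3]
        if uni == "no":
            missing.append(name)
    return missing


# A few rows copied from the listing in this report.
SAMPLE = "\n".join([
    "name                                 type              emb sub uni object ID",
    "------------------------------------ ----------------- --- --- --- ---------",
    "ZXOPYM+Verdana                       TrueType          yes yes yes     73  0",
    "KIIHFC+DejaVuSerif                   TrueType          yes yes yes     91  0",
    "BPBDFC+UnBatang-Identity-H           CID TrueType      yes yes no      93  0",
    "HYRHDD+DejaVuSansMono-Identity-H     CID TrueType      yes yes no      95  0",
])

for font in fonts_without_unicode_map(SAMPLE):
    print(font)
```

For the sample rows this prints the two 'CID TrueType' fonts, which are the 
ones that came from the 11.10 PostScript.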

What happened between 10.04 and 11.10?

Running `ps2pdf` on 10.04 on the PostScript file produced on 11.10 also 
produces a non-searchable PDF.
    
Did the printing process on 11.10 change, producing a different PostScript 
format? A format that `ps2pdf` can't handle?
    
Is this a bug? 
    
P.S. I printed from different applications (gedit, geany, Firefox, 
LibreOffice), all with the same result.

** Affects: ghostscript (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/942866

Title:
  Ubuntu 11.10: printing to PDF produces unsearchable PDF (contrary to
  10.04)
