Hello,

I have Ghostscript files that contain a table of contents (TOC) as pdfmark
entries, and I would like to use this information to generate a human-readable
TOC. The problem is that I can't get the nested hierarchy right. This is what
I have so far:

import re

toc = """\
[ /PageMode /UseOutlines
  /Page 1
  /View [/XYZ null null 0]
  /DOCVIEW pdfmark
[ /Title (Title page)
  /Page 1
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Document information)
  /Page 2
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Blah)
  /Page 3
  /View [/XYZ null null 0]
  /OUT pdfmark
[ /Title (Appendix)
  /Page 16
  /Count 4
  /View [/XYZ null null 0]
  /OUT pdfmark
    [ /Title (Sub1)
      /Page 17
      /Count 4
      /OUT pdfmark
    [ /Title (Subsub1)
      /Page 17
      /OUT pdfmark
    [ /Title (Subsub2)
      /Page 18
      /OUT pdfmark
    [ /Title (Subsub3)
      /Page 29
      /OUT pdfmark
    [ /Title (Subsub4)
      /Page 37
      /OUT pdfmark
    [ /Title (Sub2)
      /Page 40
      /OUT pdfmark
    [ /Title (Sub3)
      /Page 49
      /OUT pdfmark
    [ /Title (Sub4)
      /Page 56
      /OUT pdfmark
"""    
print('\r\n** Table of contents\r\n')
# raw string, so the regex backslashes are not treated as string escapes
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
indent = 0
start = True
for title, page, count in re.findall(pattern, toc, re.DOTALL):
    title = (indent * ' ') + title
    count = int(count or 0)
    print(title.ljust(79, ".") + page.zfill(2))
    if count:
        count -= 1
        start = True
    if count and start:
        indent += 2
        start = False
    if not count and not start:
        indent -= 2
        start = True

This generates the following TOC, in which Subsub2 to Subsub4 are dedented one
level too far:


** Table of contents

Title page.....................................................................01
Document information...........................................................02
Blah...........................................................................03
Appendix.......................................................................16
  Sub1.........................................................................17
    Subsub1....................................................................17
  Subsub2......................................................................18
  Subsub3......................................................................29
  Subsub4......................................................................37
  Sub2.........................................................................40
  Sub3.........................................................................49
  Sub4.........................................................................56

What is the best approach to do this?
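
One possible direction, given only as a sketch and not as a definitive answer:
keep a stack with one counter per open level, where each counter holds the
number of children still expected at that level. The nesting depth of an entry
is then simply the length of the stack. This assumes that /Count gives the
number of direct children of an entry (which matches the sample data above)
and that /Count immediately follows /Page, as the regex requires. The names
format_toc, remaining and sample are made up for this illustration, and the
sample is a shortened version of the data above.

```python
import re

def format_toc(toc, width=79, step=2):
    """Return formatted TOC lines, indented by nesting depth."""
    pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count (-?[0-9]+))?'
    remaining = []  # stack: children still expected at each open level
    lines = []
    for title, page, count in re.findall(pattern, toc, re.DOTALL):
        depth = len(remaining)
        entry = depth * step * ' ' + title
        lines.append(entry.ljust(width, '.') + page.zfill(2))
        if remaining:
            remaining[-1] -= 1  # this entry uses up one slot of its parent
        # abs() tolerates a negative /Count (a collapsed bookmark in pdfmark)
        children = abs(int(count or 0))
        if children:
            remaining.append(children)  # this entry opens a new level
        # close every level whose children have all been seen
        while remaining and remaining[-1] == 0:
            remaining.pop()
    return lines

sample = """\
[ /Title (Blah)
  /Page 3
  /OUT pdfmark
[ /Title (Appendix)
  /Page 16
  /Count 2
  /OUT pdfmark
[ /Title (Sub1)
  /Page 17
  /Count 2
  /OUT pdfmark
[ /Title (Subsub1)
  /Page 17
  /OUT pdfmark
[ /Title (Subsub2)
  /Page 18
  /OUT pdfmark
[ /Title (Sub2)
  /Page 40
  /OUT pdfmark
"""

for line in format_toc(sample):
    print(line)
```

Unlike a single indent variable with a start flag, the stack remembers how
many entries are still outstanding at every enclosing level, so finishing the
last Subsub entry returns to the Sub level and not all the way to the top.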

Thanks in advance!

Albert-Jan
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
