Hello,

I have Ghostscript (pdfmark) files that contain a table of contents (toc), and I would like to use this information to generate a human-readable toc. The problem is that I can't get the (nested) hierarchy right.
import re

toc = """\
[ /PageMode /UseOutlines /Page 1 /View [/XYZ null null 0] /DOCVIEW pdfmark
[ /Title (Title page) /Page 1 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Document information) /Page 2 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Blah) /Page 3 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Appendix) /Page 16 /Count 4 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Sub1) /Page 17 /Count 4 /OUT pdfmark
[ /Title (Subsub1) /Page 17 /OUT pdfmark
[ /Title (Subsub2) /Page 18 /OUT pdfmark
[ /Title (Subsub3) /Page 29 /OUT pdfmark
[ /Title (Subsub4) /Page 37 /OUT pdfmark
[ /Title (Sub2) /Page 40 /OUT pdfmark
[ /Title (Sub3) /Page 49 /OUT pdfmark
[ /Title (Sub4) /Page 56 /OUT pdfmark
"""

print('\r\n** Table of contents\r\n')

# Capture the title, the page number and the optional /Count of each entry.
# A raw string avoids invalid-escape warnings for \( and \s.
pattern = r'/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
indent = 0
start = True
for title, page, count in re.findall(pattern, toc, re.DOTALL):
    title = (indent * ' ') + title
    count = int(count or 0)  # findall returns '' when /Count is absent
    print(title.ljust(79, ".") + page.zfill(2))
    if count:
        count -= 1
        start = True
    if count and start:
        indent += 2
        start = False
    if not count and not start:
        indent -= 2
        start = True

This generates the following toc, in which Subsub2 to Subsub4 are dedented one level too much:

** Table of contents

Title page...........................................................................01
Document information...........................................................02
Blah...........................................................................03
Appendix.......................................................................16
  Sub1.........................................................................17
    Subsub1....................................................................17
  Subsub2......................................................................18
  Subsub3......................................................................29
  Subsub4......................................................................37
  Sub2.........................................................................40
  Sub3.........................................................................49
  Sub4.........................................................................56

What is the best approach to do this?

Thanks in advance!

Albert-Jan

_______________________________________________
Tutor maillist  -  Tutor@python.org
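One possible approach, sketched here under stated assumptions: treat /Count as the number of immediate sub-entries of an outline item (this is how I understand the pdfmark convention; a negative value marks a closed item that still has that many children), and keep a stack of how many children are still expected at each open level. The current depth is then simply the height of the stack. The names `parse_outline`, `format_toc` and `SAMPLE_TOC` are hypothetical, and the parser assumes that titles contain no parentheses and that every entry ends with `pdfmark`.

```python
import re

# Sample data from the question, one pdfmark per line.
SAMPLE_TOC = """\
[ /PageMode /UseOutlines /Page 1 /View [/XYZ null null 0] /DOCVIEW pdfmark
[ /Title (Title page) /Page 1 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Document information) /Page 2 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Blah) /Page 3 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Appendix) /Page 16 /Count 4 /View [/XYZ null null 0] /OUT pdfmark
[ /Title (Sub1) /Page 17 /Count 4 /OUT pdfmark
[ /Title (Subsub1) /Page 17 /OUT pdfmark
[ /Title (Subsub2) /Page 18 /OUT pdfmark
[ /Title (Subsub3) /Page 29 /OUT pdfmark
[ /Title (Subsub4) /Page 37 /OUT pdfmark
[ /Title (Sub2) /Page 40 /OUT pdfmark
[ /Title (Sub3) /Page 49 /OUT pdfmark
[ /Title (Sub4) /Page 56 /OUT pdfmark
"""


def parse_outline(toc):
    """Yield (title, page, count) for every /OUT pdfmark in toc.

    Hypothetical helper: assumes each entry ends with 'pdfmark' and that
    titles contain no parentheses. /Count defaults to 0 when absent.
    """
    for chunk in toc.split("pdfmark"):
        if "/OUT" not in chunk:
            continue  # skip /DOCVIEW and any other non-outline marks
        title = re.search(r"/Title \((.*?)\)", chunk)
        page = re.search(r"/Page (\d+)", chunk)
        count = re.search(r"/Count (-?\d+)", chunk)
        if title and page:
            yield (title.group(1), int(page.group(1)),
                   int(count.group(1)) if count else 0)


def format_toc(entries, width=79, step=2):
    """Return dotted toc lines, indented by nesting depth."""
    lines = []
    remaining = []  # stack: children still expected at each open level
    for title, page, count in entries:
        label = " " * (step * len(remaining)) + title
        lines.append(label.ljust(width, ".") + str(page).zfill(2))
        if remaining:
            remaining[-1] -= 1  # this entry fills one slot of its parent
        if count:
            # Assumption: a negative /Count (closed item) has abs(count) children.
            remaining.append(abs(count))
        while remaining and remaining[-1] == 0:
            remaining.pop()  # level complete: go back up to the parent level
    return lines


if __name__ == "__main__":
    print("** Table of contents\n")
    for line in format_toc(parse_outline(SAMPLE_TOC)):
        print(line)
```

With the sample data this indents Sub1 by one level, Subsub1 to Subsub4 by two, and puts Sub2 to Sub4 back at one level. A stack is needed because /Count only says how many children follow, so each open level has to remember how many of its children are still outstanding; a single `start` flag can only track one level at a time.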