Re: [Tutor] need help generating table of contents

Peter Otten Mon, 27 Aug 2018 11:46:59 -0700

Albert-Jan Roskam wrote:

> 
> From: Tutor <tutor-bounces+sjeik_appie=hotmail....@python.org> on behalf
> of Peter Otten <__pete...@web.de> Sent: Friday, August 24, 2018 3:55 PM
> To: tutor@python.org
> <snip>
>> The following reshuffle of your code seems to work:
>> 
>> print('\r\n** Table of contents\r\n')
>> pattern = '/Title \((.+?)\).+?/Page ([0-9]+)(?:\s+/Count ([0-9]+))?'
>> 
>> def process(triples, limit=None, indent=0):
>> for index, (title, page, count) in enumerate(triples, 1):
>> title = indent * 4 * ' ' + title
>> print(title.ljust(79, ".") + page.zfill(2))
>> if count:
>> process(triples, limit=int(count), indent=indent+1)
>> if limit is not None and limit == index:
>>  break
>> 
>> process(iter(re.findall(pattern, toc, re.DOTALL)))
> 
> Hi Peter, Cameron,
> 
> Thanks for your replies! The code above indeeed works as intended, but: I
> don't really understand *why*. I would assign a name to the following line
> "if limit is not None and limit == index", what would be the most
> descriptive name? I often use "is_*" names for boolean variables. Would
> "is_deepest_nesting_level" be a good name?


No, it's not necessarily the deepest level. Every subsection eventually ends 
at this point; so you might call it

reached_end_of_current_section

Or just 'limit' ;) 

The None is only there for the outermost level where no /Count is provided. 
In this case the loop is exhausted.

If you find it is easier to understand you can calculate the outer count aka 
limit as the number of matches - sum of counts:

def process(triples, section_length, indent=0):
    for index, (title, page, count) in enumerate(triples, 1):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, section_length=int(count), indent=indent+1)
        if section_length == index:
            break

triples = re.findall(pattern, toc, re.DOTALL)
toplevel_section_length = (
    len(triples)
    - sum(int(c or 0) for t, p, c in triples)
)
process(iter(triples), toplevel_section_length)

Just for fun here's one last variant that does away with the break -- and 
thus the naming issue -- completely:

def process(triples, limit=None, indent=0):
    for title, page, count in itertools.islice(triples, limit):
        title = indent * 4 * ' ' + title
        print(title.ljust(79, ".") + page.zfill(2))
        if count:
            process(triples, limit=int(count), indent=indent+1)

Note that islice(items, None) does the right thing:

>>> list(islice("abc", None))
['a', 'b', 'c']


> Also, I don't understand why iter() is required here, and why finditer()
> is not an alternative.

finditer() would actually work -- I didn't use it because I wanted to make 
as few changes as possible to your code. What does not work is a list like 
the result of findall(). This is because the inner for loops (i. e. the ones 
in the nested calls of process) are supposed to continue the iteration 
instead of restarting it. A simple example to illustrate the difference:

 >>> s = "abcdefg"
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
... 
=== 0 ===
a
b
c
=== 1 ===
a
b
c
=== 2 ===
a
b
c
>>> s = iter("abcdefg")
>>> for k in range(3):
...     print("===", k, "===")
...     for i, v in enumerate(s):
...         print(v)
...         if i == 2: break
... 
=== 0 ===
a
b
c
=== 1 ===
d
e
f
=== 2 ===
g




_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] need help generating table of contents

Reply via email to