On 8 Jun, 15:19, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, Diez B. Roggisch wrote:
>
> > jvdb wrote:
> >> True. But there is another issue attached to the one I wrote.
> >> When I know how often this occurs, I know the number of pages in the
> >> file. After that I would like to be able to extract a given amount of
> >> data:
> >> File x contains 20 <0C>. Then, for example, I would like to extract
> >> instance 5 to instance 12 from the file.
> >> The reason why I want to do this: 0C stands for a page break in the PCL
> >> language. This way I would be able to extract a certain number of
> >> pages from the file.
> >
> > And? Finding the respective indices by using
> >
> > positions = []
> > # search from the start so a break at offset 0 is not missed
> > last_needle_position = contents.find(needle)
> > while last_needle_position != -1:
> >     positions.append(last_needle_position)
> >     last_needle_position = contents.find(needle, last_needle_position + 1)
> >
> > will find all the page breaks. Then just slice contents appropriately.
> > Did you read the Python tutorial?
>
> Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
> them again is enough, depending on the size of the files and memory of
> course.
>
> One problem I see is that '\x0c' may not always be the page end. It may
> occur in "rastered image" data too, I guess.
>
> Ciao,
>         Marc 'BlackJack' Rintsch
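Marc's split/slice/join suggestion could look roughly like this. It is a minimal sketch, assuming the whole file is read into memory as bytes, that every 0x0C byte really is a page break (which, as Marc points out, may not hold when raster data is present), and that pages are numbered from 1 with both ends inclusive. `count_pages` and `extract_pages` are hypothetical helper names, not part of any library.

```python
FORM_FEED = b"\x0c"  # PCL page break (form feed)


def count_pages(data: bytes) -> int:
    """Count form feeds, i.e. the number of pages as described in the question."""
    return data.count(FORM_FEED)


def extract_pages(data: bytes, first: int, last: int) -> bytes:
    """Return pages first..last (1-based, inclusive), each ending in a form feed.

    Assumes last <= count_pages(data); every 0x0C is treated as a page break.
    """
    pages = data.split(FORM_FEED)   # split() drops the separators
    wanted = pages[first - 1:last]  # convert 1-based page numbers to a slice
    # Put back the form feed that split() removed after each selected page.
    return FORM_FEED.join(wanted) + FORM_FEED
```

For the file in the question, `extract_pages(data, 5, 12)` would return pages 5 through 12, each still terminated by its form feed, ready to be written to a new file.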
Hi,

Your last comment is also something I have noticed. There are a number of
occasions where this will happen, so I also have to deal with it. I will
dive into this on Monday, after this hot weekend.

Cheers,
Jeroen
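One possible way to keep form feeds inside raster images from being counted is to skip over binary payloads while scanning. This is a simplified sketch under a loud assumption: the only binary data in the file is sent with the PCL Transfer Raster Data command in its plain form `ESC * b <n> W`, followed by exactly <n> data bytes. Combined escape sequences that share the `ESC * b` prefix, downloaded fonts, patterns and macros can also carry arbitrary bytes and are not handled. `find_page_breaks` and `split_pages` are hypothetical names.

```python
import re

# Assumption: a form feed (0x0C) outside binary data is a page break, and the
# only binary data is raster data sent as ESC * b <n> W followed by n bytes.
# Other PCL commands that carry binary payloads are NOT handled here.
_TOKEN = re.compile(rb"\x1b\*b(\d+)W|\x0c")


def find_page_breaks(data: bytes) -> list:
    """Return the offsets of form feeds that are not inside raster data."""
    breaks = []
    pos = 0
    while True:
        m = _TOKEN.search(data, pos)
        if m is None:
            return breaks
        if m.group(1) is not None:
            # Raster transfer command: jump past its n payload bytes so any
            # 0x0C bytes inside the image are ignored.
            pos = m.end() + int(m.group(1))
        else:
            breaks.append(m.start())
            pos = m.end()


def split_pages(data: bytes) -> list:
    """Split data after each real page break; each page keeps its form feed."""
    pages, start = [], 0
    for offset in find_page_breaks(data):
        pages.append(data[start:offset + 1])
        start = offset + 1
    if start < len(data):
        pages.append(data[start:])  # trailing data without a final form feed
    return pages
```

With this, pages 5 to 12 could be taken as `b"".join(split_pages(data)[4:12])`. A real PCL file would need the other binary-carrying commands added to the scanner before the page count can be trusted.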