Re: extract PDF pages
Aloha,

David Isaac wrote:
> I am looking for a Python solution.
> Just for PDF page extraction.
> Any hope?

With python, there's always hope.
http://sourceforge.net/projects/pdfplayground

In the CVS (sorry, no distribution at the time) you'll find an example,
page-extract:
http://cvs.sourceforge.net/viewcvs.py/pdfplayground/ppg/Exp/page-extract.py?rev=1.1&view=markup

pdfplayground is limited at the moment to PDF <= 1.4.

If you want to do more with .pdfs you'll probably need at least a basic
understanding of the PDF specification. pdfplayground focuses on
low-level .pdf handling (given the implementation resources...).

Thomas Lotze is also preparing a pdf reader/writer project:
http://svn.thomas-lotze.de/PDFSpec/

So is David Boddie:
http://www.boddie.org.uk/david/Projects/Python/pdftools

Wishing a happy day
LOBI
--
http://mail.python.org/mailman/listinfo/python-list
Re: Addressing the last element of a list
Aloha,

[EMAIL PROTECTED] wrote:
> Isn't there an easier way than
> lst[len(lst) - 1] = ...

lst[-1] = ...

Wishing a happy day
LOBI
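A minimal sketch of the negative index from the reply above, written for Python 3. The list contents and the name last_two are made up for illustration.

```python
lst = [10, 20, 30]

# lst[-1] addresses the same element as lst[len(lst) - 1].
lst[-1] = 99

# Negative indices also work in slices: the last two elements.
last_two = lst[-2:]

print(lst, last_two)
```

On an empty list lst[-1] raises IndexError, just like lst[len(lst) - 1] would.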
Re: Obtaining glyph width in Python
Aloha,

Charlie wrote:
> Hi, I'm looking for a way to obtain the width of a string, either in actual
> inches/centimeters, or pixels will also work. Unfortunately this seems
> difficult as I'd like to keep things as close to the stock Python install as
> possible, and I'm not working with Graphics or X at all.

So you need both: metrics for single characters/glyphs and for
concatenated glyphs and words.

> PIL = Huge for only using one function. I'm not working with any graphics.
> PyFT = Everyone uses FreeType2 now, and PyFT seems dead anyhow.
> PyFT2 = Does not exist.
> tkinter.text() = Works with X, creates windows no matter what you do.
> t1lib = Separate package, no TTF support.
> t1python = Same thing as t1lib?

For the glyph metrics and related information there is the ttx/fonttools
project, available on sourceforge. Afair fonttools only needs a Numeric
installation.

> Ultimately, I'm looking to take a stream of text, and break it up into lines
> based on page width... and I need to know how wide (and ultimately how tall,
> for page breaks) the individual glyphs are so I can break properly. If there's
> an easier way to do this than calculating individual glyph width, I'm open to
> that too.

It looks a little bit like you're redeveloping TeX (in python)...

> I was really just looking to see if there was anything out there that wasn't
> too large or too obscure/dated. Maybe there's something lower level that could
> be done to achieve this? Is there metadata in the font that holds this
> information that could be extracted?

Actually there is not only metadata but real data included in the font,
speaking of Type1, TrueType and OpenType scalable outline fonts.

Wishing a happy day
LOBI
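Once per-glyph advance widths are available, the line breaking asked about above could be sketched roughly like this (Python 3). The width table, the default width and the function names are invented for illustration; real values would have to be read from the font's metrics, and kerning is ignored.

```python
# Hypothetical advance widths in font units (1000 units per em).
# These numbers are made up; real ones come from the font itself.
ADVANCE = {" ": 250, "i": 280, "l": 280, "m": 780, "w": 720}
DEFAULT_ADVANCE = 500  # assumed width for every glyph not listed above


def text_width(text, size_pt):
    """Width of text in points: sum of glyph advances scaled by font size."""
    units = sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)
    return units * size_pt / 1000.0


def break_lines(text, size_pt, max_width_pt):
    """Greedy line breaking: put as many words on a line as will fit."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and text_width(candidate, size_pt) > max_width_pt:
            lines.append(current)   # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

A word that is wider than the line on its own still gets a line to itself; hyphenation and page breaks (glyph heights) are left out of this sketch.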
Re: Frankenstring
Aloha,

Thomas Lotze wrote:
> I think I need an iterator over a string of characters pulling them out
> one by one, like a usual iterator over a str does. At the same time the
> thing should allow seeking and telling like a file-like object:
>
> >>> f = frankenstring("0123456789")
> >>> for c in f:
> ...     print c
> ...     if c == "2":
> ...         break
> ...
> 0
> 1
> 2
> >>> f.tell()
> 3L
> >>> f.seek(7)
> >>> for c in f:
> ...     print c
> ...
> 7
> 8
> 9
>
> I can think of more than one clumsy way to implement the desired
> behaviour in Python; I'd rather like to know whether there's an
> implementation somewhere that does it fast. (Yes, it's me and speed
> considerations again; this is for a tokenizer at the core of a library,
> and I'd really like it to be fast.)

You can already guess my answer, because I'm doing this at the core of a
similar library, but to give others the chance to discuss:

>>> f = "0123456789"
>>> p = 0
>>> t2 = f.find('2')+1
>>> for c in f[p:t2]:
...     print c
...
0
1
2
>>> p = 7
>>> for c in f[p:]:
...     print c
...
7
8
9

A string, and a pointer on that string. If you give up the boundary
condition to tell backwards, you can start to eat up the string via
f = f[p:]. There was a performance difference with that, in fact it was
~4% faster on a python2.2.

I don't expect any iterator solution to be faster than that.

Wishing a happy day
LOBI
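For comparison, the interface asked for in the quoted question (iteration plus tell and seek) can be sketched as a small Python 3 class around the same "string and a pointer" idea. The class name and its details are illustrative only, and nothing here is claimed to be fast; every character still costs a Python-level method call.

```python
class Frankenstring:
    """A string with a read pointer: iterable, with tell() and seek()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        c = self._data[self._pos]
        self._pos += 1      # the pointer moves past the character returned
        return c

    def tell(self):
        return self._pos

    def seek(self, pos):
        self._pos = pos
```

Because the object is its own iterator, breaking out of a for loop leaves the pointer where it was, so a later loop continues from tell().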
Re: Frankenstring
Aloha,

Thomas Lotze wrote:
>> A string, and a pointer on that string. If you give up the boundary
>> condition to tell backwards, you can start to eat up the string via
>> f = f[p:]. There was a performance difference with that, in fact it was
>> faster ~4% on a python2.2.
> When I tried it just now, it was the other way around. Eating up the
> string was slower, which makes sense to me since it involves creating new
> string objects all the time.

I also expected f[p:] to be slower; the 4% I only measured on one
platform. Most probably the GC and memory management aren't the same.

>> I dont't expect any iterator solution to be faster than that.
> It's not so much an issue of iterators, but handling Python objects
> for every char. Iterators would actually be quite helpful for searching: I
> wonder why there doesn't seem to be an str.iterfind or str.itersplit
> thing. And I wonder whether there shouldn't be str.findany and
> str.iterfindany, which takes a sequence as an argument and returns the
> next match on any element of it.

There is a finditer in the re module. I'm currently rewriting a few
pattern matching things and find it quite valuable.

>>> import re
>>> pat = re.compile('[57]')
>>> f = "754356184756046104564"
>>> for a in pat.finditer(f):
...     print a.start(),f[a.start()]
...
0 7
1 5
4 5
9 7
10 5
18 5

Wishing a happy day
LOBI
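The iterfindany wished for in the quoted text can be approximated on top of re.finditer. This is a Python 3 sketch; the function name and its return shape are assumptions, since no such str method exists.

```python
import re


def iterfindany(text, needles, start=0):
    """Yield (position, needle) for each non-overlapping match of any needle."""
    if not needles:
        return
    # Longest needles first, so a longer needle wins over its own prefix.
    ordered = sorted(needles, key=len, reverse=True)
    # re.escape makes each needle match literally, not as a regex.
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    for match in pattern.finditer(text, start):
        yield match.start(), match.group()


for pos, found in iterfindany("754356184756046104564", ["5", "7"]):
    print(pos, found)
```

The start argument is passed on as the pos parameter of the compiled pattern's finditer, so searching can resume from a stored pointer without slicing the string.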
using hotshot for timing and coverage analysis
Aloha,

hotshot.Profile has flags for recording timing per line and line events.
Even with both set to 1 I still get only the standard data (time per call).

Is there any document available that has examples of how to use hotshot
for coverage analysis and how to display timing per line?

Hoping for an answer and wishing a happy day
LOBI
Re: encryption with python
Aloha,

[EMAIL PROTECTED] wrote:
> I was wondering if someone can recommend a good encryption algorithm
> written in python.
> It would be great if there exists a library already written to do this,
> and if there is, can somebody please point me to it??

M2Crypto, interface to OpenSSL
http://sandbox.rulemaker.net/ngps/m2

Wishing a happy day
LOBI
zlib written in python
Aloha,

is a pure _python_ implementation of zlib available?
I have broken zlib streams and need to patch the decoder to get them
back.

Wishing a happy day
LOBI
Re: searching pdf files for certain info
Aloha,

rbt wrote:
> Not really a Python question... but here goes: Is there a way to read the
> content of a PDF file and decode it with Python? I'd like to read PDF's,
> decode them, and then search the data for certain strings.

First of all,
http://groups.google.de/groups?selm=400CF2E3.29506EAE%40netsurf.de&output=gplain
still applies here.

If you can deal with a very basic implementation of a pdf-lib you
might be interested in
http://sourceforge.net/projects/pdfplayground

In the CVS (or the current snapshot) you can find in
ppg/Doc/text_extract.txt an example for text extraction.

>>> import pdffile
>>> import pages
>>> import pdftool
>>> import zlib
>>> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
>>> pp = pages.pages(pf)
>>> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
>>> op = pdftool.parse_content(c)
>>> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
>>> for a in sop:
...     print a[0]

Wishing a happy day
LOBI
Re: searching pdf files for certain info
Aloha,

rbt wrote:
> Thanks guys... what if I convert it to PS via printing it to a file or
> something? Would that make it easier to work with?

Not really...
The classical PS drivers (e.g. Acroread4-Unix print -> ps) simply
define the pdf graphics and text operators as PS commands and
copy the pdf content directly.

Wishing a happy day
LOBI
Re: PDF count pages
Aloha,

Jose Benito Gonzalez Lopez wrote:
> Does anyone know how I could do in order to get/count the number of
> pages of a PDF file?

Like this?

Python 2.2.2 (#3, Apr 10 2003, 17:06:52)
[GCC 2.95.2 19991024 (release)] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdffile
>>> pf = pdffile.pdffile('../rfc1950.pdf')
>>> import pages
>>> pp = pages.pages(pf)
>>> len(pp.pagelist)
10
>>>

This is an example of the usage of pdfplayground. pdfplayground
is available via sourceforge. There is no package at the
moment, but you should be able to check out via anon-cvs.

Wishing a happy day
LOBI
OT: Re: PDF count pages
Aloha,

[EMAIL PROTECTED] wrote:
> Andreas Lobinger wrote:
>> >>> import pdffile
> I browsed the code in CVS and it looks like a pretty comprehensive
> implementation. Maybe we should join forces.

I have problems contacting you via the given e-mail address.

Wishing a happy day
LOBI
Re: SVG rendering with Python
Aloha,

richard wrote:
> Dennis Benzinger wrote:
>> Does anybody know of a SVG rendering library for Python?
> Google "python svg"

... to find what?

Wishing a happy day
LOBI
Re: Writing a bytecode interpreter (for TeX dvi files)
Aloha,

Jonathan Fine wrote:
> I'm writing some routines for handling dvi files.
> In case you didn't know, these are TeX's typeset output.
> These are binary files containing opcodes.
> I wish to write one or more dvi opcode interpreters.
> Are there any tools or good examples to follow for
> writing a bytecode interpreter?

As far as I know, dvi is a very straightforward format:
commands followed by parameters, no conditionals, no loops.
For similar designs I used something like the following approach:

    import struct

    s = open('a.dvi', 'rb').read()   # read complete file into a string
    while s:
        command = ord(s[0])
        if command < 128:
            # typeset command (set_char)
            s = s[1:]
        elif command == 139:
            # bop command: ten 4-byte counters plus a 4-byte pointer
            param = s[1:45]
            # interpret param
            c = struct.unpack('>10i', param[:40])
            # consume s
            s = s[45:]
        else:
            # undefined command
            s = s[1:]

You can work directly on strings, or convert to a list. If you don't
want long if/elif lists, you can use a dict as a dispatcher (python
cookbook has an example?). For most of the commands you can use
a lookup table for the parameter list length.

TeX §591 claims that dvi is strictly interpretable from front to end.
The description in TeX §585ff. can be transcribed to struct
definitions easily.

Wishing a happy day
LOBI
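The dict dispatcher and the parameter-length lookup table mentioned above might look roughly like this in Python 3. This sketch handles only a handful of fixed-length opcodes. The handler names and the event list are invented for illustration, and it walks an index over the bytes instead of slicing the data away.

```python
import struct

# Parameter byte counts for a few fixed-length DVI opcodes (subset only).
PARAM_LENGTH = {
    132: 8,    # set_rule: height, width
    137: 8,    # put_rule: height, width
    138: 0,    # nop
    139: 44,   # bop: ten 4-byte counters plus pointer to previous bop
    140: 0,    # eop
    141: 0,    # push
    142: 0,    # pop
}


def do_bop(params, events):
    counts = struct.unpack(">10i", params[:40])
    (previous,) = struct.unpack(">i", params[40:44])
    events.append(("bop", counts[0], previous))


def do_eop(params, events):
    events.append(("eop",))


def do_ignore(params, events):
    pass  # opcode is known to the table but not acted on in this sketch


# The dispatcher: opcode -> handler, instead of a long if/elif chain.
HANDLERS = {139: do_bop, 140: do_eop}


def interpret(data):
    """Walk the bytes front to end and return a list of events."""
    events = []
    pos = 0
    while pos < len(data):
        opcode = data[pos]
        pos += 1
        if opcode < 128:                 # set_char_0 .. set_char_127
            events.append(("char", opcode))
            continue
        if opcode not in PARAM_LENGTH:
            raise NotImplementedError("opcode %d not in this sketch" % opcode)
        length = PARAM_LENGTH[opcode]
        params = data[pos:pos + length]
        pos += length
        HANDLERS.get(opcode, do_ignore)(params, events)
    return events
```

Opcodes with variable-length parameters (the xxx and fnt_def families) need their length read from the data itself, so a plain lookup table does not cover them; that is why the reply says "most of the commands".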
expat error, help to debug?
Aloha,

I'm trying to write an xml filter that extracts some info about an .xml
document (with external entities), esp. start elements and external
entities. The document is a DOCBOOK xml and afaics well formed, and it
passes our docbook toolchain (dblatex etc.).

My parser is (very simple):

[115] scylla(scylla)> more pbxml.py
class xmlhandle:
    def __init__(self):
        self.parser_stack = [];
        self.parser = None;

    def se(self,name,attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2


def fromxml(fname):
    import xml.parsers.expat
    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])

my document (in 2 parts):

[116] scylla(scylla)> more s3.xml
]>
&bookinfo;
technical description
This chapter includes specification of the main simulation loop.

[118] scylla(scylla)> more bookinfo.xml
BookTitle
A
B

The run produces:

[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
  File "pbxml.py", line 25, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 20, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

Anyone any idea where the error is produced?
Anyone any idea how to debug this (if it's really a bug, or my
misunderstanding of expat)?

Hoping for an answer and wishing a happy day,
LOBI
Re: expat error, help to debug?
Aloha,

Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>> Anyone any idea where the error is produced?
> Do you want to try adding an EndElementHandler as well, just to get more
> information on where the error might be happening?

I want. After adding an EndElementHandler (left as an exercise to the
reader), the output looks like this:

[42] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
e title
s 10 para {}
e para
e chapter
e book
Traceback (most recent call last):
  File "pbxml.py", line 29, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 24, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

which shows me that the error is caused after parsing the /book ...
BUT still within p.ParseFile (expat internal), so I can't look
into it.

The example here may be misleading. It was stripped down from
a quite large docbook.xml, and there the error happened in the middle
of the document, not at the end.

Wishing a happy day,
LOBI
Re: expat error, help to debug?
Aloha,

Andreas Lobinger wrote:
> Lawrence D'Oliveiro wrote:
>> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>>> Anyone any idea where the error is produced?

... to share my findings with you:

    def ex(self,context,baseid,n1,n2):
        print "x",context,n1,n2
        return 1

The registered handler has to return an (integer) value.
Would have been nice if this had been mentioned in the documentation.

Wishing a happy day,
LOBI
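The finding above can be tried without any files on disk. This is a Python 3 sketch of the same filter with the handler returning 1. The inline document is a made-up stand-in for s3.xml (whose markup is not shown in the thread), and the events list replaces the print statements.

```python
import xml.parsers.expat

# Made-up stand-in for the DocBook file: one external entity, a few elements.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title></chapter>
</book>
"""


def scan(document):
    """Collect start elements and external entity references."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        events.append(("s", name))

    def external_entity(context, base, system_id, public_id):
        events.append(("x", context, system_id))
        # The fix from the thread: hand expat an integer so it carries on.
        return 1

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(document, True)   # True marks this as the final chunk
    return events


print(scan(DOC))
```

The handler here only records the reference and does not open bookinfo.xml; parsing the referenced file would need a sub-parser, which this sketch leaves out.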
Re: expat error, help to debug?
Aloha,

Andreas Lobinger wrote:
> Andreas Lobinger wrote:
>> Lawrence D'Oliveiro wrote:
>>> In message <[EMAIL PROTECTED]>, Andreas Lobinger wrote:
>>>> Anyone any idea where the error is produced?
> The registered Handler has to return a (integer) value.
> Would have been nice if this had been mentioned in the documentation.

Delete last line, it is mentioned in the documentation.
Re: count pages in a pdf
Tim Golden wrote:
> [EMAIL PROTECTED] wrote:
>
>> is it possible to parse a pdf file in python? for starters, i would
>> like to count the number of pages in a pdf file. i see there is a
>> project called ReportLab, but it seems to be a pdf generator... i
>> can't tell if i would be able to parse a pdf file programmically.

http://groups.google.de/group/comp.lang.python/msg/6f304970b4ff40ce
and following.

> Well the simple expedient of putting "python count pages pdf" into
> Google turned up the following link:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496837

h. There is a non-vanishing possibility that this pattern-matching
can give you false positives -> not reliable.

Wishing a happy day,
LOBI
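To illustrate the false-positive concern in the reply above, here is a Python 3 sketch with a pattern of the kind such page-counting snippets use. The exact pattern is an assumption and not quoted from the linked recipe, and the byte strings are synthetic fragments, not complete PDF files.

```python
import re

# Assumed pattern: count "/Type /Page" that is not "/Type /Pages".
PAGE_PATTERN = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)


def count_pages_by_pattern(data):
    """Count pattern hits in raw bytes, without parsing the PDF structure."""
    return len(PAGE_PATTERN.findall(data))


# One real page object ...
ONE_PAGE = b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"

# ... plus an uncompressed content stream whose text merely contains
# the same characters. It is not a page object.
WITH_TEXT = ONE_PAGE + (
    b"4 0 obj\n<< /Length 30 >>\nstream\n"
    b"BT (/Type /Page example) Tj ET\n"
    b"endstream\nendobj\n"
)

print(count_pages_by_pattern(ONE_PAGE))
print(count_pages_by_pattern(WITH_TEXT))
```

The second fragment still describes a single page, but the pattern reports two. Walking the page tree, as in the pdfplayground examples earlier in this archive, avoids that kind of miscount.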