Debugging memory leaks
Hi all, I've written a program using Twisted that uses SQLAlchemy to access a database using threads.deferToThread(...) and SQLAlchemy's scoped_session(...). This program runs for a long time, but slowly leaks memory to the point of needing to be restarted. I don't know that the SQLAlchemy/threads combination is the problem, but thought I'd make you aware of it. Anyway, my real question is how to go about debugging memory leak problems in Python, particularly for a long-running server process written with Twisted. I'm not sure how to use heapy or guppy, and objgraph doesn't tell me enough to locate the problem. If anyone has any suggestions or pointers it would be very much appreciated! Thanks in advance, Doug -- http://mail.python.org/mailman/listinfo/python-list
Re: Debugging memory leaks
Dieter, Thanks for the response, and you're correct, debugging memory leaks is tough! So far I haven't had much luck other than determining that I have a leak. I've used objgraph to see that objects are being created that don't seem to get cleaned up. What I can't figure out so far is why: they are local variables that "should" get cleaned up when they go out of scope. Ah well, I'll keep pushing! Thanks again, Doug
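One way to narrow down where memory grows is to compare two allocation snapshots taken some time apart. The following is only a minimal sketch, assuming Python 3.4 or later, where the standard library's tracemalloc module is available (it did not exist when this thread was written). The `leaked` list and the `top_growth` helper are made up for illustration and stand in for whatever the real server is holding on to.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocations changed most between snapshots."""
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the leak: a module-level list that keeps growing.
leaked = []
for i in range(10000):
    leaked.append("row %d" % i)

after = tracemalloc.take_snapshot()
for stat in top_growth(before, after):
    print(stat)
tracemalloc.stop()
```

In a long-running server the two snapshots would be taken minutes or hours apart, for example from a periodic task, so that steady growth stands out from normal churn.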
Twisted and argparse
Hi all, I'm trying to write a Twisted program that uses the Application object (and will run with twistd) and I'd like to parse command line arguments. The Twisted documentation shows how to use a Twisted class called usage.Options. However, to me this looks a lot like the older Python module getopt. Is there a way to use the argparse module with a Twisted Application? Thanks! Doug
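For comparison, here is roughly what option parsing looks like with argparse on its own. This is a sketch only: the `--port` and `--verbose` options are hypothetical (the post names none), and it does not show how to hook argparse into twistd, which parses the command line itself. It only covers a script that is started directly.

```python
import argparse


def build_parser():
    # Hypothetical options for illustration; the original post names none.
    parser = argparse.ArgumentParser(description="Example server options")
    parser.add_argument("--port", type=int, default=8080,
                        help="TCP port to listen on")
    parser.add_argument("--verbose", action="store_true",
                        help="enable debug logging")
    return parser


# parse_known_args() returns the recognised options plus the leftovers,
# so arguments meant for another tool are not treated as errors.
options, leftover = build_parser().parse_known_args(["--port", "9000", "--other"])
print(options.port, options.verbose, leftover)
```

Using parse_known_args() rather than parse_args() is a design choice here: it lets the script share a command line with another argument consumer without rejecting options it does not know.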
xmlrpclib question
Hi, I tried the xmlrpclib examples from the Python Cookbook and had a problem. The example works fine so long as the server and client are on the same machine. But as soon as I try to run the client from another machine (all Linux machines on the same network) I get a socket.error 111, connection refused. I've tried some different things to get past this (different ports, putting the machines in /etc/hosts), but to no avail. Does anyone have any suggestions about what I'm doing wrong? Is there something I have to enable to allow external access to the server? Thanks in advance, Doug Farrell
Re: xmlrpclib question
Thanks for the responses, you were both on the right track, I just didn't provide enough of the right information. I solved the problem by changing "localhost" in the server code to the actual name of the machine, as it appears in our DNS. This enabled the client to connect to the server immediately. It's not clear from the example code that this is necessary, but it makes sense that a server bound to "localhost" can only 'listen' to applications on the same machine, because they all use the lo interface. Thanks! Doug
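The bind address is the first argument to the server constructor, so the fix described above is a one-line change. The sketch below assumes Python 3, where xmlrpclib has become xmlrpc.client and the server class lives in xmlrpc.server. The `add` function and the `start_server` helper are made up for illustration, and the demo binds to the loopback address only so that it can run anywhere.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    return a + b


def start_server(host, port=0):
    """Start an XML-RPC server in a background thread and return it."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# "127.0.0.1" keeps this demo local; "0.0.0.0" (or the machine's own
# name or address) would accept connections from other hosts as well.
server = start_server("127.0.0.1")
port = server.server_address[1]
result = ServerProxy("http://127.0.0.1:%d" % port).add(2, 3)
print(result)
server.shutdown()
server.server_close()
```

Port 0 asks the operating system for any free port, which is convenient for a demo; a real service would use a fixed port that clients know about.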
xmlrpc with Python and large datasets
Hi all, I helped one of my co-workers put together an XMLRPC Python script that allowed him to get database data from remote machines. This was done because the source of the data could be Oracle on a Sun/Solaris machine or MySQL on our Linux machines. Doing the script in Python allowed him to gather the data with a general purpose API and just send over some SQL queries. Since then he's run into a problem. If he sends a query that gets a very large recordset from the database, the script fails and vaguely reports a "broken pipe", which I'm guessing is a problem with the network connection. Has anyone else seen this when using XMLRPC and Python to return large amounts of data? And if so, is there a solution? Thanks, Doug
Re: xmlrpc with Python and large datasets
Duncan, Thanks for the reply. We are running this on an Apache server on the Linux box, and an iPlanet 4.1 server on the Solaris machines. However, both these servers are strictly 'inside' the firewall. I checked the Apache configuration and there is no LimitRequestBody parameter in the file at all. So I'm assuming from what you've said that our configuration would be unlimited. However, I will test that by inserting LimitRequestBody and setting it to 0 for unlimited to see if that changes things. Thanks, Doug
Re: xmlrpc with Python and large datasets
Steve, Thanks for your reply, I'll look into things based on your comments. Also, I've read your book "Python Web Programming" and wanted you to know it has helped me a lot with various Python projects, thanks! Doug
pagecrawling websites with Python
Hi all, We've got an application we wrote in Python called pagecrawler that generates a list of URLs based on SQL queries. It then runs through this list of URLs, 'browsing' one of our staging servers for each of them. We do this to build the site dynamically, but each page generated by a URL is saved as a static HTML file. Anyway, the pagecrawler program uses Python threads to try to build the pages as fast as it can. The list of URLs is stored in a queue, and the thread objects get URLs from the queue and run them until the queue is empty. This works okay but it still seems to take a long time to build the site this way, even though the actual pages only take milliseconds to run (the pages are generated with PHP on a separate server). Does anyone have any insight into whether this is a reasonable approach to building web pages, or whether we should look at another design? Thanks in advance, Doug
Re: pagecrawling websites with Python
Swaroop, Thanks for the reply, I'll take a look at HarvestMan and see if we can use it directly, or get some ideas from the source code. :) Doug
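The queue-plus-worker-threads design described in the original post can be sketched as follows. This is not the actual pagecrawler code, which is not shown in the thread. The `crawl` function, the worker count and `fake_fetch` are all hypothetical, and `fake_fetch` stands in for the HTTP request a real crawler would make.

```python
import queue
import threading


def crawl(urls, fetch, num_workers=4):
    """Run fetch(url) for every URL using a pool of worker threads."""
    work = queue.Queue()
    for url in urls:
        work.put(url)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            page = fetch(url)
            with lock:
                results[url] = page

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# fake_fetch is a placeholder; a real crawler would request the page here.
def fake_fetch(url):
    return "<html>%s</html>" % url


pages = crawl(["/a", "/b", "/c"], fake_fetch)
print(sorted(pages))
```

With a structure like this, timing `fetch` separately from the queue handling is a cheap way to see whether the time goes into the requests themselves or into the crawler around them.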
Recommended hosting
Hi all, I'd like to build a web site for myself, essentially a "vanity" web site to show off whatever web development skills I have, and perhaps do some blogging. I'm a Python developer, so I'd like to develop the site with the following stack:

- web applications written with Python and Flask, running as uwsgi applications. These would support dynamic HTML where needed, but mostly they would provide REST APIs.
- static content delivered by Nginx

Can anyone give me some recommendations for a good hosting company that would allow me to work with the above tool set? I'm US based if that makes a difference. Thanks in advance! Doug
Re: Recommended hosting
Hi all, OP here, thanks for all your replies, all very useful. I'm going to check out a couple and see what works for the project I have in mind. Thanks again! Doug
Pyro stability
Hi all, At work I'm considering proposing a solution for our distributed processing system (a web based shopping cart that feeds an actual printing production line) based on Pyro. I've done some minor experiments with this and Pyro looks interesting and like a good implementation of what I want. I've got a couple of questions though:

1) Has anyone had any experience with Pyro, and if so, have you had any stability or memory use issues running Pyro servers or nameservers on the various participating computers? (We have a mixed environment of Linux and Windows, but will be heading to an all Linux (RedHat) environment soon.)

2) One of the guys I work with is more inclined to set up XMLRPC communication between the processes, and he is also leery of running daemon processes. His solution is to have essentially Python CGI code that responds to the various XMLRPC requests. Does anyone have any opinions on this? I know what mine are already. :)

3) I've considered using CORBA, which is more powerful, and certainly faster, but its complexity to set up compared to the rather simple work I'm trying to do seems prohibitive. Does anyone have any thoughts on this?

Thanks in advance, Doug
Re: Pyro stability
Irmen, Thanks, you're very good about answering Pyro related questions! Thanks again. I posted a more detailed question to the mailing list describing as best I could how I want to use Pyro and the questions I have in regards to the system described. Doug

Irmen de Jong wrote:
> writeson wrote:
> [some questions about Pyro]
>
> I've replied to this on Pyro's mailing list.
> -Irmen
wxPython problems with Fedora Core 5
Hi all, I'm trying to use wxPython from a fairly new installation of Fedora Core 5. I installed wxPython using yum -y install wxPython and that all seemed to work fine. However, when I run Python and do this:

    import wx

I get this:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib/python2.4/site-packages/wx/__init__.py", line 45, in ?
        from wxPython import wx
      File "/usr/lib/python2.4/site-packages/wxPython/__init__.py", line 20, in ?
        import wxc
    ImportError: /usr/lib/libwx_gtk2-2.4.so.0: undefined symbol: pango_x_get_context

Anyone have any ideas what's going on and what I can do to fix this? Thanks in advance, Doug
Re: wxPython problems with Fedora Core 5
Frank, Thanks for the link, that solved the problem for me with FC5! That was a great help to me! Doug
python2.5 and mysqldb
Hi all, At work we're using Python 2.3 and I'd like to start getting us moved up to Python 2.5. We run CentOS 4, which is the free, open source version of RedHat Enterprise. I've got Python 2.5 installed on this machine, but am stuck trying to get MySQLdb installed and running on it. I've tried with easy_install and by building from the tar file, and both return a long list of errors from a gcc compile. I'm not sure what to do next to resolve this issue, so if anyone could give me some guidance it would be greatly appreciated. Thanks in advance, Doug
Re: Extracting images from a PDF file
On Dec 27, 1:12 am, Carl K <[EMAIL PROTECTED]> wrote:
> Doug Farrell wrote:
> > Hi all,
> >
> > Does anyone know how to extract images from a PDF file? What I'm looking
> > to do is use pdflib_py to open large PDF files on our Linux servers,
> > then use PIL to verify image data. I want to do this in order
> > to find corrupt images in the PDF files. If anyone could help
> > me out, or point me in the right direction, it would be most
> > appreciated!
>
> If you are ok shelling out to a binary:
>
> pdfimages - Portable Document Format (PDF) image extractor (version
> 3.00) http://packages.ubuntu.com/gutsy/text/xpdf-utils
>
> I am trying to convert the pdf to a png, but without having to run external
> commands. so I will understand if you arn't happy with pdfimages.
>
> Carl K

Carl, Thanks for the feedback, and I don't mind shelling out to an external command if it gets the job done. Thanks for the link to xpdf-utils, I'm going to look into it this morning. Doug
Re: Extracting images from a PDF file
On Dec 27, 10:13 am, writeson <[EMAIL PROTECTED]> wrote:
> On Dec 27, 1:12 am, Carl K <[EMAIL PROTECTED]> wrote:
> > Doug Farrell wrote:
> > > Hi all,
> > >
> > > Does anyone know how to extract images from a PDF file? What I'm looking
> > > to do is use pdflib_py to open large PDF files on our Linux servers,
> > > then use PIL to verify image data. I want to do this in order
> > > to find corrupt images in the PDF files. If anyone could help
> > > me out, or point me in the right direction, it would be most
> > > appreciated!
> >
> > If you are ok shelling out to a binary:
> >
> > pdfimages - Portable Document Format (PDF) image extractor (version
> > 3.00) http://packages.ubuntu.com/gutsy/text/xpdf-utils
> >
> > I am trying to convert the pdf to a png, but without having to run external
> > commands. so I will understand if you arn't happy with pdfimages.
> >
> > Carl K
>
> Carl,
>
> Thanks for the feedback, and I don't mind shelling out to an external
> command if it gets the job done. Thanks for the link to xpdf-utils,
> I'm going to look into it this morning.
>
> Doug

Hi, Our Linux servers run CentOS (4.x, I believe), and the repositories for this version don't have xpdf-utils available. I'm going to look into editing the sources.list file in order to get yum to install the necessary dependencies for me, as xpdf-utils looks very useful! Doug
Re: Extracting images from a PDF file
On Dec 27, 2:17 pm, Max Erickson <[EMAIL PROTECTED]> wrote:
> Doug Farrell <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > Does anyone know how to extract images from a PDF file? What I'm
> > looking to do is use pdflib_py to open large PDF files on our
> > Linux servers, then use PIL to verify image data. I want to do
> > this in order to find corrupt images in the PDF files. If anyone
> > could help me out, or point me in the right direction, it would
> > be most appreciated!
> >
> > Also, does anyone know of a way to validate a PDF file?
> >
> > Thanks in advance,
> > Doug
>
> There is some discussion here:
>
> http://nedbatchelder.com/blog/200712.html#e20071210T064608
>
> max

Max, That's a very interesting snippet of code, thanks for posting the link! Much appreciated! Doug
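Shelling out to pdfimages, as Carl suggested, could look something like the sketch below. It assumes the pdfimages binary from xpdf-utils (or a compatible package) is on the PATH. The helper names and the example paths are made up for illustration, and the sketch stops at extraction: each extracted file could then be handed to PIL for the verification step Doug describes.

```python
import shutil
import subprocess


def pdfimages_command(pdf_path, image_root):
    """Build the pdfimages command line; -j keeps JPEG images as .jpg files."""
    return ["pdfimages", "-j", pdf_path, image_root]


def extract_images(pdf_path, image_root):
    """Run pdfimages if it is installed; return True when it ran successfully."""
    if shutil.which("pdfimages") is None:
        return False  # tool not installed on this machine
    completed = subprocess.run(pdfimages_command(pdf_path, image_root))
    return completed.returncode == 0


# Example paths are placeholders.
print(pdfimages_command("input.pdf", "out/img"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.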
handlers.SocketHandler and exceptions
Hi all, On our Linux systems at work I've written a Twisted logging server that receives log messages from multiple servers/processes and posts them to a log file, essentially serializing all the process log messages. This works well, that is, until I tried this test code:

    try:
        t = 10 / 0
    except Exception, e:
        log.exception("divide by zero")

where log is the logger instance retrieved from a call to getLogger(). The problem is that handlers.SocketHandler tries to cPickle.dump() the log record, which in this case contains an exc_info tuple, the last item of which is a Traceback object. The pickling fails with an "unpickleable error" and that's that. Does anyone have any ideas how to handle this situation? I'd hate to have to give up using the log.exception(...) call, as it's useful to get stack trace information in the log file. Thanks in advance, Doug Farrell
Re: handlers.SocketHandler and exceptions
Mark,

> Check out the traceback module. It can translate the traceback into a
> variety of formats (such as a string) that can be pickled.
>
> --Mark

Thanks for the reply. I was looking at the traceback module and thinking along the same lines you are. The problem I'm having with that is how to modify the behavior of the SocketHandler code so it would call the traceback module functions. The point at which the handlers.SocketHandler code fails is in the method makePickle(), and I'm not sure how to overload/override that method. I tried creating my own class:

    class MySocketHandler(handlers.SocketHandler):
        def makePickle(self, record):
            # perform new code that transforms a Traceback object into a string

but so far I haven't figured out how to get the logging module to use my class. In my logging configuration file I tried something like this:

    [handler_local_server]
    class=mydirectory.MySocketHandler
    level=DEBUG
    formatter=general
    args=("localhost", handlers.DEFAULT_TCP_LOGGING_PORT + 1)

but I can't seem to get the logging module to include mydirectory in its search path for modules. So that's where I'm stuck now. Again, thanks for your response, Doug
Re: handlers.SocketHandler and exceptions
Vinay, Thanks for your reply, very interesting. We're currently running Python 2.3 (though we are getting ready to move to Python 2.5), so I'm guessing the code you're showing comes from Python 2.5? I'm wondering if I can edit the handlers.py code in my Python 2.3 installation, make the changes you show above, and have things work? Any thoughts on this? Thanks for the help!! Doug
Re: handlers.SocketHandler and exceptions
On Jan 17, 2:45 pm, Vinay Sajip <[EMAIL PROTECTED]> wrote:

Vinay, Again, thanks for your very timely help! I was just editing the handlers.py code, and didn't really understand how that was going to work, and of course it didn't. I was just about to write to you again, and voila, you'd already responded with what I needed to know. I would have been floundering around for quite a while before I'd have found (if ever) the change you mentioned to __init__.py. I made the changes and my logging server is working as I expected! Exceptions are being placed in the log file, complete with their tracebacks. Again, thanks very much for your help, greatly appreciated! Doug
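Mark's suggestion from earlier in the thread, turning the traceback into a string before pickling, can be sketched as below. This is not the change Vinay supplied (his code is not reproduced in this archive); it only illustrates why a formatted string survives pickling where a Traceback object does not. The sketch assumes Python 3 syntax, and `picklable_exc_text` is a made-up helper name.

```python
import pickle
import sys
import traceback


def picklable_exc_text(exc_info):
    """Turn an exc_info tuple into a plain string that pickles cleanly."""
    return "".join(traceback.format_exception(*exc_info))


try:
    10 / 0
except ZeroDivisionError:
    text = picklable_exc_text(sys.exc_info())

# The record now carries only strings, so pickling succeeds.
payload = pickle.dumps({"msg": "divide by zero", "exc_text": text})
restored = pickle.loads(payload)
print(restored["exc_text"])
```

The receiving side gets the full stack trace as text, which is all a log file needs, while the unpicklable Traceback object never leaves the sending process.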
Can I dynamically add Pyro objects to a running Pyro server?
Hi everyone, I'm trying to build a distributed system using the code in the examples/distributed_computing2 directory of the un-tarred Pyro distribution. I'm trying to make this generic so one Pyro class can kick off another on multi-core/multi-CPU/multi-server systems. What I'd like to know is this: after you've got the server.requestloop() running in the Pyro server, is it possible to add objects to the system? That is, can I call server.connect again with a new class and have the daemon provide access to it? I'm essentially trying to dynamically add Pyro objects to a running Pyro server. Thanks in advance for your help, Doug
logging.handlers.SocketHandler
Hi everyone, I wrote a logging server that receives messages from logging.handlers.SocketHandler objects in client Python programs. This works well so long as the client programs are start/stop affairs. However, if the client is also a long-running daemon, a problem shows up. If the logging server is restarted, log messages from the daemon client go nowhere, and no errors are thrown. To correct the problem the daemon client has to be restarted; then it will reconnect with the logging server and all is well. My reading of the logging.handlers.SocketHandler documentation makes me think this isn't how things are supposed to work: a failure in the emit() method (which sends the log messages) should close the connection, and the handler should retry it at the next log message. By the way, we're running this with Python 2.4 on a CentOS Linux server. Does anyone have any ideas, pointers or suggestions about how to address this problem? Thanks in advance! Doug
Re: logging.handlers.SocketHandler
Vinay, Thanks for the quick response, very much appreciated. I tried the two scripts you pointed to, modifying the client so it produces log messages in an endless loop, and it worked fine. If I left the client running and stopped and started the server, the client would reconnect and messages would start coming through. This is what you observed and what is stated in the documentation. Looking at the SocketHandler Python code shows this would be the behavior as well. To further complicate my original question, here is more information. The logging server I created uses Twisted (2.5.0) as its network framework and gets network log messages from the clients using Twisted code. However, once a message is received it is parsed the same way as in the example you pointed to; in fact that's what I used as a model for the code in the dataReceived() method of my logging server's Twisted protocol. I don't think there is anything unusual about the network handling in the Twisted framework. The client daemons are also Twisted processes, but they don't use any Twisted code to send log messages. The log system is entirely based on the Python logging module, and logging.handlers.SocketHandler specifically. So the clients should try to reconnect if the server goes down and comes back up. So I'm a little confused about what to try next to resolve the problem I'm seeing. By the way, the Python logging system is great, really nice work! Again, thank you for your quick response and help. Doug

On Mar 6, 4:53 am, Vinay Sajip wrote:
>
> It may be platform-related. I don't have access to your specific
> platform, but I tried with ActivePython 2.5.2.2 on Windows and also
> Python 2.5.2 on Ubuntu 8.04 (Hardy Heron). I used the scripts
> described in
>
> http://docs.python.org/library/logging.html#sending-and-receiving-log...
>
> but modified the client script to place the logging statements in a
> "while True:" loop. What I observed on both Windows and Ubuntu was
> this: when I killed the socket receiver and restarted it, data from
> the client was received by the new receiver process after a short
> delay (of a few seconds). I had no need to restart the client to
> achieve this.
>
> Can you try testing with these specific scripts in your environment?
>
> Regards,
>
> Vinay Sajip
Re: logging.handlers.SocketHandler
Vinay, I did as you suggested and everything seemed to work; client programs were able to reconnect to the servers and log messages started showing up soon after the logging server was running again. I did this with my Twisted client/server setup and it showed the same behavior; clients reconnected to the server after a short delay. Messages were dropped while the server was down, but that was what I expected. This makes it look like I raised a "false alarm" and the logging.handlers.SocketHandler is behaving as expected, and it is more likely there is a problem in my application. My manager is suggesting that the underlying problem is using TCP rather than UDP (SocketHandler vs DatagramHandler) for logging from clients to the logging server. His assertion is that using TCP would guarantee the loss of 2 messages at the logging server from an attached daemon before a reconnect was established. I don't know enough about network protocols to determine if this is true or not, but the reading I've done about UDP talks about UDP being an unreliable protocol, so I'm not sure how using it would change the loss of 2 or more messages while a reconnect occurs. Perhaps because it is stateless and doesn't have to re-establish a connection? I'm not sure, what are your thoughts? Again, thanks for your help and support, Doug

On Mar 6, 1:35 pm, Vinay Sajip wrote:
> On Mar 6, 4:09 pm, writeson wrote:
>
> This would appear to indicate that the problem is to do with how
> Twisted is being used. A couple of avenues to explore would be:
>
> 1. Try the simple test client I suggested with your Twisted server and
> see what happens. If stuff comes through when the server is restarted,
> that would appear to isolate the problem to somewhere in your client
> daemons (at least as a working hypothesis).
>
> 2. Try the Twisted client daemons with the simple server from the
> logging docs. If stuff comes through when the server is restarted,
> then the problem might be with your Twisted server daemon.
>
> Of course, it may be necessary to modify the simple seem-to-work test
> scripts to accommodate whatever specific wire format is being used in
> your problem scenario.
>
> If either of these experiments yield some more information, it's
> probably worth taking that to the Twisted community, perhaps the
> twisted-python mailing list.
>
> Regards,
>
> Vinay Sajip
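The "short delay" before clients reconnect comes from the handler's retry backoff, which can be tuned through attributes on the handler. The sketch below assumes a current Python 3 standard library, where SocketHandler exposes retryStart, retryFactor and retryMax. The host, port and the chosen values are placeholders, and nothing here was tested against the Twisted setup described in the thread.

```python
import logging
import logging.handlers

# Host and port are placeholders; no connection is opened until the
# first record is emitted.
handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)

# Backoff settings used when the connection is lost: first retry delay,
# multiplier applied after each failure, and the cap on the delay (seconds).
handler.retryStart = 0.5
handler.retryFactor = 2.0
handler.retryMax = 10.0

log = logging.getLogger("client")
log.addHandler(handler)
print(handler.sock is None)
```

Records emitted while the handler is waiting out a retry delay are dropped silently, which matches the message loss Doug observed while the server was down.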
IPython always comes up NoColor
Hi all, I've been looking at IPython for a while, but I'm always disappointed that it comes up in NoColor mode no matter what I try. It is configured in the ipythonrc file to be 'color Linux'. I've run it from a PuTTY terminal window, Konsole and xterm, still no luck. The PuTTY window shows color for other things, like ls and emacs, but not for IPython. I'm trying this on a CentOS 4 system and connecting via a Windows XP laptop. Thanks in advance for your help! Doug
Determining when a file has finished copying
Hi all, I'm writing some code that monitors a directory for the appearance of files from a workflow. When those files appear I write a command file to a device that tells the device how to process the file. The appearance of the command file triggers the device to grab the original file. My problem is I don't want to write the command file to the device until the original file from the workflow has been copied completely. Since these files are large, my program has a good chance of scanning the directory while they are mid-copy, so I need to determine which files are finished being copied and which are still mid-copy. I haven't seen anything on Google talking about this, and I don't see an obvious way of doing this using the os.stat() method on the filepath. Anyone have any ideas about how I might accomplish this? Thanks in advance! Doug
Re: Anyone happen to have optimization hints for this loop?
On Jul 9, 12:04 pm, dp_pearce <[EMAIL PROTECTED]> wrote:
> I have some code that takes data from an Access database and processes
> it into text files for another application. At the moment, I am using
> a number of loops that are pretty slow. I am not a hugely experienced
> python user so I would like to know if I am doing anything
> particularly wrong or that can be hugely improved through the use of
> another method.
>
> Currently, all of the values that are to be written to file are pulled
> from the database and into a list called "domainVa". These values
> represent 3D data and need to be written to text files using line
> breaks to separate 'layers'. I am currently looping through the list
> and appending a string, which I then write to file. This list can
> regularly contain upwards of half a million values...
>
> count = 0
> dmntString = ""
> for z in range(0, Z):
>     for y in range(0, Y):
>         for x in range(0, X):
>             fraction = domainVa[count]
>             dmntString += " "
>             dmntString += fraction
>             count = count + 1
>         dmntString += "\n"
>     dmntString += "\n"
> dmntString += "\n***\n"
>
> dmntFile = open(dmntFilename, 'wt')
> dmntFile.write(dmntString)
> dmntFile.close()
>
> I have found that it is currently taking ~3 seconds to build the
> string but ~1 second to write the string to file, which seems wrong (I
> would normally guess the CPU/Memory would out perform disc writing
> speeds).
>
> Can anyone see a way of speeding this loop up? Perhaps by changing the
> data format? Is it wrong to append a string and write once, or should
> hold a file open and write at each instance?
>
> Thank you in advance for your time,
>
> Dan

Hi Dan, Looking at the code sample you sent, you could do some clever stuff by making dmntString a list (say dmntList) rather than a string and appending to it everywhere you're doing a +=. Then at the end you build the string you write to the file one time with a dmntFile.write(''.join(dmntList)). But I think the more straightforward thing would be to replace all the dmntString += ... lines in the loops with a dmntFile.write(whatever), since you're just constantly adding onto the file in various ways. I think the slowdown you're seeing in your code as written comes from Python strings being immutable. Every time you perform a dmntString += ... in the loops you're creating a new dmntString, copying in the contents of the old, plus the appended content. And if your list can reach half a million items, well, that's a TON of string create and string copy operations. Hope you find this helpful, Doug
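The list-and-join approach described in the reply might look like the sketch below. The loop nesting is inferred, since the indentation of Dan's original code was lost in the archive, and the `build_layers` name and the small test data are made up for illustration. Values are assumed to be strings already, as they are in the original loop.

```python
def build_layers(values, X, Y, Z):
    """Format a flat list of strings as Z layers of Y rows of X values."""
    parts = []
    count = 0
    for z in range(Z):
        for y in range(Y):
            # One row: each value is preceded by a single space.
            row = values[count:count + X]
            count += X
            parts.append(" " + " ".join(row) + "\n")
        parts.append("\n")  # blank line between layers
    parts.append("\n***\n")
    return "".join(parts)  # one string build instead of many +=


text = build_layers([str(n) for n in range(8)], X=2, Y=2, Z=2)
print(text)
```

Slicing a whole row at a time also removes the innermost Python-level loop, so the work per value is done inside join rather than in interpreted code.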
Re: Determining when a file has finished copying
Guys, Thanks for your replies, they are helpful. I should have included in my initial question that I don't have as much control over the program that writes (pgm-W) as I'd like. Otherwise, the write-to-a-different-filename-and-then-rename solution would work great. Is there no way to tell from os.stat() when the file has finished being copied? I ran some test programs, one of which continuously copies big files from one directory to another, and another that continuously does a glob.glob("*.pdf") on those files and looks at the st_atime and st_mtime parts of the return value of os.stat(filename). From that experiment it looks like st_atime and st_mtime equal each other until the file has finished being copied. Nothing in the documentation about st_atime or st_mtime leads me to think this is guaranteed; it's just my observation from the two test programs I've described. Any thoughts? Thanks! Doug
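A different heuristic from the st_atime/st_mtime comparison above is to poll a file until its size and modification time stop changing. This is only a sketch of that idea, not something proposed in the thread, and it is not a guarantee: a copy that stalls for longer than the polling window will look finished. The `is_stable` name and its default timings are made up for illustration.

```python
import os
import time


def is_stable(path, wait=1.0, checks=2):
    """Return True if size and mtime stay unchanged across several checks."""
    st = os.stat(path)
    previous = (st.st_size, st.st_mtime)
    for _ in range(checks):
        time.sleep(wait)
        st = os.stat(path)
        current = (st.st_size, st.st_mtime)
        if current != previous:
            return False  # still changing, probably mid-copy
        previous = current
    return True
```

A directory scanner could call is_stable() on each candidate file and write the command file only for those that return True, leaving the rest for the next scan.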
Python2.5 and MySQLdb
Hi all, I'm running a CentOS 4 server and have installed Python 2.5 on it (it's our development machine) in preparation for moving to Python 2.5 everywhere. All looks good with our code and 2.5, except when it comes to MySQLdb: I can't get that to install on the machine. It generates a huge list of errors and warnings from gcc when I run the python2.5 setup.py build script that comes with the tar file. Anyone have any suggestions? Thanks, Doug
Twisted and txJSON-RPC
Hi all, I'm modifying a Twisted project and I'd like to use the txJSON-RPC code shown here: https://launchpad.net/txjsonrpc However, when I try to install it with this command line:

    sudo easy_install txJSON-RPC

I get an error message:

    error: docs/PRELUDE.txt: No such file or directory

I'm not sure what this is about, or how to fix it. Does anyone have any suggestions or help they can offer me? Thanks! Doug
Extract images from PDF files
Hi all, I've looked around with Google quite a bit, but haven't found anything like what I'm looking for. Is there a Python library that will extract images from PDF files? My ultimate goal is to pull the images out, use the PIL library to reduce the size of the images, and build another PDF file that's essentially a "thumbnail" version of the original PDF file, smaller in size. We've been using imagick to extract the images, but it's difficult to script and slow to process the input PDF. Can someone suggest something better? Thanks in advance, Doug
Re: Extract images from PDF files
David, Thanks for your reply, I'll take a look at pdftohtml and see if it suits my needs. Thanks! Doug