Re: PDF: finding a blank image

2009-07-14 Thread DrLeif
On Jul 13, 6:22 pm, Scott David Daniels wrote: > DrLeif wrote: > > I have about 6000 PDF files which have been produced using a scanner > > with more being produced each day.  The PDF files contain old paper > > records which have been taking up space.   The scanner is set to > > detect when there

Re: PDF: finding a blank image

2009-07-13 Thread Brian
Perhaps your blank pages have a characteristic size. Or perhaps if you trim them with `convert' (ImageMagick) there is nothing left. On Mon, Jul 13, 2009 at 3:44 PM, DrLeif wrote: > I have about 6000 PDF files which have been produced using a scanner > with more being produced each day. The PDF
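
The "trim and see if anything is left" idea can be sketched in plain Python. This is only an illustration, not ImageMagick's actual algorithm: it assumes each page has already been rasterized to a grid of 8-bit grayscale values (a list of rows, 0 = black, 255 = white), and the names `trim_bbox`, `background`, and `fuzz` are made up for this example.

```python
def trim_bbox(pixels, background=255, fuzz=16):
    """Return (top, left, bottom, right) of the non-background area,
    or None if trimming would leave nothing.

    `pixels` is assumed to be a list of equal-length rows of 8-bit
    grayscale values. A pixel counts as background when it is within
    `fuzz` of `background`; both values are illustrative defaults.
    """
    # Rows that contain at least one non-background pixel.
    rows = [y for y, row in enumerate(pixels)
            if any(abs(v - background) > fuzz for v in row)]
    if not rows:
        return None  # the whole page trims away: treat it as blank
    # Columns that contain at least one non-background pixel.
    cols = [x for x in range(len(pixels[0]))
            if any(abs(row[x] - background) > fuzz for row in pixels)]
    return (rows[0], cols[0], rows[-1], cols[-1])


# Hypothetical 8x10 page that is slightly off-white everywhere.
page = [[250] * 10 for _ in range(8)]
print(trim_bbox(page))
```

A page whose bounding box is `None`, or only a few pixels across, would be a candidate for removal. The tolerance matters because scanned paper is rarely pure white.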

Re: PDF: finding a blank image

2009-07-13 Thread Scott David Daniels
DrLeif wrote: I have about 6000 PDF files which have been produced using a scanner with more being produced each day. The PDF files contain old paper records which have been taking up space. The scanner is set to detect when there is information on the backside of the page (duplex scan). The

Re: PDF: finding a blank image

2009-07-13 Thread David Bolen
DrLeif writes: > What I would like to do is have python detect "blank" pages in a PDF > file and remove them. Any suggestions? The odds are good that even a blank page is being "rendered" within the PDF as having some small bits of data due to scanner resolution, imperfections on the page, etc.
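
Since a scanned "blank" page usually carries a few specks of noise, one way to act on this is to measure how much of the page is dark and allow a small tolerance instead of demanding zero. The sketch below assumes the page image has already been extracted as rows of 8-bit grayscale values. The function names and both thresholds are guesses for illustration and would need tuning against real scans.

```python
def dark_fraction(pixels, dark_threshold=128):
    """Fraction of pixels darker than `dark_threshold` (0 = black)."""
    total = sum(len(row) for row in pixels)
    dark = sum(1 for row in pixels for v in row if v < dark_threshold)
    return dark / total if total else 0.0


def looks_blank(pixels, dark_threshold=128, max_dark_fraction=0.005):
    """Treat the page as blank if only a tiny share of it is dark.

    `max_dark_fraction` is an assumed tolerance for scanner noise and
    paper imperfections, not a value taken from the thread.
    """
    return dark_fraction(pixels, dark_threshold) <= max_dark_fraction


# Hypothetical 100x100 white page with three dark specks of noise.
noisy = [[255] * 100 for _ in range(100)]
for y, x in [(5, 7), (40, 90), (88, 13)]:
    noisy[y][x] = 30
print(looks_blank(noisy))
```

Setting the tolerance too high risks dropping pages that hold only a short handwritten note, so it is safer to flag borderline pages for review than to delete them outright.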

PDF: finding a blank image

2009-07-13 Thread DrLeif
I have about 6000 PDF files which have been produced using a scanner with more being produced each day. The PDF files contain old paper records which have been taking up space. The scanner is set to detect when there is information on the backside of the page (duplex scan). The problem of cours