On May 31, 10:01 am, Larry Bates <[EMAIL PROTECTED]> wrote:
> I have a project that I wanted to solicit some advice
> on from this group. I have millions of pages of scanned
> documents with each page in an individual .JPG file.
> When the documents were scanned the people that did
> the scanning put a colored (hot pink) separator page
> between the individual documents. I was wondering if
> there was any way to utilize PIL to scan through the
> individual files, look at some small section on the
> page, and determine if it is a separator page by
> somehow comparing the color to the separator page
> color? I realize that this would be some sort of
> percentage match where 100% would be a perfect match
> and any number lower would indicate that it was less
> likely that it was a coverpage.
>
> Thanks in advance for any thoughts or advice.
>
> Regards,
> Larry Bates

I used GraphicsMagick for a similar situation. Once installed, you can
run 'gm identify' to return all sorts of useful information about the
images. In my case I had Python call 'gm' to identify the number of
colors in each image, then inspected the output and handled the image
accordingly.

I'll bet PIL could do a similar thing, but in my case I was examining
DPX files, which PIL can't handle.

With millions of pages, either approach will most likely take a bit of
time unless you spread the work over several machines.

~Sean
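
For what it's worth, here is a rough sketch of how the PIL route might look. This is untested against real scans and rests on several assumptions: the target color (255, 105, 180) is just the CSS "hot pink" value, not the actual paper color; the tolerance, sample box, and threshold are guesses; and the function names are made up for illustration. It samples a small corner of the page, shrinks it, and reports what share of the pixels fall near the target color.

```python
# Assumed values: measure the real separator color from a known scan instead.
SEPARATOR_RGB = (255, 105, 180)   # CSS "hot pink", a placeholder
TOLERANCE = 40                    # allowed per-channel difference (a guess)


def fraction_near_color(pixels, target=SEPARATOR_RGB, tol=TOLERANCE):
    """Return the share (0.0 to 1.0) of RGB pixels within tol of target
    on every channel."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(
        1 for p in pixels
        if all(abs(c - t) <= tol for c, t in zip(p, target))
    )
    return hits / len(pixels)


def sample_pixels(path, box=(0, 0, 200, 200), size=(20, 20)):
    """Read a small region of the page and return its pixels as RGB tuples.

    box is (left, upper, right, lower); the default corner is a guess.
    """
    from PIL import Image  # imported here so the helper above needs no PIL

    with Image.open(path) as img:
        patch = img.convert("RGB").crop(box).resize(size)
        return [
            patch.getpixel((x, y))
            for y in range(size[1])
            for x in range(size[0])
        ]


def is_separator(path, threshold=0.9):
    """Treat the page as a separator if most sampled pixels match."""
    return fraction_near_color(sample_pixels(path)) >= threshold
```

Multiplying the result of fraction_near_color() by 100 gives the kind of percentage match Larry described. It would be worth running it over a handful of known separator pages and known normal pages first, to pick a threshold that actually splits the two groups.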