Short:
  I'm looking for a front-end tool to help me process
(qualify and classify / catalog) about 10,000 scanned
images.

 Long:

 Hi,

  I've promised someone that I would process a bunch (10,000) of
images, and am realizing it might not be as simple as I
hoped.

  I'm soliciting, in this forum of image processing experts,
experiences and suggestions for possible solutions. I am a
complete newbie to image processing, with zero experience
(I know Gimp exists), so please don't assume that I know what
I want. I only have a rough idea of what the ideal target
should look like.

  Before I start, a meta-question: is this a good forum to
ask this in ? Which forum(s) would be good for these types
of questions ? (I've noticed that gimp-perl averages only
1.5 posts/month.)

  I'm looking for some tool(s) that would help me qualify
and classify / catalog a bunch of images. I can easily build
myself the database structures I need in MySQL or
PostgreSQL. I'm having trouble finding the frontend tool
which would allow me to view (and manipulate a little) the
image and would be an effective data entry tool for these
qualifications / classifications. I'm not so concerned at
this point about the viewing of the images once they are all
processed, although I imagine that the same tool might also
be used for viewing these images at the end. I would
prefer to run it all on linux, but would settle for Windows
front-end, if necessary. I know Javascript, PHP & Perl if
some integrating is needed. (I also know C++ and Java, but
would prefer not to use them for this, if possible.)

  I have about 4,000 photos w/EXIF info, but I'll write
about those at the bottom of this post.
  More importantly, I have about 7,000 B&W (dithered) scans
of docs of various contrasts (sometimes light gray), all of
them text (no photos). I currently have almost all of them
(99%) in png and pbm format; about 1% are jpegs. None are
multipage scans, all are single-image scans.

  I need to classify them in several "dimensions", but
elements / attributes of those dimensions may vary based on
the type of content the document carries.

  I need to build a searchable database, so I can find them
by specifying criteria in one or more dimensions. E.g.
"all expense docs from `Botanical Gardens' involving the period
June 23, 2003 to July 23, 2003", and a set of 140 image
files would fall out for display / browsing.
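
  A minimal sketch of how such a lookup might be expressed. It
uses Python's built-in sqlite3 purely as a stand-in for MySQL or
PostgreSQL. The table name, the column names, and the sample rows
are all made up for illustration, and "involving period" is
assumed here to mean that the document's period overlaps the
requested range.

```python
import sqlite3

# Hypothetical schema: one row per scan, with a few of the dimensions
# flattened into columns. Dates are ISO strings, so plain string
# comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scan (
        filename     TEXT PRIMARY KEY,
        from_entity  TEXT,
        is_expense   INTEGER,
        period_start TEXT,
        period_end   TEXT
    )
""")
conn.executemany(
    "INSERT INTO scan VALUES (?, ?, ?, ?, ?)",
    [   # dummy rows, not real data
        ("0001.png", "Botanical Gardens", 1, "2003-06-25", "2003-06-30"),
        ("0002.png", "Botanical Gardens", 0, "2003-07-01", "2003-07-02"),
        ("0003.pbm", "City Hall",         1, "2003-07-01", "2003-07-05"),
        ("0004.png", "Botanical Gardens", 1, "2003-08-01", "2003-08-15"),
    ],
)

def find_expenses(conn, entity, start, end):
    """Return filenames of expense scans from `entity` whose period
    overlaps the closed range [start, end]."""
    rows = conn.execute(
        """
        SELECT filename FROM scan
        WHERE is_expense = 1
          AND from_entity = ?
          AND period_start <= ?
          AND period_end   >= ?
        ORDER BY filename
        """,
        (entity, end, start),
    )
    return [row[0] for row in rows]

matches = find_expenses(conn, "Botanical Gardens", "2003-06-23", "2003-07-23")
print(matches)
```

  With a real schema the entities and flags would more likely
live in their own tables, but the overlap test on the two dates
would stay the same.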

  I would really hope to have a frontend which would be
fully controllable via kbd, just because kbd is so much
faster to use than mouse (for most things (*1)). For example:

Key  Meaning
 a   "This is another page from the same doc. Write it into the DB and
      display next scan".
 b   "This page is blank - doesn't contain any information" 6.1
 n   "This scan is a page of another doc. Close the previous logical doc."
 d   "Add this page to a doc that has been created before"
 s   "Start a new doc."
 f   "This page pertains to finances." 6.2
 c   "This page pertains to finances / income." 6.2.2
 e   "This page pertains to finances / expense." 6.2.1
 l   "This page pertains to legal." 6.3
 i   "This page pertains to info." 6.4
 ...

(*1) - mouse comes in handy for only two actions: see G4 and
G7 below
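
  A rough sketch of how the key table above could drive data
entry, independent of whichever toolkit ends up displaying the
images. Only a subset of the keys is handled (the flag keys plus
`s' and `a'). The class name, the flag labels, and the rule that
`s' and `a' store the scan and advance are assumptions made for
this example.

```python
# Flag keys from the table; the label strings are invented here.
FLAG_KEYS = {
    "b": "blank",
    "f": "finances",
    "c": "finances/income",
    "e": "finances/expense",
    "l": "legal",
    "i": "info",
}

class Session:
    """Tracks logical documents while the operator keys through scans."""

    def __init__(self):
        self.docs = []              # each doc is a list of (scan, flags)
        self.pending_flags = set()  # flags typed for the scan on screen

    def handle(self, key, scan):
        """Apply one keystroke to the scan on screen.

        Returns True when the scan was stored and the viewer should
        advance to the next scan, False otherwise."""
        if key in FLAG_KEYS:
            self.pending_flags.add(FLAG_KEYS[key])
            return False
        if key == "s":              # start a new logical doc
            self.docs.append([])
        elif key == "a":            # another page of the same doc
            if not self.docs:
                raise ValueError("no document open; press 's' first")
        else:
            raise KeyError(f"unbound key: {key!r}")
        self.docs[-1].append((scan, frozenset(self.pending_flags)))
        self.pending_flags = set()
        return True

session = Session()
keystrokes = [("f", "0001.png"), ("e", "0001.png"), ("s", "0001.png"),
              ("a", "0002.png"),
              ("b", "0003.png"), ("s", "0003.png")]
for key, scan in keystrokes:
    session.handle(key, scan)
print(len(session.docs))
```

  Keeping the key handling in a plain table like this means the
bindings can be changed without touching the display code.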

 So I guess I would be looking for a "graphical engine", or
"display engine" capable of (hopefully fast) display and
manipulation of images. Separate zoomed window for fine
navigation would be a nice extra. It would be nice if it
would have combo boxes for choosing / adding items (see
dimension 4 below), where the selection of items narrows
down as you type lookup codes / starting letters of the
entities. ( see point 4 below )
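
  The narrowing behaviour of such a combo box amounts to a prefix
filter over the entity list. A small sketch, assuming each entity
has a short lookup code and a name. The codes and all entries
other than `Botanical Gardens' are invented for illustration.

```python
def narrow(entities, typed):
    """Return the names of entities whose lookup code or name starts
    with `typed`, ignoring case. `entities` maps code -> name."""
    prefix = typed.casefold()
    return sorted(
        name
        for code, name in entities.items()
        if code.casefold().startswith(prefix)
        or name.casefold().startswith(prefix)
    )

# Hypothetical entity list.
ENTITIES = {
    "BG": "Botanical Gardens",
    "BK": "Bank",
    "CH": "City Hall",
}

print(narrow(ENTITIES, "b"))
print(narrow(ENTITIES, "bo"))
print(narrow(ENTITIES, "ch"))
```

  Each keystroke simply re-runs the filter with the longer
prefix, so the list shrinks as more letters are typed.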

 If worse comes to worst, I would settle for this whole
thing being done in javascript ( I found that it is possible
to draw lines / rectangles in javascript - see maptuit.com :
http://tremblant.www.maptuit.com/corporate/testdrive/getamap.html ).
 But using a browser and javascript for this image
manipulation would be terribly slow, probably ugly, and I
would hate it if I had to use MS Explorer's extensions :-(
 Not to mention that I have no idea how I could do an 8x-zoom
popup with mouse-fine-control in Javascript. Plus browsers
don't really allow for easy image panning.

 Below is what I think my wishlist should be. But then 
again, I'm new to image processing ...

    Thanks,

         John


  This is what I imagine the graphical engine should be able to do:

 G1 fit-to-window

 G2 fit-width-to-window

 G3 1-to-1 pixel zoom

 G4 8-to-1 pixel zoom (in a smaller window - see G7)

 G5 mouse movement in the above three items moves the image,
so whole page could be quickly visually scanned for defects

 G6 ability to specify areas (mostly rectangular, possibly
occasionally rotated) of an image [ this would tango with
the system feature 2.5 below - ability to treat these areas
as separate scans (as pieces of different documents) ]

 G7 fine-navigation: nice extra: when the Control key or
something is pressed, a fine-navigation (8x zoomed) window
pops up on the side, and mouse movement is 8x finer - allows
for specifying a fine rotation angle (1.7.2) by means of
clicking on two points which *should* be in a straight
horizontal or vertical line on the original

 G8 another nice extra: "increase contrast" algorithm - in a
B/W or dithered picture: draw a 2 or 3-pixel wide line
between pixels that are less than distance X apart (this
will enhance the contrast). This is just my formulation of what a
"contrast enhancing" algorithm should do. Or another
algorithm with similar effect: if a pixel has another pixel
less than distance X away, turn the other pixels within a 2
or 3-pixel diameter around it black.
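
  Two of the items above reduce to a few lines of arithmetic, so
here is a rough sketch of both. `skew_angle' computes the G7
rotation from two clicked points, and `thicken' implements the
second G8 formulation on a set of black-pixel coordinates. The
function names are invented. The pixel search is brute force and
does no clipping at the image border, so it only illustrates the
idea and is not something to run on a 600 dpi page. The sign of
the angle depends on whether y grows upward or downward in the
viewer.

```python
import math

def skew_angle(p1, p2):
    """Degrees by which the line p1 -> p2 deviates from the nearest
    horizontal or vertical direction (between -45 and 45)."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx))
    # Fold the angle onto the nearest multiple of 90 degrees.
    return angle - 90 * round(angle / 90)

def thicken(black, max_dist, radius=1):
    """`black` is a set of (x, y) coordinates of black pixels.

    Every black pixel that has another black pixel closer than
    `max_dist` gets a square of side 2*radius+1 around it painted
    black. Isolated pixels (likely noise) are left as they are."""
    out = set(black)
    for (x, y) in black:
        has_neighbour = any(
            (nx, ny) != (x, y) and math.hypot(nx - x, ny - y) < max_dist
            for (nx, ny) in black
        )
        if has_neighbour:
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    out.add((x + dx, y + dy))
    return out

# Two clicks on a text baseline that should have been horizontal,
# rounded to the 0.1 degree step of item 1.7.2.
print(round(skew_angle((0, 0), (1000, 10)), 1))

# Two nearby pixels get thickened, the isolated one does not.
result = thicken({(0, 0), (2, 0), (10, 10)}, max_dist=3)
print(len(result))
```

  With radius=1 the painted square is 3 pixels across, matching
the "2 or 3-pixel" width mentioned in G8.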


  The dimensions would be:

 1 picture quality dimension:
 1.2 resolution : 300 ? 600 ? other ?
 1.3 lineart or dithered ?
 1.4 legible scan ?
 1.5 the whole page is scanned ? or are parts / edges missing?
 1.6 needs re-scan ?
 1.7 needs post-processing ?
 1.7.1 rotation by X*90 degrees
 1.7.2 rotation by Y*0.1 degrees
 1.7.3 increasing "contrast" (difficult with B&W/dithered pics)

 2 document structure dimension: (2.1 to 2.3 erased)
 2.4 which scan is the chapter title page, if any ?
 2.5 if one scan contains more than one logical document,
      how does the scan divide into areas containing them ?
 2.6 which library does it belong to ?
 2.7 which shelf within library does it belong to ?
 2.8 which volume of books on that shelf does it belong to ?
 2.9 which book in that volume does it belong to ?
 2.10 which chapter in that book does it belong to ?
 2.11 which page of the chapter is it ?
 2.12 which side of that page is it ?

 3 time dimension:
 3.1 date & time
 3.2 period (from date to date)
 3.3 expiry date
 3.4 other date

 4 entities dimension:
 4.1 from which entity ?  [ choose from / add to list of entities ]
 4.2 to which entity ?    [ choose from / add to list of entities ]
 4.3 publishing entity ?  [ choose from / add to list of entities ]
 4.4 from which address ? [ choose from / add to list of addresses ]
 4.5 to which address ?   [ choose from / add to list of addresses ]

 5 values:
 5.1 ID1
 5.2 ID2
 5.3 title
 5.4 subject
 5.5 value1
 5.6 value2
 5.7 value3

 6 flag:
 6.1 blank page ?
 6.2 financial ?
 6.2.1 expense ?
 6.2.2 income ?
 6.3 legal ?
 6.4 informational ?
 6.5 expired ?

 7 ownership / responsibility for this doc:
 7.1 Jack's group
 7.1.1 Jack
 7.1.2 Peter
 7.2 Mary's group
 7.2.1 Mary
 7.2.2 Dennis
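
  Since the attributes above are numbered hierarchically, a
search for a parent code (say 6.2, financial) should probably
also find pages tagged with any of its children (6.2.1, expense).
A small sketch of that matching rule. The function names are
invented, and storing the codes as dotted strings is an
assumption.

```python
def ancestors(code):
    """Expand a dotted code into itself and all of its parents,
    e.g. '6.2.1' -> ['6', '6.2', '6.2.1']."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

def matches(assigned, wanted):
    """True if any code in `assigned` equals `wanted` or lies
    somewhere beneath it in the hierarchy."""
    return any(wanted in ancestors(code) for code in assigned)

# A page tagged as an expense (6.2.1) owned by Peter (7.1.2).
page = {"6.2.1", "7.1.2"}

print(matches(page, "6.2"))   # financial
print(matches(page, "7.1"))   # Jack's group
print(matches(page, "7.2"))   # Mary's group
```

  Comparing whole dotted components, and not raw string prefixes,
keeps a code such as 6.21 from being mistaken for a child of 6.2.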


  Then I have about 4,000 JPEG color pics, most of them
w/EXIF data.

  With these, there may be additional qualifications, and
some of the above may not apply:
 1.8 rating of quality of composition (capturing the intended subject)
 1.9 rating of technical quality
 1.9.1 focused
 1.9.2 not shaken (when tripod not used)
 1.9.3 proper lighting / timing / contrast

  and then sorting them into categories :
 8. category
 8.1 trees
 8.1.1 indoor
 8.1.2 outdoor
 8.2 bushes
 8.3 tools