image spam detection idea

Logan Shaw Fri, 04 Aug 2006 08:25:59 -0700

Looks like people have started to get a grip on the image
spams that are so popular lately, but here's an additional
idea I thought I'd toss out.  (I'm not familiar enough with
SA to easily figure out how to make a plugin.)


Basically, these spams all have a bunch of images which are
tiles of a larger image.  The tiling is, presumably, done
to defeat checksum-based matching.  Now, here's the thing with tiling: the
left edge of one image will be extremely similar to the right
edge of the one next to it.  The same goes for the top and bottom edges.

So it seems like a useful rule could decompress each of the
images, take the left and right columns and top and bottom rows
of each image, and compare those columns and rows to the columns
and rows of other images of similar dimensions.  If they correlate
closely (determined easily enough by subtracting one set of
pixels from the next), that's a strong indicator they were
expected to abut, which in turn is a strong indicator of spam.
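
To make the comparison concrete, here is a rough Python sketch of the
edge check described above.  It is not an SA plugin.  It assumes each
image has already been decoded into a list of rows of grayscale values
(0-255), and the function names and the threshold of 10 are made up
for illustration and would need tuning on real mail.

```python
from itertools import permutations


def mean_abs_diff(a, b):
    """Average absolute difference between two equal-length pixel runs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def abuts_horizontally(left, right, threshold=10.0):
    """True if the right column of `left` closely matches the left column of `right`."""
    if not left or not right or len(left) != len(right):
        return False  # heights differ, so they cannot sit side by side
    right_col = [row[-1] for row in left]
    left_col = [row[0] for row in right]
    return mean_abs_diff(right_col, left_col) <= threshold


def abuts_vertically(top, bottom, threshold=10.0):
    """True if the bottom row of `top` closely matches the top row of `bottom`."""
    if not top or not bottom or len(top[0]) != len(bottom[0]):
        return False  # widths differ, so they cannot be stacked
    return mean_abs_diff(top[-1], bottom[0]) <= threshold


def count_abutting_pairs(images, threshold=10.0):
    """Count ordered image pairs whose edges look like they were meant to touch."""
    count = 0
    for a, b in permutations(images, 2):
        if abuts_horizontally(a, b, threshold) or abuts_vertically(a, b, threshold):
            count += 1
    return count
```

A message whose images produce several abutting pairs would score as
likely tiled.  One caveat: images with plain borders (all white, say)
will match each other trivially, so a real rule would probably want to
skip edges with almost no variation.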

Of course, this requires decoding the entire image, but the
analysis after that point should be fairly cheap (compared to
OCR, for example).

  - Logan
