Re: A need for IRBL?

John Rudd Fri, 21 Apr 2006 12:22:06 -0700

Someone over on the MIMEDefang list is working on an OCR mechanism for scanning the image to text.

Another person also brought up the idea of hashing the images and doing something like an IRBL or Razor approach, but everyone came to the same conclusion you're coming to now.


But there was one suggestion that seemed interesting:

Averaging out regions of the image to their base colors, and matching based on that. They suggested 4 regions, but I think that's too broad. I think maybe 16 or 64 regions might be better (4x4 or 8x8). Within each grid section, you average out the color values of the pixels, and you're left with a big pixelated blotch of 16 or 64 squares. This will wash out the minor pixel variations that defeat hashing mechanisms.
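A minimal sketch of that averaging step, in Python, assuming the image has already been decoded into rows of (r, g, b) tuples. The function name `grid_signature` and its `grid` parameter are made up for illustration; nobody on the list posted code for this.

```python
def grid_signature(pixels, grid=4):
    """Average an RGB image down to grid x grid cells.

    pixels: list of rows, each row a list of (r, g, b) tuples
            (assumed to be already decoded from the attachment).
    Returns a flat, row-major list of grid*grid averaged (r, g, b) tuples.
    """
    height = len(pixels)
    width = len(pixels[0])
    if height < grid or width < grid:
        raise ValueError("image is smaller than the grid")

    signature = []
    for gy in range(grid):
        # Integer cell bounds; cell sizes differ by at most one pixel
        # when the image size is not a multiple of the grid.
        y0 = gy * height // grid
        y1 = (gy + 1) * height // grid
        for gx in range(grid):
            x0 = gx * width // grid
            x1 = (gx + 1) * width // grid
            count = (y1 - y0) * (x1 - x0)
            sums = [0, 0, 0]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for c in range(3):
                        sums[c] += pixels[y][x][c]
            # Integer average per channel for this cell.
            signature.append(tuple(s // count for s in sums))
    return signature
```

With `grid=4` this yields the 16 values, with `grid=8` the 64 values. A single altered pixel usually leaves the averaged cell unchanged, which is the whole point.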

From there, I wouldn't hash the big blotched image. I would record the 16 or 64 values and use those directly, plus the rough image size, as your matching data (rough image size meaning: it's OK to be plus or minus 32 or 64 pixels on each dimension, to again account for image variations, but you don't want to compare an 8x8 pixel image to a 640x640 pixel image). If the image in the email matches an image in the database, then I wouldn't automatically reject it or mark it as spam -- this is a VERY rough comparison of the images. Instead, I would just add +3 or +4 to the message's score.
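Roughly, the matching could look like the Python sketch below. All names here are hypothetical, and the per-channel tolerance of 8 is my own assumption, since I haven't said how close two averaged values need to be to count as equal. The database is assumed to be a plain list of (size, signature) pairs.

```python
def sizes_compatible(size_a, size_b, tolerance=32):
    """True if width and height each differ by at most `tolerance` pixels."""
    return (abs(size_a[0] - size_b[0]) <= tolerance
            and abs(size_a[1] - size_b[1]) <= tolerance)


def signatures_close(sig_a, sig_b, channel_tolerance=8):
    """True if every averaged cell agrees on every color channel.

    channel_tolerance is an assumed slack value, not from the thread.
    """
    if len(sig_a) != len(sig_b):
        return False
    return all(
        abs(ca - cb) <= channel_tolerance
        for cell_a, cell_b in zip(sig_a, sig_b)
        for ca, cb in zip(cell_a, cell_b)
    )


def image_score(size, signature, database, bonus=3.0):
    """Return the score to add to the message: `bonus` on a match, else 0.

    database: list of ((width, height), signature) entries for known spam.
    A match only nudges the score; it never rejects the message outright.
    """
    for known_size, known_sig in database:
        if (sizes_compatible(size, known_size)
                and signatures_close(signature, known_sig)):
            return bonus
    return 0.0
```

The size check runs first so that an 8x8 image is never compared against a 640x640 one, no matter how similar the averaged colors happen to be.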

May not be perfect, but it may be interesting. I also wonder if it'd be useful to do more regions (16x16 or 64x64?), and base the result on "how many regions matched".
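A sketch of the "how many regions matched" idea in Python. The names, the per-channel tolerance, the 75% cut-off, and the way the score scales with the matched fraction are all assumptions picked for illustration, not something anyone has tested.

```python
def matched_fraction(sig_a, sig_b, channel_tolerance=8):
    """Fraction of grid cells whose averaged colors agree on all channels."""
    if not sig_a or len(sig_a) != len(sig_b):
        raise ValueError("signatures must be non-empty and the same length")
    matched = sum(
        1
        for cell_a, cell_b in zip(sig_a, sig_b)
        if all(abs(ca - cb) <= channel_tolerance
               for ca, cb in zip(cell_a, cell_b))
    )
    return matched / len(sig_a)


def graded_score(fraction, max_bonus=4.0, threshold=0.75):
    """Scale the score by the matched fraction, ignoring weak matches.

    threshold and max_bonus are assumed values for this sketch.
    """
    if fraction < threshold:
        return 0.0
    return max_bonus * fraction
```

With a 16x16 grid there are 256 cells, so an image that agrees with a known spam image on 192 of them has a matched fraction of 0.75 and, under these assumed numbers, would add 3.0 to the score.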



On Apr 21, 2006, at 10:16 AM, Dirk Bonengel wrote:

Hi,
as Rob McEwen already pointed out, Bill Stearns offered image hash data for such a project. I did write such a plugin (Bill did publish his data via DNS, thanks again!) but am somewhat disappointed by the results (so I didn't bother publishing the plugin).
The point is that the most annoying image spams (i.e. those you want to catch) are deliberately defective or altered so that simple hashing of the image MIME parts doesn't really work. It seems to me that spammers already practise hash-busting methods on images, presumably because some big ISP(s) already check image hashes.
Still, if desired I can post that plugin somewhere (with appropriate words of caution)...
Dirk

John D. Hardin wrote:
All:

A few posts back was a suggestion for checking the MD5 checksum of
attached images against a blacklist to catch the current wave of
attached-image-only stock pump-and-dump scam spams.

Taking that to its logical conclusion suggests the creation of a
public Image Realtime Block List along the lines of what SURBL
provides for URLs, and extending SA to MD5-sum attached images and
check them against the block list.
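A minimal Python sketch of the lookup being proposed, assuming the block list is available locally as a set of MD5 hex digests. The function names and the zone `irbl.example.org` are hypothetical; no such list exists, and this is not how SA plugins are actually written.

```python
import hashlib


def image_md5(image_bytes):
    """Hex MD5 digest of the decoded image attachment."""
    return hashlib.md5(image_bytes).hexdigest()


def is_listed(image_bytes, blocklist):
    """True if the attachment's digest appears in the local block list.

    blocklist: a set of lowercase MD5 hex digests (assumed format).
    """
    return image_md5(image_bytes) in blocklist


def dnsbl_query_name(image_bytes, zone="irbl.example.org"):
    """Build a SURBL-style DNS name for the digest (zone is hypothetical).

    Only the name is built here; no DNS query is performed.
    """
    return image_md5(image_bytes) + "." + zone
```

Note that changing even one byte of the image produces a completely different digest, so a per-message image would never match the list.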

Is this a good idea? Is this a bad idea? Is it pointless, as spammers
would just generate per-message images the way they are probably
generating per-message random Bayes poison now? Is it already covered
by Razor et al.?

Comments are solicited!

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 Senator, when you took your oath of office, you placed your hand on
 the Bible and swore to uphold the Constitution. You didn't place your
 hand on the Constitution and swear to uphold the Bible.
                    -- Jamie Raskin, Professor of Law at American
                    University, testifying before the Maryland Senate
-----------------------------------------------------------------------
