On Mon, 21 Aug 2006, John Rudd wrote:
On Aug 21, 2006, at 10:13 PM, Chip M. wrote:
While skimming through my daily rejected spam pile, did a double take when a GIF spam seemed to "blink" at me. Thought it was a software glitch at first... then realized the sneaky Borg had adapted again. Took a look at the frames in PaintShopPro's AnimationShop, and the first three are all but blank (wee bit of noise), followed by the payload.
Given the way the GIF format works, that is actually a reasonable way to inject "salt" into a given image to throw off checksumming. (If only the programmer who is doing the technical end of this would get a real job instead of working for a spammer...)
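To illustrate the "salt" point: a whole-file checksum changes completely when even one byte of a throwaway frame changes. The snippet below is only a toy sketch. The byte strings are made-up stand-ins for GIF data, not real images, and MD5 is just an assumed example of the kind of checksum a filter might use.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex digest of the whole blob, as a naive image checksum would compute it."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-ins: the same payload preceded by slightly different noise.
payload = b"<identical payload frame>"
variant_a = b"\x00\x01\x00" + payload
variant_b = b"\x00\x02\x00" + payload

# One differing noise byte is enough to give two unrelated-looking digests.
print(digest(variant_a))
print(digest(variant_b))
print(digest(variant_a) == digest(variant_b))
```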
For animated, is there a clean break between "frames" of animation, something that netpbm or whatever can easily identify and break out into individual images?
Yes. Briefly, the GIF format is a sequence of chunks (the spec calls them "blocks"). Before any image data comes along, a chunk defines the overall size of the GIF (sort of the size of the canvas), and then you can have a series of other chunks. One type of chunk says "draw this image on the virtual canvas at these coordinates using this palette" and another says "delay this long". Putting these two types of chunks together in the right sequence gives the ability to do animations. (It also, incidentally, gives you the ability to do full 24-bit color, because each image chunk can carry its own palette of up to 256 colors and you can tile the canvas with as many chunks as you like. Few people know GIF is actually capable of this. But even though it is capable, it is a hack, and very wasteful of space, so maybe that's for the better.)
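As a rough sketch of how cleanly the frames break apart, here is a minimal walk over the block structure described above, using only the Python standard library. The function names (`scan_gif`, `skip_sub_blocks`) and the returned dictionary keys are made up for illustration. It does not decode pixel data and does no bounds checking, so a truncated or malicious file will raise an exception rather than be handled gracefully.

```python
import struct


def skip_sub_blocks(data, pos):
    """Skip a chain of length-prefixed sub-blocks; return the offset after it."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:          # a zero-length sub-block terminates the chain
            return pos
        pos += size


def scan_gif(data):
    """Return ((canvas_w, canvas_h), frames) without decoding any pixels."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    # Logical screen descriptor: the overall "canvas" size.
    width, height, packed = struct.unpack_from("<HHB", data, 6)
    pos = 13
    if packed & 0x80:                          # global color table present
        pos += 3 * (2 << (packed & 0x07))
    frames = []
    delay = 0                                  # delay for the next image, in 1/100 s
    while pos < len(data):
        kind = data[pos]
        pos += 1
        if kind == 0x3B:                       # trailer: end of file
            break
        if kind == 0x21:                       # extension block
            label = data[pos]
            pos += 1
            if label == 0xF9 and data[pos] == 4:   # graphic control: "delay this long"
                delay = struct.unpack_from("<H", data, pos + 2)[0]
            pos = skip_sub_blocks(data, pos)
        elif kind == 0x2C:                     # image descriptor: "draw this here"
            left, top, w, h, fpacked = struct.unpack_from("<HHHHB", data, pos)
            pos += 9
            if fpacked & 0x80:                 # local color table (per-frame palette)
                pos += 3 * (2 << (fpacked & 0x07))
            pos += 1                           # LZW minimum code size byte
            pos = skip_sub_blocks(data, pos)   # compressed pixel data
            frames.append({
                "rect": (left, top, w, h),
                "delay_cs": delay,
                "interlaced": bool(fpacked & 0x40),
            })
            delay = 0
        else:
            raise ValueError("unexpected block type 0x%02x" % kind)
    return (width, height), frames
```

Calling `scan_gif(open(path, "rb").read())` on a file gives the canvas size plus one entry per frame, which is enough to count frames or spot suspiciously short delays before handing anything to an image tool.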
It would be CPU intensive, but the right way to fight it might be to run FuzzyOCR on each frame. And/or have a setting for "maximum frames to process", and if the GIF goes over that number of frames, give it a huge spam score.
Yeah, that is a bit tricky. I can think of a way to do a denial-of-service attack against the "run it on each frame" approach, but I won't share what that is. In theory, if that happens, one could write a plugin to examine the internal structure of the GIF and detect that. The one thing that would be important to guard against is suddenly flagging all animated GIFs as spam. Although I think they're really tacky and annoying, that doesn't mean that they are actually spam.
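One way to picture the "maximum frames" idea without punishing every animated GIF is sketched below. The function names, the cap of 20 frames, and the score of 5.0 are all hypothetical placeholders, not settings of FuzzyOCR or any other existing plugin.

```python
def frame_count_score(n_frames, max_frames=20, over_limit_score=5.0):
    """Score contribution from frame count alone (all values are hypothetical).

    Being animated adds nothing by itself; only exceeding the cap is penalized.
    """
    if n_frames > max_frames:
        return over_limit_score
    return 0.0


def frames_to_ocr(n_frames, max_frames=20):
    """Indices of the frames that would be handed to OCR, capped to bound CPU cost."""
    return list(range(min(n_frames, max_frames)))


for n in (1, 5, 500):
    print(n, frame_count_score(n), len(frames_to_ocr(n)))
```

The cap bounds the OCR work per message, while an ordinary five-frame animation still scores zero from this rule.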
For interlaced ... I have no idea. Depends a lot on how the interlaced images are stored, I guess. And whether or not netpbm can generate the final image for processing, instead of having to work on the interlaced data.
I'm pretty sure it should be able to. If I recall correctly, interlaced GIFs just have the rows in a different order. It should be no problem to get the full image.
- Logan
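For what it's worth, the row reordering can be sketched in a few lines. The GIF89a spec stores interlaced rows in four passes (every 8th row from 0, every 8th from 4, every 4th from 2, every 2nd from 1). The function names here are made up for illustration, and `rows` is assumed to be a list of already-decoded pixel rows in stored order.

```python
def interlaced_row_order(height):
    """Display-row indices in the order an interlaced GIF stores them."""
    order = []
    # (first row, step) for each of the four interlace passes
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order


def deinterlace(rows):
    """Reorder rows from stored (interlaced) order into top-to-bottom order."""
    out = [None] * len(rows)
    for stored_index, display_row in enumerate(interlaced_row_order(len(rows))):
        out[display_row] = rows[stored_index]
    return out


print(interlaced_row_order(10))
```

So the data is all there; it just needs to be shuffled back into top-to-bottom order after decompression.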