Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Kris Deugau
dar...@chaosreigns.com wrote: On 07/20, Sharma, Ashish wrote: Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image spam? It still seems strange to me that anybody has ever bothered with using OCR to deal with image spam, when it's so easy, and for me not proble

Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread David F. Skoll
On Thu, 21 Jul 2011 07:47:00 +0100 "Sharma, Ashish" wrote: > Can you please outline the other techniques that you use to catch > image spams? We find Bayes (we have our own implementation) and RBLs (again, we have our own) work pretty well. Regards, David.

Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Axb
http://wiki.apache.org/spamassassin/UnmaintainedCustomPlugins "OCR scanner and image validator SA-plugin" "OCR Plugin" may be worth a try.. no idea how well they work The Spamassassin wiki is so cool On 2011-07-21 8:53, Sharma, Ashish wrote: All, The current function

RE: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Sharma, Ashish
:03 AM To: users@spamassassin.apache.org Subject: Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam On 7/20/2011 9:18 PM, dar...@chaosreigns.com wrote: > On 07/20, Sharma, Ashish wrote: >> Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image >> spam?

RE: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Sharma, Ashish
David, >[We don't use OCR, as it happens. We usually catch image spams anyway >using other techniques.] Can you please outline the other techniques that you use to catch image spams? Thanks Ashish Sharma -Original Message- From: David F. Skoll [mailto:d...@roaringpenguin

Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Jason Bertoch
On 7/20/2011 9:18 PM, dar...@chaosreigns.com wrote: On 07/20, Sharma, Ashish wrote: Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image spam? It still seems strange to me that anybody has ever bothered with using OCR to deal with image spam, when it's so easy, an

Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread David F. Skoll
On Wed, 20 Jul 2011 21:18:48 -0400 dar...@chaosreigns.com wrote: > It still seems strange to me that anybody has ever bothered with > using OCR to deal with image spam, when it's so easy, and for me not > problematic, to just block all emails that might be image spam - > those

Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread darxus
On 07/20, Sharma, Ashish wrote: > Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image > spam? It still seems strange to me that anybody has ever bothered with using OCR to deal with image spam, when it's so easy, and for me not problematic, to just block all

Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Sharma, Ashish
FuzzyOCR for my Spamassassin stack. Lately I am not convinced with FuzzyOCR performance and the errors that I keep getting on it. Moreover the community support and active development on FuzzyOCR too seems to be missing. Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image

Re: New spamassassin OCR plugin

2009-05-27 Thread Benny Pedersen
On Wed, May 27, 2009 23:43, decoder wrote: > I am planning a new release, but my time schedule is tough. super, i posted a new thread with subject "FuzzyOcr wordlist" new words to be added for latest spams -- http://localhost/ 100% uptime and 100% mirrored :)

Re: New spamassassin OCR plugin

2009-05-27 Thread decoder
LuKreme wrote: On 24-May-2009, at 18:40, Henrik K wrote: I don't know why users are so afraid of words like SVN. You have to look at the project, not version numbers. I don't have FuzzyOCR installed, and it's not because of the SVN. First, I don't think my server can take the processing hit

Re: New spamassassin OCR plugin

2009-05-27 Thread decoder
alex k wrote: If only FuzzyOCR's developer would read that ;) Unfortunately he doesn't seem to be interested in his project anymore. Maybe you could take care of this orphaned code. Dear Alex, I am reading exactly everything you write ;) The code is not orphaned, but also not being ext

Re: New spamassassin OCR plugin

2009-05-25 Thread LuKreme
On 24-May-2009, at 18:40, Henrik K wrote: I don't know why users are so afraid of words like SVN. You have to look at the project, not version numbers. I don't have FuzzyOCR installed, and it's not because of the SVN. First, I don't think my server can take the processing hit and second

Re: New spamassassin OCR plugin

2009-05-24 Thread Henrik K
On Sun, May 24, 2009 at 08:57:28AM +0200, alex k wrote: > > > Looks like nothing that fuzzyOCR couldn't do, being more flexible and > > proven > > by time. > > If only FuzzyOCR's developer would read that ;) > Unfortunately he doesn't seem to be interested in his project anymore. > Maybe you coul

Re: New spamassassin OCR plugin

2009-05-24 Thread Res
On Sun, 24 May 2009, LuKreme wrote: On 24-May-2009, at 03:10, alex k wrote: You forgot ocrad. Ocrad is needed by facileOCR (see "Dependencies") and as far as I know, there is no ready-to-use binary for Windows. You keep talking about Windows. The world is not bifurcated between windows and

Re: New spamassassin OCR plugin

2009-05-24 Thread LuKreme
On 24-May-2009, at 03:10, alex k wrote: You forgot ocrad. Ocrad is needed by facileOCR (see "Dependencies") and as far as I know, there is no ready-to-use binary for Windows. You keep talking about Windows. The world is not bifurcated between windows and Linux, there is Solaris, OS X, Free

Re: New spamassassin OCR plugin

2009-05-24 Thread mouss
(see "Dependencies") and as > far as I know, there is no ready-to-use binary for Windows. > # uname FreeBSD # cd /usr/ports/graphics/ocrad # make install clean ... $ pkg_info|grep ocrad ocrad-0.17_3 OCR program implemented as filter As you see, it took one command to install o

Re: New spamassassin OCR plugin

2009-05-24 Thread wolfgang
Hi Xela, I think there has been some misunderstanding: In an older episode (Sunday, 24. May 2009), Henrik K wrote: > You should mention that it's pretty Linux centric, at least code like > "ps -o pid,cmd --ppid $$ --no-header".. why don't you use perl > functions? In an older episode (Sunday, 24.

Re: New spamassassin OCR plugin

2009-05-24 Thread alex k
Hi, > On Sun, May 24, 2009 at 08:57:28AM +0200, alex k wrote: >> >> It is Linux centric and I do mention that on the project site. >> >> The code part you mention is the one that kills a leftover convert >> process >> after it reached its timeout, an exception. >> You got the sources, go ahead and

Re: New spamassassin OCR plugin

2009-05-24 Thread Henrik K
On Sun, May 24, 2009 at 08:57:28AM +0200, alex k wrote: > > It is Linux centric and I do mention that on the project site. > > The code part you mention is the one that kills a leftover convert process > after it reached its timeout, an exception. > You got the sources, go ahead and make a windows

Re: New spamassassin OCR plugin

2009-05-23 Thread alex k
Hi, > On Sat, May 23, 2009 at 12:43:15PM +0200, alex k wrote: >> Hi, >> It seems that image spam is back. So I wrote a new OCR plugin for >> spamassassin, which uses convert and ocrad to extract text. >> For details and download see: >> >> http://spielwies

Re: New spamassassin OCR plugin

2009-05-23 Thread Henrik K
On Sat, May 23, 2009 at 12:43:15PM +0200, alex k wrote: > Hi, > It seems that image spam is back. So I wrote a new OCR plugin for > spamassassin, which uses convert and ocrad to extract text. > For details and download see: > > http://spielwiese.la-evento.com/facileOCR/ >

Re: New spamassassin OCR plugin

2009-05-23 Thread alex k
Hi, > On 23.05.09 12:43, alex k wrote: >> It seems that image spam is back. So I wrote a new OCR plugin for >> spamassassin, which uses convert and ocrad to extract text. >> For details and download see: >> >> http://spielwiese.la-evento.com/facileOCR/ >>

Re: New spamassassin OCR plugin

2009-05-23 Thread Matus UHLAR - fantomas
On 23.05.09 12:43, alex k wrote: > It seems that image spam is back. So I wrote a new OCR plugin for > spamassassin, which uses convert and ocrad to extract text. > For details and download see: > > http://spielwiese.la-evento.com/facileOCR/ > > We use this plugin on our

Re: New spamassassin OCR plugin

2009-05-23 Thread wolfgang
In an older episode (Saturday, 23. May 2009), alex k wrote: > Hi, > It seems that image spam is back. So I wrote a new OCR plugin for > spamassassin, which uses convert and ocrad to extract text. Thank you. It works out of the box (after installing ocrad) here on Ubuntu 8.04.2 linu

New spamassassin OCR plugin

2009-05-23 Thread alex k
Hi, It seems that image spam is back. So I wrote a new OCR plugin for spamassassin, which uses convert and ocrad to extract text. For details and download see: http://spielwiese.la-evento.com/facileOCR/ We use this plugin on our servers. It kicks out every image-spam, that made it through the

Re: ocr plugin

2008-05-02 Thread decoder
Theo Van Dinter wrote: On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote: Also, the SA plugin architecture is not designed to modify the message in any way, so you cannot push back the text into the normal processing line. Really? Who says? I made very specific modifications i

Re: ocr plugin

2008-05-02 Thread Theo Van Dinter
On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote: > Also, the SA plugin architecture is not designed to modify the message > in any way, so you cannot push back the text into the normal processing > line. Really? Who says? I made very specific modifications in 3.2 to allow for just that

Re: ocr plugin

2008-05-02 Thread decoder
Matus UHLAR - fantomas wrote: does it push the extracted text back to SA so it could be used by e.g. bayes? This is how it imho should be used. (and imho the same for .pdf and/or .doc - extract text _and_ images from it, call OCR for images...) That is a question that was very frequently

Re: ocr plugin

2008-05-02 Thread Matus UHLAR - fantomas
ed by e.g. bayes? This is how it imho should be used. (and imho the same for .pdf and/or .doc - extract text _and_ images from it, call OCR for images...) -- Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address.

Re: ocr plugin

2008-05-02 Thread Joseph Brennan
> Am I right to say that picture spam has dropped dramatically since the > last months? Right. There's close to none now. Spam techniques come and go. Joseph Brennan Columbia University IT

Re: ocr plugin

2008-05-02 Thread William Taylor
On Fri, May 02, 2008 at 06:06:05PM +0300, Henrik K wrote: > On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote: > > Hi, > > > > Am I right to say that picture spam has dropped dramatically since the > > last months? > > Has there been any in a year? That's when I dropped using it. > It's p

Re: ocr plugin

2008-05-02 Thread Henrik K
On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote: > Hi, > > Am I right to say that picture spam has dropped dramatically since the > last months? Has there been any in a year? That's when I dropped using it.

Re: ocr plugin

2008-05-02 Thread William Taylor
ugin? I see the latest FuzzyOCR > version is > not SA 3.2.x compatible. Is there a more recent product compatible with 3.2.x? > Are you guys still running an ocr plugin on production servers? > > Thanks for your answers, > P. >

ocr plugin

2008-05-02 Thread polloxx
Hi, Am I right to say that picture spam has dropped dramatically in recent months? Is it still reasonable to run an ocr plugin? I see the latest FuzzyOCR version is not SA 3.2.x compatible. Is there a more recent product compatible with 3.2.x? Are you guys still running an ocr plugin on

Re: Bayes combining and OCR (Was Re: SpamAssassin 3.2 compatiblity)

2007-06-01 Thread Matthias Keller
Justin Mason wrote: Matthias Keller writes: Nix wrote: On 31 May 2007, Graham Murray said: Nix <[EMAIL PROTECTED]> writes: (And, let's be blunt, the pure this-word-is-spammy recognition part of FuzzyOCR is much less smart than the Bayesian system already pres

Bayes combining and OCR (Was Re: SpamAssassin 3.2 compatiblity)

2007-06-01 Thread Justin Mason
probability scores between 0.2 and 0.8, since it uses words that also appear in ham (hence why it appears as poison). However, the OCR'd text would wind up with "strong" scores around 0.99 or greater. The chi-square probability combining algorithm we use takes care of this, by di
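
For readers unfamiliar with chi-square combining, here is a minimal sketch of why a few "strong" token probabilities can outweigh many middling ones. It follows the widely published Fisher/Robinson formulation; it is an assumption that SpamAssassin's Bayes code matches this in detail, and the function names (`chi2q`, `combine`) are hypothetical. It is written in Python purely for illustration.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-square value, valid for even dof only."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    # Evidence for spam: how unlikely the (1 - p) values are under the null.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: how unlikely the p values are under the null.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

if __name__ == "__main__":
    # Middling "poison" tokens plus one strong OCR-derived token.
    print(combine([0.2, 0.5, 0.8, 0.5, 0.99]))
```

With only middling tokens the score stays near 0.5; adding a single 0.99 token moves it clearly toward spam, which is the behaviour the message above describes.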

Re: Fuzzy OCR & annoying Outlook users

2007-05-11 Thread Kris Deugau
[EMAIL PROTECTED] wrote: > I'm using FuzzyOCR which works great. However, lately I've been seeing > annoying Outlook users using some kind of plugin which seem to add an > image, and it has the text "Free emoticons, download here" (or > something), mostly it's in my language and then it has the tex

AW: Fuzzy OCR & annoying Outlook users

2007-05-11 Thread Starckjohann, Ove
> [EMAIL PROTECTED] > Gesendet: Freitag, 11. Mai 2007 10:52 > An: users@spamassassin.apache.org > Betreff: Fuzzy OCR & annoying Outlook users > > > Hey, > > I'm using FuzzyOCR which works great. However, lately I've > been seeing > annoying Ou

Fuzzy OCR & annoying Outlook users

2007-05-11 Thread kshatriyak
Hey, I'm using FuzzyOCR which works great. However, lately I've been seeing annoying Outlook users using some kind of plugin which seem to add an image, and it has the text "Free emoticons, download here" (or something), mostly it's in my language and then it has the text "gratis". The word

bad OCR with some GIF images

2007-02-10 Thread Spamy.cz - Maxim Cerny
Hello, I'm using SA 3.1.7 with FuzzyOCR 3.5.1. This month I started having trouble with some GIF spams. The OCR can't recognize them and prints out only a few letters. Has anybody seen this? Max [EMAIL PROTECTED] f]# spamassassin --debug FuzzyOcr < Přep\:\

Skipping OCR on Delivery Failures?

2007-01-31 Thread Josh Graham
I've set up Sendmail to send double bounces to /dev/null but I'm still getting a large amount of "Delivery failures" to my spambox, and each one of them has been scanned by OCR. According to my logs in the last 48 hours I've scanned 1.3 million incoming messages and the s

Re: Despeckling images for OCR and anti-spam purposes

2006-12-23 Thread René Berber
Kelly Jones wrote: > Spammers are starting to put "speckles" in their images to defeat > OCR-scanning plugins such as FuzzyOCR. That's a very old technique. > I thought ImageMagick's -despeckle option would help, but it doesn't > seem to, not even whe

Re: Despeckling images for OCR and anti-spam purposes

2006-12-23 Thread René Berber
Kenneth Porter wrote: > --On Saturday, December 23, 2006 12:43 PM +0100 decoder > <[EMAIL PROTECTED]> wrote: > >> Which images are you referring to? If you can put up a sample, then I >> can tell you which scanner setting will catch it :) > > Does the SA wiki support uploading of images? Perhaps

Re: Despeckling images for OCR and anti-spam purposes

2006-12-23 Thread decoder
Kenneth Porter wrote: > --On Saturday, December 23, 2006 12:43 PM +0100 decoder > <[EMAIL PROTECTED]> wrote: > >> Which images are you referring to? If you can put up a sample, >> then I can tell you which scanner setting will catch it :) > > Does the

Re: Despeckling images for OCR and anti-spam purposes

2006-12-23 Thread Kenneth Porter
--On Saturday, December 23, 2006 12:43 PM +0100 decoder <[EMAIL PROTECTED]> wrote: Which images are you referring to? If you can put up a sample, then I can tell you which scanner setting will catch it :) Does the SA wiki support uploading of images? Perhaps we could have a page of just probl

Re: Despeckling images for OCR and anti-spam purposes

2006-12-23 Thread decoder
Kelly Jones wrote: > Spammers are starting to put "speckles" in their images to defeat > OCR-scanning plugins such as FuzzyOCR. Which images are you referring to? If you can put up a sample, then I can tell you which scanner setti

Despeckling images for OCR and anti-spam purposes

2006-12-22 Thread Kelly Jones
Spammers are starting to put "speckles" in their images to defeat OCR-scanning plugins such as FuzzyOCR. I thought ImageMagick's -despeckle option would help, but it doesn't seem to, not even when applied multiple times, not even in conjunction with -monochrome. I want a f

Re: Custom OCR ?

2006-12-12 Thread Theo Van Dinter
On Tue, Dec 12, 2006 at 06:06:02PM +0100, Janek Kozicki wrote: > Can you tell me how to painlessly tell spamassassin to call my script? Write a plugin. See something like FuzzyOcr. -- Randomly Selected Tagline: If Major BBS sucked, it would be good for something.

Custom OCR ?

2006-12-12 Thread Janek Kozicki
r/run/spamd.pid" NICE="--nicelevel 15" I want to try several different OCR programs to filter spam. Let's say that I have written a script ~/bin/img2txt which takes as single argument the file containing image and prints to stdout OCRed text. Can you tell me how to painlessly

Re: spammers dodging OCR

2006-11-21 Thread alex
lol, just got a spam with the image obfuscated like captchas in a bbs, to avoid detection by ocr. On Mon, Nov 06, 2006 at 02:06:45PM -0600, Jorge Valdes wrote: > Gary V wrote: > >This morning I received my copy of networkworld. Here is an > >interesting artic

Re: Fuzzy OCR - first time user

2006-11-18 Thread decoder
Marc Perkel wrote: OK - trying out the FuzzyOCR plugin. So far it all the default stuff with minimal installation. I'm running Fedora Core 6. Used the gocr RPM and didn't patch the source. Everything is default and it doesn't seem to be complaining so . If I like this what do I need to ch

Fuzzy OCR - first time user

2006-11-17 Thread Marc Perkel
OK - trying out the FuzzyOCR plugin. So far it all the default stuff with minimal installation. I'm running Fedora Core 6. Used the gocr RPM and didn't patch the source. Everything is default and it doesn't seem to be complaining so . If I like this what do I need to change to really do it

Re: spammers dodging OCR

2006-11-06 Thread Jorge Valdes
Gary V wrote: This morning I received my copy of networkworld. Here is an interesting article: http://www.networkworld.com/columnists/2006/103006buzz-spammers-dodging-ocr.html Gary V

spammers dodging OCR

2006-11-06 Thread Gary V
This morning I received my copy of networkworld. Here is an interesting article: http://www.networkworld.com/columnists/2006/103006buzz-spammers-dodging-ocr.html Gary V

Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread jdow
From: "David B Funk" <[EMAIL PROTECTED]> On Fri, 8 Sep 2006, Michael Grey wrote: In regards to the second, many large companies have outside companies do work for them in the areas of marketing and other aspects. So this also will happen regardless. Let me clarify; this is an OUTSIDE relay to

Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread jdow
mber 08, 2006 09:40 Subject: Fuzzy OCR false positives from Screenshots... We are testing a new configuration using FuzzyOCR, and found it to work very well overall... However, there have been two occasions in the last 24 hrs where screenshots embedded into the emails caused false positive

RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread David B Funk
On Fri, 8 Sep 2006, Michael Grey wrote: > In regards to the second, many large companies have outside companies do work > for them in the areas of marketing and other aspects. So this also will > happen regardless. > > Let me clarify; this is an OUTSIDE relay to INSIDE... > > A FuzzyOCR White List

Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Logan Shaw
On Fri, 8 Sep 2006, Michael Grey wrote: We are testing a new configuration using FuzzyOCR, and found it to work very well overall... However, there have been two occasions in the last 24 hrs where screenshots embedded into the emails caused false positives. One was an 'account summary' from a c

RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Logan Shaw
On Fri, 8 Sep 2006, Randal, Phil wrote: Score appropriately, train your Bayes well, and the false positives should diminish. FUZZY_OCR gives crazily high scores to certain things. One point per matched keyword, I believe. I've seen FUZZY_OCR, by itself, give scores as high as 24.00. Here's th

RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Randal, Phil
> To: users@spamassassin.apache.org > Subject: RE: Fuzzy OCR false positives from Screenshots... > > > You will have to ask the cell company about the first issue ... > > In regards to the second, many large companies have outside > companies do work > for them in the areas of

RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey
FuzzyOCR White List with (very privately held) keywords would help. Any other ideas ? Michael Grey -Original Message- From: John D. Hardin [mailto:[EMAIL PROTECTED] Sent: Friday, September 08, 2006 10:10 AM To: Michael Grey Cc: users@spamassassin.apache.org Subject: Re: Fuzzy OCR false

Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread John D. Hardin
On Fri, 8 Sep 2006, Michael Grey wrote: > However, there have been two occasions in the last 24 hrs where screenshots > embedded into the emails caused false positives. > > One was an 'account summary' from a cell company, the other was some internal > marketing info. > > Are there other approac

Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey
We are testing a new configuration using FuzzyOCR, and found it to work very well overall… However, there have been two occasions in the last 24 hrs where screenshots embedded into the emails caused false positives. One was an ‘account summary’ from a cell company, the other was so

Re: Tesseract OCR open sourced

2006-09-05 Thread John D. Hardin
it's very easy to choose which OCR engine you wish it to use. -- John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/ [EMAIL PROTECTED] FALaholic #11174 pgpk -a [EMAIL PROTECTED] key: 0xB8732E79 - 2D8C 34F4 6411

Re: Tesseract OCR open sourced

2006-09-05 Thread Robert LeBlanc
nd Maia-Users lists, along with the results of some preliminary tests I conducted with Tesseract OCR vs. GOCR, and it looks promising. Here's what I posted: === post begins === It's already "usable"; I've compiled it and done some basic tests with it, and it does se

Re: Tesseract OCR open sourced

2006-09-05 Thread Kenneth Porter
Theo just mentioned this on the -devel list:

Tesseract OCR open sourced

2006-09-04 Thread jdow
http://developers.slashdot.org/developers/06/09/04/2215210.shtml Tesseract, developed by HP labs, is touted as one of the most accurate OCR programs available. Google cleaned it up and has released it OS. {^_^}

Re: OCR plugin doesn't seem to work

2006-08-23 Thread decoder
Mike Pepe wrote: > decoder wrote: > >> Which OCR plugin are you using there? If it is the original >> OcrPlugin, then you might try FuzzyOcr instead. The original >> OcrPlugin was more proof-of-concept, and will cause you lots

Re: OCR plugin doesn't seem to work

2006-08-22 Thread Mike Pepe
decoder wrote: Which OCR plugin are you using there? If it is the original OcrPlugin, then you might try FuzzyOcr instead. The original OcrPlugin was more proof-of-concept, and will cause you lots of headaches with the current image spam... I did upgrade to FuzzyOCR after I read your message

Re: OCR plugin doesn't seem to work

2006-08-21 Thread decoder
re_specific.cf local.cf > WebRedirect.cf 70_sare_spoof.cf Ocr.cf > WebRedirect.pm 70_sare_stocks.cf Ocr.pm 70_sare_uri0.cf > RulesDuJour > Which OCR plugin are you using there? If it is the original OcrPlugin, then you might try FuzzyOcr instead. The original OcrPlugin was more proof-of-con

OCR plugin doesn't seem to work

2006-08-21 Thread Mike Pepe
Hey guys, Running SA 3.1.1, on Fedora Core 3, with Perl 5.8.5 I installed gocr and imagemagick packages, copied the Ocr.pm and cf files into /etc/mail/spamassassin The tests don't seem to run, the pump 'n dump GIFs are still arriving and I don't see that the test is being run in the headers.

Re: Improved OCR Plugin with approximate matching

2006-08-18 Thread Matthias Keller
decoder wrote: decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional

Re: Improved OCR Plugin with approximate matching

2006-08-17 Thread decoder
decoder wrote: > Hello there, > > I have improved the original OcrPlugin (found at > http://wiki.apache.org/spamassassin/OcrPlugin), so it contains > fuzzy matching. Like that, mistakes made by the OCR recognition or > intentional o

Re: Improved OCR Plugin with approximate matching

2006-08-13 Thread decoder
decoder wrote: > Hello there, > > I have improved the original OcrPlugin (found at > http://wiki.apache.org/spamassassin/OcrPlugin), so it contains > fuzzy matching. Like that, mistakes made by the OCR recognition or > intentional o

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread Theo Van Dinter
On Thu, Aug 10, 2006 at 10:55:30AM -0700, Dave . wrote: > foreach my $p ( $pms->{msg}->find_parts("image") ) { >Does this mean the message must have the text "image" and/or "image/gif" >within the body? Many of the "penny stock" spam gifs I get appear as follows: Generally speaking, RTM (Mail::S

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread decoder
> $p->get_header('content-type') ); if ( $ctype eq "image/gif" ) { > open OCR, "|/usr/bin/convert - pnm:-|/usr/bin/gocr -i - > > /tmp/spamassassin.ocr.$$"; foreach $p ( $p->decode() ) { print OCR > $p; --- Does this mean the message m

RE: Improved OCR Plugin with approximate matching

2006-08-10 Thread Dave .
Give them code from Ocr.pm:--- foreach my $p ( $pms->{msg}->find_parts("image") ) { my ( $ctype, $boundary, $charset, $name ) = Mail::SpamAssassin::Util::parse_content_type( $p->get_header('content-type') ); if ( $ctype eq "

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread amosch . security
On Tue, Aug 08, 2006 at 12:43:24AM +0200, decoder wrote: > > You can find a full description and an example in the wiki under: > > http://wiki.apache.org/spamassassin/FuzzyOcrPlugin > > > Ideas for improvements or critics are always welcome :) > > Hi, First, thanks for working on such a gr

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread decoder
Bill Landry wrote: > - Original Message - From: "Spamassassin List" > <[EMAIL PROTECTED]> > To: > Sent: Wednesday, August 09, 2006 2:26 PM > Subject: Re: Improved OCR Plugin with approximate matchin

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread Mathias Tauber
> > yum install libungif* will get both libungif and libungif-progs (which > > contains giffix) I'm using Debian (Sarge) and I think libungif-bin is here the better package. giflib-bin wants to install the packages libx11-6, xfree86-common, xlibs-data additionaly. Which means 10MB more than inst

RE: Improved OCR Plugin with approximate matching

2006-08-09 Thread Rick Cooper
> -Original Message- > From: decoder [mailto:[EMAIL PROTECTED] > Sent: Wednesday, August 09, 2006 5:31 PM > To: Spamassassin List; users@spamassassin.apache.org > Subject: Re: Improved OCR Plugin with approximate matching > > [snip] > > According to google, lib

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Bill Landry
- Original Message - From: "Spamassassin List" <[EMAIL PROTECTED]> To: Sent: Wednesday, August 09, 2006 2:26 PM Subject: Re: Improved OCR Plugin with approximate matching Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugi

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
Spamassassin List wrote: >> Spamassassin List wrote: > decoder wrote: > > See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin > > Major changes: Replaced imagemagick with netpbm, support > png, invoked giffix for broken gifs,

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Spamassassin List
Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not by content-type, added various configuration options. I ins

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
Spamassassin List wrote: >>> decoder wrote: >>> >>> See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin >>> >>> Major changes: Replaced imagemagick with netpbm, support png, >>> invoked giffix for broken gifs, detect image format with magic >>> byte

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Spamassassin List
decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not by content-type, added various configuration options. I install the above plugin, and
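
The change to "detect image format with magic bytes and not by content-type" can be sketched as below. This is only an illustration in Python, not the plugin's Perl code; the function name `detect_image_format` is hypothetical. The signatures used are the standard leading bytes of GIF, PNG and JPEG files.

```python
from typing import Optional

# Well-known leading byte sequences ("magic bytes") for each format.
_SIGNATURES = (
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)

def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'gif', 'png' or 'jpeg' based on the first bytes, else None.

    The declared MIME Content-Type is deliberately ignored, since a
    spammer controls that header and can mislabel the attachment.
    """
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None

if __name__ == "__main__":
    # A part labelled "application/octet-stream" that is really a GIF.
    print(detect_image_format(b"GIF89a\x01\x00\x01\x00"))
```

The design point is that the decision depends only on the attachment's content, so an image sent under a wrong or generic content type is still routed to the right converter.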

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
and the rest > between 30 to 83. Scan time ran between 6.4 and 16.6 seconds per > message. I'm using a ton of SARE rules on a RHE server, dual xeon > 2.4 ghz with 2 gig ram. > > If OCR is processor/memory intensive, could it be configured to kick > in for lower scoring messages

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Expertsites, Inc.
Since installation yesterday, my system hit FUZZY_OCR in 204 messages. One scored 18, ten scored in the 20's and the rest between 30 to 83. Scan time ran between 6.4 and 16.6 seconds per message. I'm using a ton of SARE rules on a RHE server, dual xeon 2.4 ghz with 2 gig ram.

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
decoder wrote: > Hello there, > > I have improved the original OcrPlugin (found at > http://wiki.apache.org/spamassassin/OcrPlugin), so it contains > fuzzy matching. Like that, mistakes made by the OCR recognition or > intentional o

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread John D. Hardin
On Tue, 8 Aug 2006, decoder wrote: > I only wanted to add a small note: I recently saw gifs that cannot be > converted using imagemagick because they are either sloppy generated > or with intention partly corrupted. Please think about using giftopnm > and jpegtopnm instead. If you have a better id

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread Marc Perkel
corrupted. Please think about using giftopnm and jpegtopnm instead. If you have a better idea, tell me. To use giftopnm and jpegtopnm, change the code from: if (($ctype eq "image/gif") || ($ctype eq "image/jpeg")) { open OCR, "|/usr/bin/convert - pnm:-|

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
ugin), so it contains >> fuzzy matching. Like that, mistakes made by the OCR recognition >> or intentional obfuscations in the text don't make the >> recognition impossible. This is being done with a relative >> distance calculation between the pattern (word from a given wor
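
As a rough illustration of the "relative distance" idea quoted above, the sketch below compares each OCR token against a word list and accepts a match when the edit distance, divided by the pattern length, is small. It assumes a Levenshtein-style distance and a threshold of 0.25; neither is confirmed by the message, and the names `relative_distance` and `fuzzy_hits` are hypothetical. Python is used for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def relative_distance(pattern: str, candidate: str) -> float:
    """Edit distance scaled by the pattern length (0.0 means identical)."""
    if not pattern:
        return 0.0 if not candidate else 1.0
    return edit_distance(pattern.lower(), candidate.lower()) / len(pattern)

def fuzzy_hits(text: str, wordlist, threshold: float = 0.25):
    """Return (word, token) pairs where a token is close enough to a word."""
    hits = []
    tokens = text.split()
    for word in wordlist:
        for tok in tokens:
            if relative_distance(word, tok) <= threshold:
                hits.append((word, tok))
                break  # count each word at most once
    return hits

if __name__ == "__main__":
    print(fuzzy_hits("buy v1agra n0w", ["viagra", "stock"]))
```

Scaling by the pattern length means a single wrong character is tolerated in a long word but not in a very short one, which keeps short words from matching almost anything.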

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread Matthias Keller
decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the text

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
you have a better idea, tell me. To use giftopnm and jpegtopnm, change the code from: if (($ctype eq "image/gif") || ($ctype eq "image/jpeg")) { open OCR, "|/usr/bin/convert - pnm:-|/usr/bin/gocr -i - > /tmp/spamassassin.focr.$$"; to: if ((

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread jdow
From: "uNiXpSyChO" <[EMAIL PROTECTED]> decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OC

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread uNiXpSyChO
seems to work... but i never see a score about 1.00. the docs say the default score is 4. did i miss something? above 1.00 i meant.

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread uNiXpSyChO
decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the text

Improved OCR Plugin with approximate matching

2006-08-07 Thread decoder
Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the text don't mak

OCR

2006-08-07 Thread Filbert
Hi, I'm planning to test the OCR module in SA very soon. I was wondering if other (commercial) anti-spam products already have a OCR module built-in? Thx F.
