Re: sa-learn from a cronjob?

Bob Proulx Wed, 30 Apr 2014 12:54:13 -0700

RW wrote:
> Bob Proulx wrote:
> > The script is looping through mail files in a maildir and processing
> > them remotely on the server through sa-learn.  After processing the
> > messages it is moving the messages to mark them as having been read.
> 
> No, the Maildir spec defines the "S" flag in the info field for marking
> mail as read (seen), the new/ to cur/ move  is done by an IMAP server
> (or a local Unix client) in the first session that sees the new mail. 
> 
> Copying an email into an IMAP folder via IMAP will not put it into the
> new/ sub-directory of the underlying maildir. Opening a folder in IMAP
> will empty the new/ sub-directory.
> 
> If you don't believe this, I suggest you actually try it on a real
> IMAP server.   I just tried it on Dovecot, and I found it behaves as I
> expected. Newly delivered mail is moved to cur/ when a client is first
> informed about it, copied mail goes to cur/ in the destination mailbox.
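
For reference, the "S" flag mentioned in the quoted text lives in the info field that the Maildir convention appends to a filename as `:2,<flags>`. The following is a minimal Python sketch of reading that field. The function names `maildir_flags` and `is_seen` are made up for illustration and are not part of any script discussed in this thread.

```python
def maildir_flags(filename):
    """Return the set of flag letters found in a maildir filename's info field.

    Assumes the common ":2,<flags>" form; anything else yields an empty set.
    """
    base, sep, info = filename.rpartition(":")
    if not sep or not info.startswith("2,"):
        return set()
    return set(info[2:])


def is_seen(filename):
    """True if the filename carries the "S" (seen) flag."""
    return "S" in maildir_flags(filename)
```

Note that a size hint such as `,S=1234` before the colon (as some servers write) is part of the base name, not the info field, so it does not count as the seen flag here.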


Hmm...  Works for me.  Apparently it works for Ian.  YMMV.

Personally, my process removes mail from the incoming spam-new folder
and then saves it into the processed spam folder.  That is the way I
prefer to run it.  I use two folders rather than one.  Again YMMV.
Works for me.  Sorry if it does not work for you.
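
To illustrate the two-folder approach, here is a minimal Python sketch, not the actual script from this thread. It assumes both folders are local maildirs, that `sa-learn --spam <file>` is on the PATH and exits with status 0 on success, and that messages should only be moved after a successful run. The function name `train_and_move` and the `learn_cmd` parameter are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def train_and_move(src, dst, learn_cmd=("sa-learn", "--spam")):
    """Feed each message in maildir `src` to `learn_cmd`, then move it to `dst`/cur.

    Returns the list of destination paths for messages that were moved.
    """
    src, dst = Path(src), Path(dst)
    (dst / "cur").mkdir(parents=True, exist_ok=True)
    moved = []
    for sub in ("new", "cur"):
        subdir = src / sub
        if not subdir.is_dir():
            continue
        for msg in sorted(subdir.iterdir()):
            if not msg.is_file():
                continue
            # Assumption: a zero exit status means the message was learned.
            result = subprocess.run([*learn_cmd, str(msg)],
                                    stdout=subprocess.DEVNULL)
            if result.returncode != 0:
                continue  # leave the message in place for the next run
            # Files in cur/ are expected to carry an info field.
            name = msg.name if ":2," in msg.name else msg.name + ":2,"
            target = dst / "cur" / name
            shutil.move(str(msg), str(target))
            moved.append(target)
    return moved
```

A cron job could then call something like `train_and_move("Maildir/.spam-new", "Maildir/.spam")`, where both paths are placeholders for whatever the local folder layout is.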

> > > You might have mentioned that because it means it's not the
> > > solution you implied when you wrote "Here is my cronjob for that
> > > purpose". It's certainly not appropriate to users that don't like
> > > the command line.
> > 
> > Sorry but you are incorrect.  Users of Ian's system need not use the
> > command line.  His solution directly answered Dan's question.
> 
> No, he said himself that my objections don't apply because it's an
> isolated mailbox that's not read by anything except the cron script. A
> macro in the client places the mail directly into the mailbox (bypassing
> the client's conventional mailbox handling) - this is really only even
> remotely sensible for a local instance of mutt, emacs etc.

I think you are completely misunderstanding how this type of process
works.  And I can't avoid saying that this seems intentional by the
tone.  Sorry.  But that is the way it reads to me.  I have tried to
help in good faith but if that good faith is not reciprocated then I
am going to lose interest very quickly.

But let me try again very briefly one last time anyway since I am an
incorrigible optimist.  Two things are very common.  IMAP servers.
Use of maildir.  One does not require the other.  But they very often
appear together.  It is not required to use mutt or emacs or any other
of the traditional email clients for this, even if that is a typical
desired developer environment.  All that is required for this type of
scripted method is that the backend use maildirs for mail storage.
That way the files can be scanned and processed offline.  I dare say
that most of the masses use web email clients these days.  Or if not
most then a very large number.  They will never see the maildir.

Since use of maildirs is typical for an IMAP server it means that any
of the plethora of IMAP clients, including web email interfaces to
IMAP, can be used to interact with the IMAP server and through that
the maildir folders on the backend.  A user running an IMAP client
might never see the maildir.  A user running a web mail client would
certainly never see a maildir.  That does not mean that the maildir
does not exist.  That does not mean that the maildir cannot be scanned
and processed offline for background training of the Bayes database.
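
As a small illustration that the backend files can be read without going through any mail client, here is a sketch using Python's standard `mailbox` module. The function name `list_maildir` is hypothetical, and the path is whatever maildir the server uses for the folder in question.

```python
import mailbox


def list_maildir(path):
    """Return (subdir, flags, subject) for each message stored in a maildir."""
    box = mailbox.Maildir(path, create=False)
    rows = []
    for key, msg in box.iteritems():
        # get_subdir() is "new" or "cur"; get_flags() is the info-field letters.
        rows.append((msg.get_subdir(), msg.get_flags(), msg.get("Subject", "")))
    box.close()
    return rows
```

The same loop is where a background job would hand each message to the Bayes trainer instead of just listing it.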

The maildir exists and a cron script can be used to scan and process
mail incoming there.  People do it.  It works.  Saying it does not
work or is not sensible is just wrong, and mean besides.  People do this all
of the time.  Ian does it.  I do it.  Meanwhile no one is disputing
that there are better ways to do things.  There are always better
ways.  Which is why it is so much appreciated when people share.  Then
we can all learn and move forward.  But what can be said when someone
says that something people are doing and making good use of is not
sensible?  I think I will choose to say nothing more.

> Mostly, it's pretty trivial to train Bayes from Maildir, but there
> is one significant complication, and that's that moving mail between
> Maildirs after training may break IMAP keywords, which some clients
> use for custom flags or for sharing proprietary metadata between
> separate client instances. 

Yes it is pretty trivial.  Which has been the topic of this thread.
Simple scripts to scan and process maildirs.  Here you point out some
likely valid issues of breaking tags.  However maintaining tags for
spam messages moved into the training folder isn't a problem that I
find compelling.  Certainly not compelling enough to not do it.

I look forward to reading your positive contribution to the anti-spam
effort.

Bob

-- 
  http://xkcd.com/386/
