[Dovecot] maildir maintenance?

2010-11-23 Thread Brian Kroth
Hi, I'm running version 1.2.15 (so no doveadm) with around 6000 maildir
users, some of which are very large.  For completeness, the details of
the setup are as follows: 
- The maildirs are stored via NFS.
- The indexes are on a volume local to the dovecot server.
- Only one IMAP server currently.
- A separate sendmail/procmail server delivers via NFS.

I recently wrote the attached script (could probably be improved) meant
in theory to be run as a nightly cron job to do the following:
1) Remove old messages marked as deleted.
(I didn't really like the expunge/trash plugins the last time I looked
at them.)
2) Rebuild the indexes to account for those changes.
(So that the users aren't delayed by dovecot rebuilding them.)
3) Rebuild FTS indexes.
(Again to avoid delay.)
4) Rebuild the maildirsize file.
(It seems to become slightly inaccurate over time.)
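
A minimal sketch of step 1, for readers without the attachment. It is not the attached script. It assumes the standard Maildir filename convention, where the flags follow ":2," and "T" marks a message as deleted, and it assumes the caller supplies a per-file timestamp (for example st_ctime, which changes when the file is renamed to add the flag). The function names are hypothetical.

```python
import re

# Maildir "info" suffix: ":2," followed by single-letter flags.
# "T" (trashed) is the on-disk form of the IMAP \Deleted flag.
INFO_RE = re.compile(r":2,([A-Za-z]*)$")


def is_trashed(filename):
    """Return True if the Maildir filename carries the T flag."""
    match = INFO_RE.search(filename)
    return bool(match) and "T" in match.group(1)


def expired_trashed(entries, now, max_age_days):
    """Pick filenames that are flagged T and older than the cutoff.

    entries: iterable of (filename, changed_epoch_seconds) pairs.
    Assumption: the caller chooses the timestamp, e.g. os.stat().st_ctime.
    """
    cutoff = now - max_age_days * 86400
    return [name for name, changed in entries
            if is_trashed(name) and changed < cutoff]
```

Removing files behind Dovecot's back still leaves the indexes and maildirsize to catch up, which is what steps 2 and 4 are for.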

To the devs, I'm wondering if this seems sane?

To other users of dovecot, I'm wondering what sort of maintenance
operations, if any, you tend to do in your setups?

Thanks,
Brian


dovecot-maintenance.sh
Description: Bourne shell script


signature.asc
Description: Digital signature


Re: [Dovecot] maildir maintenance?

2010-11-24 Thread Brian Kroth
Timo Sirainen  2010-11-24 18:01:
> On 24.11.2010, at 2.55, Brian Kroth wrote:
> 
> > Hi, I'm running version 1.2.15 (so no doveadm)
> 
> You could build Dovecot v2.0 and only use doveadm binary from it.

Does it just issue the command via IMAP?  No direct filesystem
operations?

> > I recently wrote the attached script (could probably be improved) meant
> > in theory to be run as a nightly cron job to do the following:
> ..
> > To the devs, I'm wondering if this seems sane?
> 
> Looks about ok. The main thing I'm worried about is what happens if a user
> creates mailboxes containing " or ' or ` characters.

Yeah, that was mostly me being lazy in not wanting to deal with escaping,
so I just ignored them.

In what I originally wrote, I think it just won't touch them.

Or is the issue that the find command might remove them and then the
indexes don't get fixed up?  I suppose I could just make sure that the
find ignores those dirs, but I thought (from other maillist reading)
that the next time their client SELECTs the folder it'll fix it up
anyways.

I suppose another spin on this would be for me to script the preauth
imap client to figure out which mailboxes have messages marked for
deletion of such and such an age and then try to use EXPUNGE to wipe
just them out.  I'm not sure off hand if that's possible.

I suppose that's what doveadm already does.

Thanks,
Brian




Re: [Dovecot] maildir maintenance?

2010-11-24 Thread Brian Kroth
Timo Sirainen  2010-11-24 19:04:
> On 24.11.2010, at 18.59, Brian Kroth wrote:
> 
> >>> Hi, I'm running version 1.2.15 (so no doveadm)
> >> You could build Dovecot v2.0 and only use doveadm binary from it.
> > 
> > Does it just issue the command via IMAP?  No direct filesystem
> > operations?
> 
> It's all direct filesystem operations, no IMAP. But v1.2.15 can read v2.0's 
> index files just fine.
> 
> >> Looks about ok. The main thing I'm worried about is what happens if a user
> >> creates mailboxes containing " or ' or ` characters.
> > 
> > Yeah, that was mostly me being lazy in not wanting to deal with escaping,
> > so I just ignored them.
> > 
> > In what I originally wrote, I think it just won't touch them.
> > 
> > Or is the issue that the find command might remove them and then the
> > indexes don't get fixed up?  I suppose I could just make sure that the
> > find ignores those dirs, but I thought (from other maillist reading)
> > that the next time their client SELECTs the folder it'll fix it up
> > anyways.
> 
> I was more thinking what happens if the user creates a mailbox called
> `rm -rf /` or something.. Also if there are " or \ characters I think the
> LIST output will use literals and your parsing will break more or less badly.

That's certainly true.  I guess I was just hoping to skip over those
mailboxes with unpleasant characters for the moment :}

More likely I'll rewrite this more carefully in Perl.
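
A small sketch of the escaping concern, in Python rather than Perl purely for illustration. It assumes a Maildir++ layout in which a mailbox lives in a ".Name" directory under the user's maildir; the function names and paths are hypothetical. The point is that passing an argument vector avoids the shell entirely, and that shlex.quote is available when a shell line cannot be avoided.

```python
import shlex


def find_argv(maildir_root, mailbox):
    """Build an argv list; no shell ever interprets the mailbox name."""
    path = f"{maildir_root}/.{mailbox}/cur"  # assumed Maildir++ layout
    return ["find", path, "-type", "f"]


def find_shell_line(maildir_root, mailbox):
    """Build a single shell command line with every argument quoted."""
    return " ".join(shlex.quote(arg) for arg in find_argv(maildir_root, mailbox))
```

With a mailbox named `rm -rf /`, the backticks end up inside single quotes in the shell line, so the shell treats them as literal characters rather than as command substitution.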

> > I suppose another spin on this would be for me to script the preauth
> > imap client to figure out which mailboxes have messages marked for
> > deletion of such and such an age and then try to use EXPUNGE to wipe
> > just them out.  I'm not sure off hand if that's possible.
> 
> That would be a bit difficult at least to do via IMAP..

So I'm finding.  I guess I was thinking I could find the messages in a
SELECTed mailbox via some parsing of either 
- UID SEARCH DELETED X-SINCE $N_days_ago (where X-SINCE searches X-SAVEDATE
  instead of INTERNALDATE), or
- UID FETCH 1:* (INTERNALDATE X-SAVEDATE FLAGS) as I've seen bandied about, or
- combine the two and SEARCH DELETED, then
  UID FETCH $initial_uid_list (X-SAVEDATE FLAGS) to refine the list.

Then use the (U?)IDs I get back from that to do
- UID EXPUNGE $uid_list

Of course I've only started researching that avenue, so maybe that's not
so reasonable.
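
The refinement step above (take the fetched save dates and flags, keep only old deleted messages) can be sketched roughly as follows. This assumes the FETCH responses have already been parsed into (uid, savedate, flags) tuples by some IMAP parser, and that the save date uses the usual IMAP date-time form such as "26-Nov-2010 09:03:00 -0600". The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# IMAP date-time layout, e.g. "26-Nov-2010 09:03:00 -0600".
IMAP_DATETIME = "%d-%b-%Y %H:%M:%S %z"


def expired_deleted_uids(records, now, max_age_days):
    """Return sorted UIDs flagged \\Deleted whose save date is past the cutoff.

    records: iterable of (uid, savedate_string, flags) tuples (assumed
    to come from an already-parsed FETCH response).
    now: timezone-aware datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    uids = []
    for uid, savedate, flags in records:
        saved = datetime.strptime(savedate.strip(), IMAP_DATETIME)
        # IMAP system flags are case-insensitive.
        deleted = any(flag.lower() == "\\deleted" for flag in flags)
        if deleted and saved < cutoff:
            uids.append(uid)
    return sorted(uids)
```

The resulting list is what would be handed to UID EXPUNGE, which requires the server to support the UIDPLUS extension.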

I'm starting to see why so much effort has been expended on this front.

Thanks,
Brian




Re: [Dovecot] maildir maintenance?

2010-11-26 Thread Brian Kroth

So, I redid this in Perl to only use IMAP rather than any sudo or find
calls.  In theory then one doesn't need to worry about the indexes being
out of sync.  I still skipped over the "strange characters" mailboxes
for the moment.  I'm wondering what you think of this second rendition?

The only thing I'm not quite sure about is if there's some sort of race
between clients accessing/altering UIDs that may or may not get reused.
But I think this one is at least clear of the problem you mentioned
earlier.

In theory one would call it from cron like so:

30 0 * * * root /opt/cron/dovecot-maintenance.sh | logger -i -p mail.info -t dovecot-maintenance

Which loops over all the relevant users calling dovecot-maintenance.pl
on them in turn.  Could probably even fork off some small number to run
in parallel.
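
A rough sketch of the "small number in parallel" idea, shown in Python for illustration only. The wrapper in this thread is a shell script, so this is not its logic. The script path below is an assumption; the --user option matches the GetOptions call in the Perl script. All function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed install location of the per-user script; adjust as needed.
SCRIPT = "/opt/cron/dovecot-maintenance.pl"


def maintain_via_script(user):
    """Run the per-user script and return its exit status."""
    return subprocess.run([SCRIPT, "--user", user], check=False).returncode


def run_for_users(users, maintain, max_workers=4):
    """Call maintain(user) for each user, at most max_workers at a time.

    Returns {user: result}; an exception raised for one user is stored
    as that user's result instead of aborting the whole run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {user: pool.submit(maintain, user) for user in users}
        for user, future in futures.items():
            try:
                results[user] = future.result()
            except Exception as exc:
                results[user] = exc
    return results
```

Keeping max_workers small matters here because each worker adds NFS load on the same maildir volume.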

Thanks,
Brian


dovecot-maintenance.sh
Description: Bourne shell script
#!/usr/bin/perl -w
# dovecot-maintenance.pl
# 2010-11-24
# bpkroth
#
# This script performs a number of maintenance operations on dovecot mailboxes
# for a user.  It is expected to be run daily, but only operates on a subset of
# users at a time.
# 
# The tasks it performs are:
# 1) Remove messages that were marked for deletion over N days ago.
# (TODO: We could also consider removing spam or trash files here, but don't currently).
# 2) Rebuild the mailbox indexes to clean out the removed files.
# 3) Rebuild the maildirsize quota calculation to account for the removed files
# and to deal with falling out of sync problems.
# 4) Refresh the full text search index.

use strict;
use File::Basename;
use Getopt::Long;
use POSIX qw(strftime);
use Data::Dumper;
use IPC::Open2;
use Date::Parse qw(str2time);

my $PROG = basename($0);
my $DEBUG = 0;
my $DRYRUN = 0;
my $user;

my $rc = GetOptions(
	'user=s'	=> \$user,
	'verbose+'	=> \$DEBUG,
	'dryrun'	=> \$DRYRUN,
);

die(&

Re: [Dovecot] maildir maintenance?

2010-11-26 Thread Brian Kroth
Timo Sirainen  2010-11-26 15:15:
> On Fri, 2010-11-26 at 09:03 -0600, Brian Kroth wrote:
> > 
> > So, I redid this in Perl to only use IMAP rather than any sudo or find
> > calls.  In theory then one doesn't need to worry about the indexes being
> > out of sync.  I still skipped over the "strange characters" mailboxes
> > for the moment.  I'm wondering what you think of this second rendition? 
> 
> Using some IMAP parser would be much more robust than doing it via
> regexps. Then you wouldn't have to worry about strange characters
> either.

*sigh*

Should have started there.  It appears there's a nice module that even
has an example for talking to Dovecot via preauth:
http://search.cpan.org/~jettero/Net-IMAP-Simple-1.2018/Simple.pod#PREAUTH

Thanks again,
Brian

> > if ($line =~ qr/^\* (LIST|LSUB) \((\\Has(No)?Children)?\) "\/" "(.+)"\s*$/) {
> 
> Just ignore the flags in the middle, there might be others: \([^)]*\)
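
For illustration, the suggested flag-agnostic pattern could look like the sketch below (Python here; the same character class works in Perl). It keeps the original assumptions that the hierarchy delimiter is "/" and that the name arrives as a quoted string, and it deliberately returns nothing for names containing " or \ or sent as literals, so those mailboxes are skipped rather than misparsed. A real IMAP parser remains the more robust choice. The function name is hypothetical.

```python
import re

# \([^)]*\) accepts any flag list, not just \HasChildren / \HasNoChildren.
# The name group refuses quotes and backslashes, so escaped names are skipped.
LIST_RE = re.compile(r'^\* (?:LIST|LSUB) \([^)]*\) "/" "([^"\\]+)"\s*$')


def parse_list_line(line):
    """Return the mailbox name from a simple LIST/LSUB line, else None."""
    match = LIST_RE.match(line)
    return match.group(1) if match else None
```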
> 
> > if ($line =~ /^\* [0-9]+ FETCH \(UID ([0-9]+) X-SAVEDATE "([0-9]{1,2}-[A-Z][a-z][a-z]-[0-9]{4} [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9+-]+)" FLAGS \(([^)]+)\) ENVELOPE \((.*)\)\)\s*$/) {
> 
> ENVELOPE reply might not be in one line either. For example see what
> happens if subject has " character.
> 
> If the list of UIDs is really large, sending a command might fail
> because the command line is too long (imap_max_line_length setting).
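
One way to stay under the line-length limit is to collapse consecutive UIDs into ranges and split the result into several commands. The sketch below assumes a hypothetical max_len budget for the sequence-set part of the command; the value should be chosen well below the server's imap_max_line_length, leaving room for the tag and command name. The function names are hypothetical.

```python
def compact_uids(uids):
    """Collapse UIDs into IMAP sequence-set pieces such as '1:3' or '7'."""
    pieces = []
    start = prev = None
    for uid in sorted(set(uids)):
        if start is None:
            start = prev = uid
        elif uid == prev + 1:
            prev = uid
        else:
            pieces.append(str(start) if start == prev else f"{start}:{prev}")
            start = prev = uid
    if start is not None:
        pieces.append(str(start) if start == prev else f"{start}:{prev}")
    return pieces


def chunk_sequence_sets(uids, max_len=1000):
    """Join pieces with commas into strings no longer than max_len.

    A single piece longer than max_len is still emitted on its own.
    """
    chunks, current = [], ""
    for piece in compact_uids(uids):
        candidate = piece if not current else current + "," + piece
        if current and len(candidate) > max_len:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned string can then be sent as its own "UID EXPUNGE" command instead of one very long line.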

