Brian Kroth <bpkr...@gmail.com> 2010-11-24 13:28:
> Timo Sirainen <t...@iki.fi> 2010-11-24 19:04:
> > On 24.11.2010, at 18.59, Brian Kroth wrote:
> > 
> > >>> Hi, I'm running version 1.2.15 (so no doveadm)
> > >> You could build Dovecot v2.0 and only use doveadm binary from it.
> > > 
> > > Does it just issue the command via IMAP?  No direct filesystem
> > > operations?
> > 
> > It's all direct filesystem operations, no IMAP. But v1.2.15 can read v2.0's 
> > index files just fine.
> > 
> > >> Looks about ok. The main thing I'm worried is what happens if user 
> > >> creates mailboxes containing " or ' or ` characters.
> > > 
> > > Yeah, that was mostly me being lazy in my wanting to deal with escaping,
> > > so I just ignored them.
> > > 
> > > In what I originally wrote, I think it just won't touch them.
> > > 
> > > Or is the issue that the find command might remove them and then the
> > > indexes don't get fixed up?  I suppose I could just make sure that the
> > > find ignores those dirs, but I thought (from other maillist reading)
> > > that the next time their client SELECTs the folder it'll fix it up
> > > anyways.
> > 
> > I was more thinking what happens if the user creates a mailbox called `rm 
> > -rf /` or something.. Also if there are " or \ characters I think the LIST 
> > output will use literals and your parsing will break more or less badly.
> 
> That's certainly true.  I guess I was just hoping to skip over those
> mailboxes with unpleasant characters for the moment :}
> 
> More likely I'll rewrite this more carefully in Perl.
> 
> > > I suppose another spin on this would be for me to script the preauth
> > > imap client to figure out which mailboxes have messages marked for
> > > deletion of such and such an age and then try to use EXPUNGE to wipe
> > > just them out.  I'm not sure off hand if that's possible.
> > 
> > That would be a bit difficult at least to do via IMAP..
> 
> So I'm finding.  I guess I was thinking I could find the messages in a
> SELECTed mailbox via some parsing of either 
> - UID SEARCH DELETED X-SINCE $N_days_ago (where X-SINCE search X-SAVEDATE
>   instead of INTERNALDATE), or
> - UID FETCH 1:* (INTERNALDATE X-SAVEDATE FLAGS) as I've seen bantered about, 
> or
> - combine the two and SEARCH DELETED, then
>   UID FETCH $initial_uid_list (X-SAVEDATE FLAGS) to refine the list.
> 
> Then use the (U?)IDs I get back from that to do
> - UID EXPUNGE $uid_list
> 
> Of course I've only started researching that avenue, so maybe that's not
> so reasonable.
> 
> I'm starting to see why so much effort has been expended on this front.
> 
> Thanks,
> Brian

So, I redid this in Perl to only use IMAP rather than any sudo or find
calls.  In theory then one doesn't need to worry about the indexes being
out of sync.  I still skipped over the "strange characters" mailboxes
for the moment.  I'm wondering what you think of this second rendition?

The only thing I'm not quite sure about is if there's some sort of race
between clients accessing/altering UIDs that may or may not get reused.
But I think this one is at least clear of the problem you mentioned
earlier.

In theory one would call it from cron like so:

30 0 * * * root /opt/cron/dovecot-maintenance.sh | logger -i -p mail.info -t 
dovecot-maintenance

Which loops over all the relevant users calling dovecot-maintenance.pl
on them in turn.  Could probably even fork off some small number to run
in parallel.

Thanks,
Brian

Attachment: dovecot-maintenance.sh
Description: Bourne shell script

#!/usr/bin/perl -w
# dovecot-maintenance.pl
# 2010-11-24
# bpkroth
#
# This script performs a number of maintenance operations on dovecot mailboxes
# for a user.  It is expected to be run daily, but only operates on a subset of
# users at a time.
# 
# The tasks it performs are:
# 1) Remove messages that were marked for deletion over N days ago.
# (TODO: We could also consider removing spam or trash files here, but don't currently).
# 2) Rebuild the mailbox indexes to clean out the removed files.
# 3) Rebuild the maildirsize quota calculation to account for the removed files
# and to deal with falling out of sync problems.
# 4) Refresh the full text search index.

use strict;
use File::Basename;
use Getopt::Long;
use POSIX qw(strftime);
use Data::Dumper;
use IPC::Open2;
use Date::Parse qw(str2time);

my $PROG = basename($0);
my $DEBUG = 0;
my $DRYRUN = 0;
my $user;

my $rc = GetOptions(
	'user=s'	=> \$user,
	'verbose+'	=> \$DEBUG,
	'dryrun'	=> \$DRYRUN,
);

die("usage: $PROG [--verbose+] [--dryrun] --user=someone\n") if (!$rc || !$user);
die("ERROR: This must be run as root!") unless ($> == 0 && $< == 0);

# The number of days old that a file marked for deletion has to be before we'll remove it.
my $N = 60;
my $N_secs = $N * 3600 * 24;

# A routine to run an IMAP command and return the output.
sub run_imap_cmd($$$);
# A routine to take an array of uids and uniquify and turn them into an IMAP uid list.
sub assemble_uid_list(\@);

# Get the user info.
$DEBUG && print "Working on '$user'.\n";
my ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell, $expire) = getpwnam($user);
die("ERROR: Failed to fetch user info!") if (!$uid);
die("Invalid Home: $dir\n") if ($dir !~ qr|^/mail/[a-z]/|);

# FIXME: Only operate on my users for the moment.
die("Invalid User: $user\n") if ($user !~ /^(bpkroth|bkroth|bpkmail|bpktest|brian)$/);

# Switch to the user.
# This takes the place of a sudo call.
$< = $uid;
$> = $uid;
my $home = $dir;
$ENV{'HOME'} = $home;
$ENV{'USER'} = $user;

($DEBUG > 1) && print "RUID is $<.  EUID is $>.  USER is '", $ENV{'USER'}, "'. HOME is '", $ENV{'HOME'}, "'.\n";

# Open up a handle to a dovecot process as the user.
#my $pid = open2(\*IMAP_OUT, \*IMAP_IN, "sudo -H -u $user /usr/sbin/dovecot --exec-mail imap");
my $pid = open2(\*IMAP_OUT, \*IMAP_IN, "/usr/sbin/dovecot --exec-mail imap");

# Get the set of the user's mailboxes.
my @lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, 'LIST * *');
# This will get just their subscribed mailboxes.
#my @lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, 'LSUB * *');

my @mboxes;
foreach my $line (@lines) {
	if ($line =~ qr/^\* (LIST|LSUB) \((\\Has(No)?Children)?\) "\/" "(.+)"\s*$/) {
		push(@mboxes, $4);
	}
	# TODO: Handle mailboxes with goofy characters that get shoved onto the next line.
	elsif ($line =~ /^[^\*]/ && $line !~ /^a[0-9]+/) {
		print "Skipping the following mailbox for $user: $line";
	}
}
$DEBUG && print "Found the following mailboxes for $user: ", join(', ', @mboxes), "\n";

# Now, for each mailbox, do the work.
foreach my $mbox (@mboxes) {
	# Select it.
	$DEBUG && print "Selecting \"$mbox\" for $user.\n";
	@lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, "SELECT \"$mbox\"");
	# We don't really care about the return from this, unless it
	# didn't exist, but then our function would have died on us.
	
	# Search for UIDs that are marked as deleted.
	my @uids;
	@lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, 'UID SEARCH DELETED');
	foreach my $line (@lines) {
		if ($line =~ /^\* SEARCH(( [0-9]+)+)\s*$/) {
			push(@uids, split(/\s+/, $1));
		}
	}
	@uids = grep(/^[0-9]+$/, @uids);
	my $uid_list = assemble_uid_list(@uids);

	if ($uid_list) {
		# Get the date info on those uids.
		# Also fetch the ENVELOPE for debugging purposes.
		my @old_uids;
		@lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, "UID FETCH $uid_list (X-SAVEDATE FLAGS ENVELOPE)");
		foreach my $line (@lines) {
			# * 1 FETCH (UID 7 X-SAVEDATE "19-Nov-2010 17:12:39 -0600" FLAGS (\Deleted \Seen) ENVELOPE (/lots of stuff/))
			if ($line =~ /^\* [0-9]+ FETCH \(UID ([0-9]+) X-SAVEDATE "([0-9]{1,2}-[A-Z][a-z][a-z]-[0-9]{4} [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9+-]+)" FLAGS \(([^)]+)\) ENVELOPE \((.*)\)\)\s*$/) {
				my $uid = $1;
				my $savedate = $2;
				my $flags = $3;
				my $envelope = $4;	

				my $then = str2time($savedate);

				if (!grep(/^$uid$/, @uids)) {
					$DEBUG && print "WARNING: UID $uid unmatched.\n";
					next;
				}
				elsif ($flags !~ /\\Deleted/) {
					$DEBUG && print "WARNING: UID $uid unmatched flags '$flags'.\n";
					next;
				}
				elsif ((time() - $then) < $N_secs) {
					($DEBUG > 1) && print "UID $uid not old enough ($then).\n";
					next;
				}
				else {
					$DEBUG && print "Adding UID $uid with X-SAVEDATE '$savedate' and the following ENVELOPE to the EXPUNGE list: '$envelope'\n";
					push(@old_uids, $uid);
				}
			}
		}
		my $old_uid_list = assemble_uid_list(@old_uids);

		if ($old_uid_list) {
			$DEBUG && print "Expunging $old_uid_list from \"$mbox\" for $user.\n";
			if (!$DRYRUN) {
				# Finally, now we can expunge those UIDs.
				@lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, "UID EXPUNGE $old_uid_list");
				# We don't really care about the return from this, it'll die if it's unhappy.
			}
		}
		else {
			$DEBUG && print "No old messages to expunge from \"$mbox\" for $user.\n";
		}
	}

	# Next, rebuild the FTS for this mailbox.
	# NOTE: This is slow the very first time, and typically pretty quick after that.
	if ($mbox !~ /trash|junk|spam/i) {
		$DEBUG && print "Rebuilding FTS index of \"$mbox\" for $user.\n";
		run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, 'SEARCH BODY blah');
		# We don't really care about the return from this, it'll die if it's unhappy.
		$DEBUG && print "Finished rebuilding FTS index of \"$mbox\" for $user.\n";
	}
	else {
		$DEBUG && print "Skipping rebuilding FTS index of \"$mbox\" for $user.\n";
	}
}

# Finally, rebuild the quota file for this entire account.
# NOTE: This is the bit that takes the longest since it needs to walk the tree.
if (!$DRYRUN && -f "$home/Maildir/maildirsize") {
	unlink("$home/Maildir/maildirsize");
}
$DEBUG && print "Rebuilding maildirsize for $user ... \n";
run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, 'GETQUOTAROOT Inbox');
# We don't really care about the return from this, it'll die if it's unhappy.
$DEBUG && print "Finished rebuilding maildirsize for $user.\n";

# Close the IMAP connection.
@lines = run_imap_cmd(\*IMAP_OUT, \*IMAP_IN, "LOGOUT");
# We don't really care about the return from this.
waitpid($pid, 0);
$rc = $? >> 8;
($DEBUG > 1) && print "IMAP process exited with status $rc.\n";

exit 0;



# A routine to run an IMAP command and return the output.
{
my $imap_cmd_num = 0;
sub run_imap_cmd($$$) {
	my ($IMAP_OUT, $IMAP_IN, $cmd) = @_;

	# Increment the tag on our imap command.
	$imap_cmd_num++;

	# Send the command.
	print $IMAP_IN "a$imap_cmd_num $cmd\n";

	# Read the response.
	my $done = 0;
	my @imap_output;
	while (!$done && (my $line = <$IMAP_OUT>)) {
		($DEBUG > 1) && print $line;

		# Handle the return code from our command.
		if ($line =~ /^a$imap_cmd_num (\S+) /) {
			if ($1 eq 'OK') {
				$done = 1;
			}
			else {
				die("ERROR: Got a bad return code for our '$cmd' operation!", $line);
			}
		}
		else {
			push(@imap_output, $line);
		}
	}
	return @imap_output;
}
}

# A routine to take an array of uids and uniquify and turn them into an IMAP uid list.
sub assemble_uid_list(\@) {
	my ($uids) = @_;

	my $first_uid = -1;
	my $last_uid = -1;
	my $uid_list = '';
	# add -1 (a fake #), to force the last bit to the string
	my @uids = sort {$a <=> $b} @$uids;
	push(@uids, -1);
	foreach my $uid (@uids) {
		($DEBUG > 2) && print $uid, "\n";
		# Only look for unique ones.
		next if ($uid == $last_uid);
		# Extend the range by one.
		if (($uid - $last_uid) == 1) {
			$last_uid = $uid;
		}
		# Broken range, start it again.
		else {
			# First, add the last one to the string.
			if ($first_uid != -1) {
				if ($first_uid != $last_uid) {
					$uid_list .= "$first_uid:$last_uid,";
				}
				else {
					$uid_list .= "$last_uid,";
				}
			}
			# Reset our variables.
			$first_uid = $uid;
			$last_uid = $uid;
		}
	}
	$uid_list =~ s/,$//;
	($DEBUG > 1) && print "uid_list: $uid_list\n";
	return $uid_list;
}

Attachment: signature.asc
Description: Digital signature

Reply via email to