Bug#432083: Fixed in grepmail versions > 5.3034

2009-08-23 David Coppit

Thanks for the bug report. I've fixed it and will push out a new release
later today.



--
To UNSUBSCRIBE, email to debian-qa-packages-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#234795: Need more information

2009-08-23 David Coppit

Hi there,

I need more information to debug this. Please either confirm the bug and
provide more information, or mark this bug as "not a bug".

grepmail uses Mail::Mbox::MessageParser, which is designed to use memory
proportional to the largest email message in a mailbox. I verified that it
does indeed operate this way, using a 54MB mailbox:

  mbox size:        56683943
  max email size:   11182857
  max read buffer:  11184795 <-- Biggest size of M::M::MP's read buffer
  folder_reader:    11186558 <-- Biggest size of the M::M::MP Perl object
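The memory behaviour described above can be illustrated with a small pure-Perl sketch. It is not Mail::Mbox::MessageParser's code, only an approximation under stated assumptions: it splits messages on the same "From " regex that the anonymize_mailbox script in this thread uses, keeps only the message currently being read, and reports the largest one. The name `largest_message` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of grepmail or Mail::Mbox::MessageParser).
# Reads a plain-text mbox line by line and returns
# (message_count, largest_message_bytes). Only the message currently being
# assembled is held in memory, so peak usage tracks the largest message
# rather than the size of the whole mailbox.
sub largest_message {
    my ($fh) = @_;
    my ($count, $max) = (0, 0);
    my $current = '';

    while (my $line = <$fh>) {
        # Separator heuristic borrowed from the anonymize_mailbox script.
        if ($line =~ /^From\s.*\d:\d+:\d.* \d{4}/ && length($current)) {
            $count++;
            $max = length($current) if length($current) > $max;
            $current = '';    # discard the finished message before reading on
        }
        $current .= $line;
    }
    if (length($current)) {    # the last message has no separator after it
        $count++;
        $max = length($current) if length($current) > $max;
    }
    return ($count, $max);
}

# Usage: perl largest_message.pl mailbox [mailbox ...]
unless (caller) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Can't open $file: $!\n";
        my ($count, $max) = largest_message($fh);
        print "$file: $count messages, largest is $max bytes\n";
        close $fh;
    }
}
```

It only handles uncompressed mailboxes; grepmail itself also reads gzip-compressed ones.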

Some stats from ps(1), in bytes:

                             Plain text   Gzip compressed
  min real memory:              4976640           5005312
  min virtual memory:         618000384         618016768
  max real memory:             38674432          38694912
  max virtual memory:         651546624         651563008

I also tried a 540MB mailbox, created by concatenating the mailbox 10
times:

                             Plain text   Gzip compressed
  min real memory:              4976640           5005312
  min virtual memory:         618000384         618016768
  max real memory:             40292352          40284160
  max virtual memory:         652021760         652038144

The numbers above were basically the same for a 23KB mailbox. Also note
that this command:

  perl -e 'system "ps -o rss,vsz $$"'

shows that an otherwise idle perl process uses 1175552 bytes of real and
615645184 bytes of virtual memory, so the numbers above are not out of the
ordinary.

If you can run the attached anonymize_mailbox script on your mailbox,
confirm that memory usage is still high with the anonymized copy, and send
that copy to me, I can debug this further.

Another idea: perhaps your mailbox is malformed, so that grepmail sees only
one email in the whole mailbox. You can check this by running:

  grepmail -r . my_big_mailbox
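A rough, independent check for that kind of malformation, assuming a plain-text (not gzip-compressed) mailbox, is to compare lines that merely start with "From " against lines that also match the separator pattern used in the anonymize_mailbox script. This is a hypothetical diagnostic, not a grepmail feature, and Mail::Mbox::MessageParser's real separator rules may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic (not part of grepmail). Counts lines that begin
# with "From " and how many of those also look like mbox separators, using
# the date/time pattern from the anonymize_mailbox script. A mailbox with
# a single separator, or with many "From " lines that fail the pattern, is
# likely to be read as one huge message.
sub separator_stats {
    my ($fh) = @_;
    my %stats = (from_lines => 0, separators => 0);
    while (my $line = <$fh>) {
        next unless $line =~ /^From /;
        $stats{from_lines}++;
        $stats{separators}++ if $line =~ /^From\s.*\d:\d+:\d.* \d{4}/;
    }
    return \%stats;
}

# Usage: perl separator_stats.pl mailbox [mailbox ...]
unless (caller) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Can't open $file: $!\n";
        my $s = separator_stats($fh);
        printf "%s: %d lines start with \"From \", %d look like separators\n",
            $file, $s->{from_lines}, $s->{separators};
        close $fh;
    }
}
```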

If you want to confirm that you have a very large email in your mailbox,
find this line in grepmail:

  my $email = $folder_reader->read_next_email();

and follow it with this line:

  print length($$email) . "\n";

then run something like:

  grepmail nonexistent_pattern my_big_mailbox | sort -n
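If you would rather not edit grepmail, a similar listing can come from a standalone script. The sketch below is an approximation under assumptions: it splits on the anonymize_mailbox script's "From " pattern rather than using Mail::Mbox::MessageParser, handles only plain-text mailboxes, and `message_sizes` is a hypothetical name. It also records the line each message starts on, so an oversized message is easy to locate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one [start_line, size_in_bytes] pair per
# message in the mbox read from $fh, largest message first.
sub message_sizes {
    my ($fh) = @_;
    my @sizes;
    my ($start, $size) = (1, 0);
    while (my $line = <$fh>) {
        # A separator line begins a new message; record the previous one.
        if ($line =~ /^From\s.*\d:\d+:\d.* \d{4}/ && $size > 0) {
            push @sizes, [$start, $size];
            ($start, $size) = ($., 0);    # $. is the current line number
        }
        $size += length $line;
    }
    push @sizes, [$start, $size] if $size > 0;
    return sort { $b->[1] <=> $a->[1] } @sizes;
}

# Usage: perl message_sizes.pl my_big_mailbox | head
unless (caller) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Can't open $file: $!\n";
        printf "%10d bytes, starting at line %d\n", $_->[1], $_->[0]
            for message_sizes($fh);
        close $fh;
    }
}
```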

Regards,
David

_____
David Coppit   http://coppit.org/






Bug#254045: -d bug: not a bug?

2009-08-23 David Coppit

I believe this is not a bug. I suspect you entered a Unicode character
that looks like "-" but is not. Getopt::Std only recognizes an option when
its dash is exactly the ASCII hyphen-minus "-" (character code 45). Here's
a program that you can use to test it.

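  use strict;
  use warnings;
  use Getopt::Std;
  use Data::Dumper;

  # Show the first character of the first argument and its character code.
  my ($c) = $ARGV[0] =~ /^(.)/;
  print "Character $c is ord(" . ord($c) . ")\n";

  # getopt('d', ...) treats -d as an option that takes a value.
  my %new_opts;
  getopt('d', \%new_opts);
  print Dumper \%new_opts;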

When I run the program, I get:

  $ perl a -d 'before 6/1/04'
  Character - is ord(45)
  $VAR1 = {
            'd' => 'before 6/1/04'
          };

But when I copy and paste the dash from the web page for your bug report,
I get:

  $ perl a ???d 'before 6/1/04'
  Character ? is ord(226)
  $VAR1 = {};
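Character code 226 is 0xE2, the first byte of the UTF-8 encoding of every character from U+2000 to U+2FFF, a range that includes the typographic dashes and the minus sign. As a hedged sketch (the helper name and the list of look-alike characters are assumptions, and it presumes a UTF-8 terminal), the arguments can be decoded and checked for such look-alikes:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Look-alikes for the ASCII hyphen-minus "-" (code 45). This list is an
# assumption and is not exhaustive.
my %DASH_NAMES = (
    0x2010 => 'HYPHEN',
    0x2011 => 'NON-BREAKING HYPHEN',
    0x2012 => 'FIGURE DASH',
    0x2013 => 'EN DASH',
    0x2014 => 'EM DASH',
    0x2212 => 'MINUS SIGN',
);

# Hypothetical helper: returns a warning for each argument that starts with
# a dash look-alike instead of "-". Arguments are assumed to be UTF-8 bytes.
sub suspicious_dashes {
    my @warnings;
    for my $arg (@_) {
        my $text = decode('UTF-8', $arg);
        next unless length $text;
        my $code = ord $text;    # ord() looks at the first character only
        next unless exists $DASH_NAMES{$code};
        push @warnings, sprintf("argument '%s' starts with U+%04X %s, not '-' (45)",
            $arg, $code, $DASH_NAMES{$code});
    }
    return @warnings;
}

# Usage: perl check_dashes.pl -d 'before 6/1/04'
unless (caller) {
    print "$_\n" for suspicious_dashes(@ARGV);
}
```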

Please confirm and either provide more information or close the bug as
"not a bug".

Thanks,
David

_____
David Coppit   http://coppit.org/






Bug#234795: Need more information

2009-08-23 David Coppit

Forgot the anonymize_mailbox script.

_____
David Coppit   http://coppit.org/

#!/usr/bin/perl -w

$VERSION = '1.00';

use strict;
use FileHandle;

#---

my $LINE = 0;
my $FILE_HANDLE = undef;
my $START = 0;
my $END = 0;
my $READ_BUFFER = '';

sub reset_file
{
  my $file_handle = shift;

  $FILE_HANDLE = $file_handle;
  $LINE = 1;
  $START = 0;
  $END = 0;
  $READ_BUFFER = '';
}

#---

# Need this for a lookahead.
my $READ_CHUNK_SIZE = 0;

sub read_email
{
  # Undefined read buffer means we hit eof on the last read.
  return 0 unless defined $READ_BUFFER;

  my $line = $LINE;

  $START = $END;

  # Look for the start of the next email
  LOOK_FOR_NEXT_HEADER:
  while($READ_BUFFER =~ m/^(From\s.*\d:\d+:\d.* \d{4})/mg)
  {
    $END = pos($READ_BUFFER) - length($1);

    # Don't stop on email header for the first email in the buffer
    next if $END == 0;

    # Keep looking if the header we found is part of a "Begin Included
    # Message". Examine up to 200 characters before the header, clamping the
    # offset at 0: a negative offset would make substr() count from the end
    # of the buffer instead.
    my $context_start = $END > 200 ? $END - 200 : 0;
    my $end_of_string =
      substr($READ_BUFFER, $context_start, $END - $context_start);
    if ($end_of_string =~
      /\n-( Begin Included Message |Original Message)-\n[^\n]*\n*$/i)
    {
      next;
    }

    # Found the next email!
    my $email = substr($READ_BUFFER, $START, $END-$START);
    $LINE += ($email =~ tr/\n//);

    return (1, $email, $line);
  }

  # Didn't find next email in current buffer. Most likely we need to read some
  # more of the mailbox. Shift the current email to the front of the buffer
  # unless we've already done so.
  $READ_BUFFER = substr($READ_BUFFER,$START) unless $START == 0;
  $START = 0;

  # Start looking at the end of the buffer, but back up some in case the edge
  # of the newly read buffer contains the start of a new header. I believe the
  # RFC says header lines can be at most 90 characters long.
  my $search_position = length($READ_BUFFER) - 90;
  $search_position = 0 if $search_position < 0;

  # Can't use sysread because it doesn't work with ungetc
  if ($READ_CHUNK_SIZE == 0)
  {
    local $/ = undef;

    if (eof $FILE_HANDLE)
    {
      my $email = $READ_BUFFER;
      undef $READ_BUFFER;
      return (1, $email, $line);
    }
    else
    {
      $READ_BUFFER = <$FILE_HANDLE>;
      pos($READ_BUFFER) = $search_position;
      goto LOOK_FOR_NEXT_HEADER;
    }
  }
  else
  {
if (read($F