Hi,

I am playing around with a PDF plugin, and I'd like to process the images
extracted from a PDF document through an OCR plugin.

So far I have managed to extract the images from the PDF and prepare new
parts to add to the message.

    # Make a new part  
    my $part_msg = Mail::SpamAssassin::Message::Node->new({ normalize=>$msg->{normalize} });
    my $part_array;

    # Prepare the contents of the part
    $part_array->[0]="--$boundary";
    $part_array->[1]="Content-Type: image/tiff;";
    $part_array->[2]="Content-Transfer-Encoding: Base64";
    $part_array->[3]="";
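    my $pline=4;
    my $buf;
    # open the TIFF with a lexical handle, in binary mode
    open(my $fh, '<', "$dir/$tifffile") or die "$dir/$tifffile: $!";
    binmode($fh);
    # read in multiples of 57 bytes (57 bytes = one 76-char Base64 line)
    while (read($fh, $buf, 60*57)) {
        $part_array->[$pline++]=MIME::Base64::encode_base64($buf);
    }
    close($fh);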
    
    # set the part type
    $part_msg->{'type'}="image/tiff";
    debuglog($part_msg->{'type'});
    $part_msg->header("content_type", "image/tiff");
    debuglog($part_msg->header("content_type"));

    # add the part to the message
    push(@{$msg->{'parse_queue'}}, [ $part_msg, $boundary, $part_array, 1 ]);
    $msg->add_body_part($part_msg);
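
A note on the read size used above: 60*57 is a multiple of 57, and 57 input
bytes encode to exactly one 76-character Base64 line, so no chunk except the
last ends in '=' padding. Below is a minimal standalone sketch of that loop,
using only the core MIME::Base64 module. The helper name
base64_lines_from_file is hypothetical, and splitting the output into one
array element per line is an assumption about what the parser prefers, not
something I have verified.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: encode a binary file as a list of Base64 lines.
# Reading in multiples of 57 bytes keeps '=' padding out of every chunk
# except the last, so the joined output is one valid Base64 stream.
sub base64_lines_from_file {
    my ($path) = @_;

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # image data is binary

    my @lines;
    my $buf;
    while (read($fh, $buf, 60 * 57)) {
        # encode_base64 returns 76-character lines, each ending in "\n";
        # split /^/ keeps the newlines and yields one element per line.
        push @lines, split(/^/, encode_base64($buf));
    }
    close($fh);

    return \@lines;
}
```

The returned array reference could then be appended after the part headers
in place of the per-chunk assignments.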

The added part is seen as image/tiff, as I want:

    [75770] dbg: PDFassassin: image/tiff
    [75770] dbg: PDFassassin: image/tiff


This is working fine, except that the parts are seen as text/plain
by the OCR plugin:

    [75770] dbg: FuzzyOcr: part multipart/mixed
    [75770] dbg: FuzzyOcr: part text/plain
    [75770] dbg: FuzzyOcr: part application/pdf
    [75770] dbg: FuzzyOcr: part text/plain
    [75770] dbg: FuzzyOcr: part text/plain
   
The first 2 parts are in the original message; the last 2 parts are
the ones added by add_body_part.


How can I get the type to be kept in the added part?

Best regards,

Olivier
