Problem with parsing email message with extraneous MIME information

2007-12-21 Thread Steven Allport
I am working on processing .eml email messages using the email module (Python
2.5), on files exported from an Outlook PST file, to extract the component
parts of each email. In most instances this works fine: the message is read
in using message_from_file, is_multipart returns True, and I can process each
component and extract the message attachments.
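
For reference, a minimal sketch of the working case described above. It is
written for a current Python 3, and the sample message, the boundary "BOUND"
and the helper name list_parts are made up for illustration; only
message_from_string, walk, is_multipart, get_content_type and get_filename
come from the email module.

```python
import email


def list_parts(raw_text):
    """Return (is_multipart, [(content_type, filename), ...]) for a message."""
    msg = email.message_from_string(raw_text)
    parts = []
    # walk() yields the message itself and then every nested part.
    for part in msg.walk():
        if part.is_multipart():
            # Containers hold other parts and carry no content of their own.
            continue
        parts.append((part.get_content_type(), part.get_filename()))
    return msg.is_multipart(), parts


# Hypothetical well-formed message: headers, one blank line, then the parts.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--BOUND\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="a.bin"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--BOUND--\n"
)

print(list_parts(SAMPLE))
```

An attachment's bytes can then be read with part.get_payload(decode=True).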

I am, however, running into a problem with email messages that contain emails
forwarded as attachments. These emails have some additional encapsulated
header information from each of the forwarded emails. When I process the
files, is_multipart returns False, the content-type is reported as text/plain,
and the payload includes the whole message body from 'This message is in MIME
format' through to the end.

For example:


MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
--_=_NextPart_000_01C43634.1A06A235
--_=_NextPart_001_01C43634.1A06A235
--_=_NextPart_001_01C43634.1A06A235
--_=_NextPart_001_01C43634.1A06A235--
--_=_NextPart_000_01C43634.1A06A235

--_=_NextPart_002_01C43634.1A06A235
--_=_NextPart_003_01C43634.1A06A235
--_=_NextPart_003_01C43634.1A06A235
--_=_NextPart_003_01C43634.1A06A235--
--_=_NextPart_002_01C43634.1A06A235
--_=_NextPart_002_01C43634.1A06A235--
--_=_NextPart_000_01C43634.1A06A235
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="m.182DA3C.BE6A21A3"


If I remove the section of the email from 'This message is in MIME format'
through to 'Mime-Version: 1.0', the message is processed correctly (i.e.
is_multipart = True, Content-Type = multipart/mixed, etc.).
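
The manual fix above could be automated along these lines. This is only a
sketch under stated assumptions: there is no blank line before the real
header/body separator, anything in the header block that is neither a
"Name: value" line nor a continuation line is stray, and the body part after
the blank line in BROKEN is invented for illustration. The helper name
drop_stray_header_lines and the HEADER_LINE pattern are hypothetical, not part
of the email module.

```python
import email
import re

# A header line is a run of printable ASCII (no space, no colon), then ':'.
HEADER_LINE = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def drop_stray_header_lines(raw_text):
    """Remove non-header lines that appear before the first blank line."""
    lines = raw_text.splitlines(True)
    kept = []
    keeping = False  # True while the most recent header line was kept
    for index, line in enumerate(lines):
        if line.strip() == "":
            # First blank line: header/body separator, keep the rest as is.
            kept.extend(lines[index:])
            break
        if line[0] in " \t":
            # Continuation line: keep it only if its header was kept.
            if keeping:
                kept.append(line)
        elif HEADER_LINE.match(line):
            keeping = True
            kept.append(line)
        else:
            keeping = False  # stray text or boundary line: drop it
    return "".join(kept)


BROKEN = (
    "MIME-Version: 1.0\n"
    "X-Mailer: Internet Mail Service (5.5.2448.0)\n"
    "This message is in MIME format. Since your mail reader does not understand\n"
    "this format, some or all of this message may not be legible.\n"
    "--_=_NextPart_000_01C43634.1A06A235\n"
    "--_=_NextPart_000_01C43634.1A06A235\n"
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="m.182DA3C.BE6A21A3"\n'
    "\n"
    "--m.182DA3C.BE6A21A3\n"      # made-up body part, for illustration only
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
    "--m.182DA3C.BE6A21A3--\n"
)

fixed = email.message_from_string(drop_stray_header_lines(BROKEN))
print(fixed.is_multipart(), fixed.get_content_type())
```

Unlike the manual edit, this keeps the second Mime-Version line, so the parsed
message carries that header twice. A stray line that happens to look like
"Word: text" would also be kept as a header.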

Could anybody tell me whether the above message header breaks the conventions
for email messages, or is it just something that is not handled correctly by
the email module?

I would appreciate any feedback from anyone else who has experienced such
problems or who could provide hints towards a reliable solution.

Thanks,
Steve 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with parsing email message with extraneous MIME information

2008-01-02 Thread Steven Allport
Thanks for the response.

The section of the email is an actual message fragment. The first blank line
that appears in the message is immediately after the first

' boundary="m.182DA3C.BE6A21A3"'

There are no blank lines prior to this in the message.
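
With no blank line before that point, the symptom can be reproduced on a
shortened copy of the fragment. A small sketch, assuming a current Python 3
(RAW is a cut-down version of the example, not the real message): the parser
appears to end the header block at the first line that does not look like a
header, so the later Content-Type line becomes part of the body.

```python
import email

# Shortened copy of the fragment: no blank line until after the boundary
# parameter, so the stray text sits in the middle of the header block.
RAW = (
    "MIME-Version: 1.0\n"
    "X-Mailer: Internet Mail Service (5.5.2448.0)\n"
    "This message is in MIME format. Since your mail reader does not understand\n"
    "this format, some or all of this message may not be legible.\n"
    "--_=_NextPart_000_01C43634.1A06A235\n"
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="m.182DA3C.BE6A21A3"\n'
    "\n"
)

msg = email.message_from_string(RAW)

# Only the lines before the stray text are treated as headers.
print(msg.keys())
# No Content-Type header was seen, so the default type is reported.
print(msg.is_multipart(), msg.get_content_type())
# Everything from the stray text onwards ends up in the payload.
print(msg.get_payload().splitlines()[0])
# Recent Python 3 versions also record parsing problems on the message.
for defect in msg.defects:
    print(type(defect).__name__)
```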

In the actual exported message that the example was snipped from, there is a
set of 5 _NextPart_ lines followed by the message header for the 1st attached
message, then a set of 7 _NextPart_ lines followed by the message header for
the 2nd attached message. In total there are 6 sets of _NextPart_ lines, as
some of the attached messages also contained messages as attachments.

Unfortunately it is not possible for me to post or leave the message anywhere
for you, and so far I have been unable to recreate a test message of similar
format. I will endeavour to do so and, if I can, will let you know where it
is.

"Gabriel Genellina" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> On Fri, 21 Dec 2007 10:22:53 -0300, Steven Allport <[EMAIL PROTECTED]>
> wrote:
>
>> I am working on processing .eml email messages using the email module
>> (Python 2.5), on files exported from an Outlook PST file, to extract the
>> component parts of each email. In most instances this works fine: the
>> message is read in using message_from_file, is_multipart returns True,
>> and I can process each component and extract the message attachments.
>>
>> I am, however, running into a problem with email messages that contain
>> emails forwarded as attachments. These emails have some additional
>> encapsulated header information from each of the forwarded emails. When I
>> process the files, is_multipart returns False, the content-type is
>> reported as text/plain, and the payload includes the whole message body
>> from 'This message is in MIME format' through to the end.
>>
>> For example:
>>
>> 
>> MIME-Version: 1.0
>> X-Mailer: Internet Mail Service (5.5.2448.0)
>> This message is in MIME format. Since your mail reader does not understand
>> this format, some or all of this message may not be legible.
>> --_=_NextPart_000_01C43634.1A06A235
>> --_=_NextPart_001_01C43634.1A06A235
>> --_=_NextPart_001_01C43634.1A06A235
>> --_=_NextPart_001_01C43634.1A06A235--
>> --_=_NextPart_000_01C43634.1A06A235
>> 
>> --_=_NextPart_002_01C43634.1A06A235
>> --_=_NextPart_003_01C43634.1A06A235
>> --_=_NextPart_003_01C43634.1A06A235
>> --_=_NextPart_003_01C43634.1A06A235--
>> --_=_NextPart_002_01C43634.1A06A235
>> --_=_NextPart_002_01C43634.1A06A235--
>> --_=_NextPart_000_01C43634.1A06A235
>> Mime-Version: 1.0
>> Content-Type: multipart/mixed;
>>  boundary="m.182DA3C.BE6A21A3"
>> 
>>
>> If I remove the section of the email from 'This message is in MIME format'
>> through to 'Mime-Version: 1.0', the message is processed correctly (i.e.
>> is_multipart = True, Content-Type = multipart/mixed, etc.).
>
> Is this an actual message fragment? Can't be, or else it's broken. Headers
> are separated from message body by one blank line. At least there should
> be a blank line before "This message is in MIME...".
> And are actually all those xxx_NextPart_xxx lines one after the other?
>
>> Could anybody tell me whether the above message header breaks the
>> conventions for email messages, or is it just something that is not
>> handled correctly by the email module?
>
> Could you post, or better leave available somewhere, a complete message
> (as originally exported by Outlook, before any processing)?
>
> -- 
> Gabriel Genellina
>
>



