Re: Best Practice: emails and file-attachments

John Haxby Wed, 16 Aug 2006 06:44:40 -0700


Oh rats. Thunderbird ate the indenting. The two examples should be:


multipart/alternative
        text/plain
        multipart/related
                text/html
                image/gif
                image/gif
        application/msword

and

multipart/related
        text/html
        image/gif
        application/msword

The indenting indicates nesting. A message isn't just a bodypart followed by attachments, it has structure like a file system. Something which escapes most mail readers. Sigh.
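To make the file-system analogy concrete, here is a minimal sketch (not part of the original message) that builds the first structure with Python's standard email package and prints it as an indented outline. The function name outline, the variable names and the placeholder payloads are all made up for illustration.

```python
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def outline(part, depth=0):
    """Return the MIME tree as a list of indented content-type lines."""
    lines = [" " * 8 * depth + part.get_content_type()]
    if part.is_multipart():
        # A multipart payload is a list of child parts: recurse into each.
        for child in part.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


# Placeholder payloads; only the structure matters here.
related = MIMEMultipart("related")
related.attach(MIMEText("<p>hello</p>", "html"))
related.attach(MIMEImage(b"GIF89a", "gif"))
related.attach(MIMEImage(b"GIF89a", "gif"))

msg = MIMEMultipart("alternative")
msg.attach(MIMEText("hello", "plain"))
msg.attach(related)
msg.attach(MIMEApplication(b"placeholder bytes", "msword"))

print("\n".join(outline(msg)))
```

Walking the tree recursively like this, rather than treating the message as a flat list of parts, is what keeps the parent/child relationships visible.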



John Haxby wrote:

lude wrote:
You also mentioned indexing each bodypart ("attachment") separately.
Why? ....
To my mind, there is no use case where it makes sense to search a particular bodypart
I will give you the use case:

[snip]
3.) The result list would show this:
1. mail-1 'subject'
'Abstract of the message-text'
2. mail-2 'subject'
Attachment with name 'filename.doc' contains 'Abstract of
file-content'

Another use case would be an extended search that lets the user select
whether "attached files" should be searched (yes or no).
That's a good use case. File it as a bug and close it WONTFIX :-) The problem that you have is trying to determine whether something is going to be inline or an attachment. I'll give you a real-life example that caught out some old code the other day. We had a message with this structure:
multipart/alternative
text/plain
multipart/related
text/html
image/gif
image/gif
application/msword

Is there an attached file in there? Think before you read on.
The answer should be "no". Are you surprised that at least one client decided that there was? What we have is three representations of the same document: plain text, HTML (with two pictures) and MS Word. The original, the Word document, obviously has the best fidelity and comes last. The one client I'm thinking of (and I've lost track of which one it was) correctly suppressed the display of the text/plain alternative, displayed the HTML with its pictures in-line and then mistakenly displayed the Word document as an attachment.
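One possible heuristic that avoids this particular mistake, sketched in Python with the standard email package (this is an illustration, not the code that was caught out): never report a part as an attachment when its immediate parent is multipart/alternative or multipart/related, and otherwise require an explicit Content-Disposition of "attachment". The names attachments and word_part and the filename are hypothetical, and the rule is an assumption, not a standard.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def attachments(part, parent_subtype=None):
    """Return content types of the parts this heuristic calls attachments."""
    if part.is_multipart():
        found = []
        for child in part.get_payload():
            found.extend(attachments(child, part.get_content_subtype()))
        return found
    # Children of multipart/alternative are representations of one document;
    # children of multipart/related are resources of the root part.
    if parent_subtype in ("alternative", "related"):
        return []
    if part.get_content_disposition() == "attachment":
        return [part.get_content_type()]
    return []


def word_part():
    """Build a placeholder Word part that claims to be an attachment."""
    part = MIMEApplication(b"placeholder bytes", "msword")
    part.add_header("Content-Disposition", "attachment", filename="report.doc")
    return part


# Word document as one of three alternatives: not an attachment.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("hello", "plain"))
alternative.attach(MIMEText("<p>hello</p>", "html"))
alternative.attach(word_part())

# Word document alongside a text body in multipart/mixed: an attachment.
mixed = MIMEMultipart("mixed")
mixed.attach(MIMEText("see the attached report", "plain"))
mixed.attach(word_part())

print(attachments(alternative))
print(attachments(mixed))
```

The point of the sketch is that the decision depends on where a part sits in the tree, not only on its own headers.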
This is a fictional example, but it could exist:

multipart/related
text/html
image/gif
application/msword
The gif image (and let's assume it can be indexed sensibly) is "obviously" a picture in the HTML bodypart. What's the Word document? It's referenced from the HTML as a link just like the picture is. Is it an attachment? What's the difference between the Word document referenced as a link within the multipart/related (by content-id) and a link to an external document (by http URL)? From a user perspective both are the same, but is one an attachment and the other not? I'm being unfair, this is not only an unrealistic problem but there isn't a right or a wrong answer. The Word document isn't an attachment because it doesn't (or shouldn't) appear in the list of attachments and it's not in-line because you have to click on something to see it.
So yes, I agree, your use-cases are good; I'm just not sure how you're going to identify an attachment :-)
I do like the idea, though, that when you search for "xyzzy" you get the abstract of the bodypart that contains "xyzzy" rather than the abstract (or subject) of the entire message, and I'm going to think about that one some more. The problem that immediately springs to mind though is that a message can have an arbitrary number of bodyparts, so if I have BODY-1, BODY-2, ..., BODY-N (where N is unknown) how hard is it for me to construct the search? I think I probably should construct the search that way because the score depends upon the size of the document and it seems to make sense that the document is the bodypart, not the entire message, but it seems more complex than is useful for mail messages.
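One way to sidestep the unknown N, sketched here as an assumption rather than anything proposed in the thread: treat each leaf bodypart as its own small document, identified by a dotted path, so the search runs over one text field instead of BODY-1 ... BODY-N. The sketch uses Python's standard email package and a naive substring match in place of a real index; bodyparts, search and the sample message are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def bodyparts(part, path=""):
    """Yield (path, part) for every leaf, with paths like '1' or '2.1'."""
    if part.is_multipart():
        for i, child in enumerate(part.get_payload(), start=1):
            yield from bodyparts(child, f"{path}.{i}" if path else str(i))
    else:
        yield path or "1", part


def search(msg, term):
    """Return (path, content type, abstract) for text parts containing term."""
    hits = []
    for path, part in bodyparts(msg):
        if part.get_content_maintype() != "text":
            continue  # non-text parts would need a format-specific extractor
        charset = part.get_content_charset() or "ascii"
        text = part.get_payload(decode=True).decode(charset, errors="replace")
        if term.lower() in text.lower():
            hits.append((path, part.get_content_type(), text[:40]))
    return hits


message = MIMEMultipart("mixed")
message.attach(MIMEText("nothing to see here", "plain"))
message.attach(MIMEText("<p>xyzzy appears</p>", "html"))

print(search(message, "xyzzy"))
```

Each hit carries the path of the bodypart it came from, so a result list can show the abstract of that part and still link back to the whole message.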
jch

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


