[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Potter updated SOLR-2245:
---------------------------------

    Attachment: SOLR-2245.patch

Updated patch, built against trunk, that builds on Peter's work. Specifically, 
this patch includes the following changes:

1) Revived the unit test that uses GreenMail as an embedded IMAP server.

2) Added a new dependency on the Sun Gmail JavaMail extensions, which support 
true server-side filtering. The performance gains when processing large folders 
are significant, especially for delta processing. Currently, the only 
server-side Gmail filter is the after: filter, but more can be added as needed.

3) Fixed an issue with the ClassLoader and the Java Activation API where some 
messages were not processed correctly unless the thread's context class loader 
was the one that loaded the activation classes.
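
Item 2 relies on Gmail's server-side search. The following rough sketch of the delta case is not the patch's code. It turns the date of the last import into a raw Gmail query string. With the gimap provider, such a string would presumably be wrapped in `com.sun.mail.gimap.GmailRawSearchTerm` and passed to `Folder.search(...)`. That step is omitted here so the example stays dependency-free. The helper name `buildAfterQuery` and the one-day safety margin are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class GmailAfterQuery {
    // Gmail search dates are written as yyyy/MM/dd, e.g. "after:2004/04/16".
    private static final DateTimeFormatter GMAIL_DATE =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Hypothetical helper: builds a raw Gmail search string for delta processing.
    // Assumption: the date is moved back one day so that messages near the
    // boundary are fetched again rather than skipped; exact de-duplication
    // against already-indexed messages is left to the client.
    static String buildAfterQuery(LocalDate lastIndexedDay) {
        return "after:" + lastIndexedDay.minusDays(1).format(GMAIL_DATE);
    }

    public static void main(String[] args) {
        // Prints "after:2014/03/31" for a last import on 2014-04-01.
        System.out.println(buildAfterQuery(LocalDate.of(2014, 4, 1)));
    }
}
```

The sketch widens the window rather than narrowing it. The precise boundary semantics of `after:` (inclusive or exclusive, time zone) are not stated in this issue, and re-fetching a few messages is cheaper than missing them.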

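The fix in item 3 follows a common pattern. You temporarily install a specific class loader as the thread's context class loader, then restore the previous one afterwards, even on failure. Below is a minimal, dependency-free sketch. The helper name `callWithContextClassLoader` is hypothetical. In the processor, the target loader would presumably be the one that loaded the activation classes, but here the example class's own loader stands in for it.

```java
import java.util.concurrent.Callable;

public class ContextClassLoaderSwap {
    // Hypothetical helper: runs task with cl installed as the current thread's
    // context class loader, then restores the previous loader even if task throws.
    static <T> T callWithContextClassLoader(ClassLoader cl, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(cl);
        try {
            return task.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "the loader that loaded the activation classes".
        ClassLoader target = ContextClassLoaderSwap.class.getClassLoader();
        ClassLoader seen = callWithContextClassLoader(target,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == target); // true
    }
}
```

Restoring the loader in a `finally` block matters because the processing thread is likely shared with other code that expects its original context class loader.
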
> MailEntityProcessor Update
> --------------------------
>
>                 Key: SOLR-2245
>                 URL: https://issues.apache.org/jira/browse/SOLR-2245
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4, 1.4.1
>            Reporter: Peter Sturge
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 4.8
>
>         Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.patch, 
> SOLR-2245.patch, SOLR-2245.zip
>
>
> This patch addresses a number of issues in the MailEntityProcessor 
> contrib-extras module.
> The changes are outlined here:
> * Added an 'includeContent' entity attribute so that message content can be 
> included independently of attachment processing,
>      e.g. <entity includeContent="true" processAttachments="false" . . . /> 
> would include message content, but not attachment content.
> * Added 'processAttachments' as a synonym for the misspelled (and singular) 
> 'processAttachement' property; the two behave identically. Default = 'true'; 
> if either is false, attachments are not processed. Note that only one of 
> these should really be specified in a given <entity> tag.
> * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
> unread, not deleted etc.), there is still a property value stored in the 
> 'flags' field (the value is the string "none")
> Note: there is a potential backward-compatibility issue with FLAGS.NONE for 
> clients that expect the absence of the 'flags' field to mean 'Not read'. I 
> expect this to be extremely rare, and relying on it is inadvisable in any 
> case, as user flags can be set arbitrarily, so fixing it now will keep 
> future client access consistent.
> * The folder name of an email is now included as a field called 'folder' 
> (e.g. folder=INBOX.Sent). This is quite handy for searching and 
> post-indexing processing.
> * The addPartToDocument() method that processes attachments has been 
> significantly rewritten, as there appeared to be no way the existing code 
> would ever actually process attachment content and add it to the row data.
> Tested on the 3.x trunk with a number of popular IMAP servers.

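The interaction between 'includeContent' and the two spellings of 'processAttachments' described above can be sketched as follows. This is a hypothetical reading of the rules, not the patch's code. Attribute values are treated as raw strings. The default of 'true' for includeContent is an assumption, since the description states a default only for processAttachments.

```java
import java.util.HashMap;
import java.util.Map;

public class MailEntityAttributes {
    // Absent attribute -> default; present -> parsed with Boolean.parseBoolean.
    private static boolean flag(Map<String, String> attrs, String name, boolean dflt) {
        String value = attrs.get(name);
        return value == null ? dflt : Boolean.parseBoolean(value.trim());
    }

    // Default is true; if either spelling is false, attachments are skipped.
    static boolean processAttachments(Map<String, String> attrs) {
        return flag(attrs, "processAttachments", true)
                && flag(attrs, "processAttachement", true);
    }

    // Assumed default of true (not stated in the description).
    static boolean includeContent(Map<String, String> attrs) {
        return flag(attrs, "includeContent", true);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("includeContent", "true");
        attrs.put("processAttachments", "false");
        // Message content is included, attachment content is not.
        System.out.println(includeContent(attrs) + " " + processAttachments(attrs)); // true false
    }
}
```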

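The 'flags' and 'folder' fields described above could be filled roughly like this. In this hypothetical sketch, flag names are plain strings rather than `javax.mail.Flags`, and the helper name `addFlagsAndFolder` is an illustration only. An empty flag set produces the single value "none" instead of leaving 'flags' out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MailRowFields {
    // Hypothetical helper: stores the message's flags and folder in the row map.
    static void addFlagsAndFolder(Map<String, Object> row, Set<String> flagNames,
                                  String folderFullName) {
        // Sorted copy so the stored order is deterministic.
        List<String> flags = new ArrayList<>(new TreeSet<>(flagNames));
        if (flags.isEmpty()) {
            flags.add("none"); // the FLAGS.NONE case
        }
        row.put("flags", flags);
        row.put("folder", folderFullName);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        addFlagsAndFolder(row, Set.of(), "INBOX.Sent");
        System.out.println(row); // {flags=[none], folder=INBOX.Sent}
    }
}
```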

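The rewritten addPartToDocument() is not shown in this issue. The following hypothetical illustration shows the behavior the description calls for: a recursive walk over a message's part tree routes body text and attachment text separately and respects both 'includeContent' and 'processAttachments'. The `Part` class is a simplified stand-in for JavaMail parts, and text extraction from binary attachments is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartWalker {
    // Simplified stand-in for a MIME part (javax.mail is deliberately not used).
    static final class Part {
        final String contentType;
        final String disposition; // e.g. "attachment", "inline", or null
        final String text;
        final List<Part> children;

        Part(String contentType, String disposition, String text, List<Part> children) {
            this.contentType = contentType;
            this.disposition = disposition;
            this.text = text;
            this.children = children;
        }

        static Part leaf(String contentType, String disposition, String text) {
            return new Part(contentType, disposition, text, List.of());
        }

        static Part multipart(Part... children) {
            return new Part("multipart/mixed", null, null, Arrays.asList(children));
        }

        boolean isMultipart() { return contentType.startsWith("multipart/"); }

        boolean isAttachment() { return "attachment".equalsIgnoreCase(disposition); }
    }

    // Depth-first walk: body text goes to 'content', attachment text to
    // 'attachments', each only if the corresponding setting is enabled.
    static void collect(Part part, boolean includeContent, boolean processAttachments,
                        List<String> content, List<String> attachments) {
        if (part.isMultipart()) {
            for (Part child : part.children) {
                collect(child, includeContent, processAttachments, content, attachments);
            }
        } else if (part.isAttachment()) {
            if (processAttachments) {
                attachments.add(part.text);
            }
        } else if (includeContent) {
            content.add(part.text);
        }
    }

    public static void main(String[] args) {
        Part message = Part.multipart(
                Part.leaf("text/plain", null, "Hello"),
                Part.leaf("application/pdf", "attachment", "report text"));
        List<String> content = new ArrayList<>();
        List<String> attachments = new ArrayList<>();
        collect(message, true, false, content, attachments);
        System.out.println(content + " " + attachments); // [Hello] []
    }
}
```
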
--
This message was sent by Atlassian JIRA
(v6.2#6252)
