[ 
https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-2245:
---------------------------------

    Attachment: SOLR-2245.patch

Here's an updated patch that's close to being ready for commit. I've changed 
a few things in the implementation, but I believe it still meets the spirit of 
Peter's original work. Mainly, this patch removes support for the delta-import 
command and instead only does full-import, using the last_index_time from the 
previous run as the value for the fetchMailsSince filter. 

The delta-import stuff is really for importing updates to existing rows, and 
the MailEntityProcessor was sort of hijacking that behavior. More to the point, 
I couldn't get the DocBuilder#collectDelta code to work with the rows generated 
by MailEntityProcessor#nextModifiedRowKey. Put simply, nextModifiedRowKey was 
returning new mails that arrived after the fetchMailsSince date filter, and the 
DocBuilder was processing them as if they were updates to pre-existing rows.

Thus, I felt it was better to just support full-import and have the code set 
the fetchMailsSince filter based on the last_index_time set by the DIH 
framework, which gets persisted in dataimport.properties. Of course, if that 
property is not set, the code falls back to fetchMailsSince from the config.
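
To make the fallback concrete, here is a minimal sketch of the selection logic 
described above. It is not the code in the patch: the class and method names 
are made up for illustration, and the "yyyy-MM-dd HH:mm:ss" date pattern is an 
assumption about how both values are formatted.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of the patch.
class FetchSinceResolver {
    // Assumed pattern for both last_index_time and fetchMailsSince.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Prefer the persisted last_index_time; fall back to the configured
    // fetchMailsSince when the property is missing or blank.
    static String chooseSince(Properties persisted, String configuredFetchMailsSince) {
        String last = persisted.getProperty("last_index_time");
        if (last != null && !last.trim().isEmpty()) {
            return last.trim();
        }
        return configuredFetchMailsSince;
    }

    // Parse the chosen value; returns null when neither source supplied one.
    static Date parseSince(String value) throws ParseException {
        if (value == null) {
            return null;
        }
        return new SimpleDateFormat(PATTERN, Locale.ROOT).parse(value);
    }

    public static void main(String[] args) throws ParseException {
        Properties props = new Properties();
        // First run: nothing persisted yet, so the configured value is used.
        System.out.println(chooseSince(props, "2010-01-01 00:00:00"));
        // Later run: the persisted value takes precedence.
        props.setProperty("last_index_time", "2014-05-20 10:15:30");
        System.out.println(parseSince(chooseSince(props, "2010-01-01 00:00:00")));
    }
}
```

On the first full-import nothing has been persisted yet, so the configured 
value applies; subsequent runs pick up from the previous run's timestamp.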

> MailEntityProcessor Update
> --------------------------
>
>                 Key: SOLR-2245
>                 URL: https://issues.apache.org/jira/browse/SOLR-2245
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4, 1.4.1
>            Reporter: Peter Sturge
>            Assignee: Timothy Potter
>            Priority: Minor
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.patch, 
> SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip
>
>
> This patch addresses a number of issues in the MailEntityProcessor 
> contrib-extras module.
> The changes are outlined here:
> * Added an 'includeContent' entity attribute to allow specifying content to 
> be included independently of processing attachments
>      e.g. <entity includeContent="true" processAttachments="false" . . . /> 
> would include message content, but not attachment content
> * Added 'processAttachments' as a synonym for the mis-spelled (and 
> singular) 'processAttachement' property. It functions the same as 
> processAttachement. Default = 'true'; if either is false, then attachments 
> are not processed. Note that only one of these should really be specified in 
> a given <entity> tag.
> * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
> unread, not deleted etc.), there is still a property value stored in the 
> 'flags' field (the value is the string "none")
> Note: there is a potential backward compat issue with FLAGS.NONE for clients 
> that expect the absence of the 'flags' field to mean 'Not read'. I expect 
> this to be extremely rare, and relying on it is inadvisable in any case, as 
> user flags can be arbitrarily set, so fixing it up now will ensure future 
> client access will be consistent.
> * The folder name of an email is now included as a field called 'folder' 
> (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing 
> processing
> * The addPartToDocument() method that processes attachments is significantly 
> re-written, as there looked to be no real way the existing code would ever 
> actually process attachment content and add it to the row data
> Tested on the 3.x trunk with a number of popular IMAP servers.
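
As an illustration of two of the rules in the description above (either 
attachment attribute being false disables attachment processing, and an email 
with no flags still gets the value "none"), here is a rough sketch. The class 
and method names are hypothetical and this is not the patch's actual code; 
only the attribute names and the "none" value come from the description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class MailEntityOptions {
    // Both spellings default to true; attachments are skipped if either
    // attribute is explicitly set to false.
    static boolean shouldProcessAttachments(Map<String, String> entityAttrs) {
        boolean plural = Boolean.parseBoolean(
                entityAttrs.getOrDefault("processAttachments", "true"));
        boolean legacy = Boolean.parseBoolean(
                entityAttrs.getOrDefault("processAttachement", "true"));
        return plural && legacy;
    }

    // An email without any flags still yields a value for the 'flags' field.
    static List<String> flagValues(List<String> messageFlags) {
        if (messageFlags == null || messageFlags.isEmpty()) {
            return Collections.singletonList("none");
        }
        return messageFlags;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("includeContent", "true");
        attrs.put("processAttachments", "false");
        System.out.println(shouldProcessAttachments(attrs));
        System.out.println(flagValues(Collections.<String>emptyList()));
    }
}
```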



--
This message was sent by Atlassian JIRA
(v6.2#6252)
