[ 
https://issues.apache.org/jira/browse/SOLR-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275874#comment-14275874
 ] 

Erick Erickson commented on SOLR-6973:
--------------------------------------

First, it's usually best to start the discussion on the user's list before 
raising a JIRA; it gets more eyes and you can get faster help.

In this case, how are you routing documents? I'm pretty sure that the 
SignatureUpdateProcessorFactory is going to operate on docs _after_ they get 
routed to a shard, not before. So I suspect you're not seeing what you think 
you are: docs that you expect to be de-duplicated aren't, because they're on 
different shards.

You have to be careful here when testing though. If the docs _happen_ to be 
routed to the same shard, then it'll all work fine.
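
To make the routing point concrete, here's a toy model in Python. It is not 
Solr code: the md5-based router, the "message" signature field, and the shard 
count of 3 are all assumptions for illustration, and it simply assumes (as I 
suspect above) that de-duplication is applied per shard after routing.

```python
import hashlib

NUM_SHARDS = 3  # assumption: one shard per node in a three-node setup


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Toy router: hash the document id onto a shard (stand-in, not Solr's hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def signature(doc, fields=("message",)):
    """Toy signature: hash of the chosen content fields (field names are hypothetical)."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def index(docs, num_shards=NUM_SHARDS):
    """Route each doc by id first, then de-duplicate by signature within its shard."""
    shards = [dict() for _ in range(num_shards)]
    for doc in docs:
        shard = shards[shard_for(doc["id"], num_shards)]
        # A later doc overwrites an earlier one with the same signature,
        # but only if both were routed to this same shard.
        shard[signature(doc)] = doc
    return shards


def count_docs(shards):
    """Total number of docs kept across all shards."""
    return sum(len(s) for s in shards)


if __name__ == "__main__":
    docs = [
        {"id": "log-1", "message": "disk full on host a"},
        {"id": "log-2", "message": "disk full on host a"},  # same content, new id
    ]
    print("single core:", count_docs(index(docs, num_shards=1)))
    print("three shards:", count_docs(index(docs)))
    print("routed to shards:", [shard_for(d["id"]) for d in docs])
```

With a single shard the two docs always collapse into one. With several 
shards they collapse only when their ids happen to hash to the same shard, 
which is why a quick test can look fine while some of the real data doesn't.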

But you haven't shown us how you configured your SignatureUpdateProcessor and 
where it is in your update chain. Details matter here.

> Some documents will not update on a cloud server using 
> SignatureUpdateProcessorFactory
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-6973
>                 URL: https://issues.apache.org/jira/browse/SOLR-6973
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.7
>         Environment: On Red Hat 6 servers, using three Solr cloud nodes
>            Reporter: Robert de Lorimier
>
> We are using solr cloud to hold recent log data for our internal auditing and 
> research. When first indexing the data, we flag the record with 
> Processed=false, and use this to search solr for new records to put into our 
> archive repository. Once the record is committed to the archive repository, 
> we update the record by setting the flag to true. As part of eliminating 
> duplicate log records we use the SignatureUpdateProcessorFactory with 
> overwriteDupes set to true to deduplicate any logs that have been sent more 
> than once. This works great for 95% of the data. We are able to add the 
> records to solr, look up any records that have not been added to the archive, 
> add them, and then set the flag to true. However, for 5% of the records we are 
> not able to update the flag in the cloud configuration. When sending the 
> records that do not update using curl as a test, I do not see any error 
> associated with the non-update.
> I also set up the same cores locally without a cloud configuration and the 
> same record data does update without issue, so this seems to be a bug related 
> to cloud. 


