[jira] [Commented] (SOLR-7134) Replication can still cause index corruption.

Mike Drob (JIRA) Wed, 25 Feb 2015 15:10:45 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337423#comment-14337423
 ]


Mike Drob commented on SOLR-7134:
---------------------------------

{code:title=RealTimeGetComponent.java}
+        } catch (Exception e) {
+          throw new SolrException(ErrorCode.SERVER_ERROR, null, e);
+        }
{code}
Is it OK to fail the whole operation from inside a debug block? Maybe just 
log at debug level that the debug information could not be collected, and 
include the exception in that log message.
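
A minimal sketch of that suggestion, not the patch's code: the debug 
information is gathered inside its own try/catch, and a failure is logged at 
debug level with the exception instead of being rethrown. The class name, the 
method name and the Supplier parameter are hypothetical, and 
java.util.logging stands in for whatever logger the real class uses.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugInfoSketch {
  private static final Logger LOG = Logger.getLogger(DebugInfoSketch.class.getName());

  /**
   * Returns the debug string produced by the supplier, or null if gathering
   * it failed. A failure is logged and swallowed so that the surrounding
   * operation can continue. (Hypothetical helper, for illustration only.)
   */
  public static String gatherDebugInfo(Supplier<String> source) {
    try {
      return source.get();
    } catch (Exception e) {
      // FINE is the java.util.logging level closest to "debug".
      LOG.log(Level.FINE, "Could not gather debug information", e);
      return null;
    }
  }

  public static void main(String[] args) {
    // Normal case: the debug string is returned unchanged.
    System.out.println(gatherDebugInfo(() -> "versions=[1, 2]"));
    // Failure case: the exception is logged and null is returned.
    System.out.println(gatherDebugInfo(() -> {
      throw new IllegalStateException("boom");
    }));
  }
}
```

The caller decides what a null result means; the point is only that a problem 
in the debug path does not turn into a SERVER_ERROR for the request.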

{code:title=HdfsTestUtil.java}
+      try {
+        dfsCluster.shutdown();
+      } catch (Error e) {
+        log.warn("Exception shutting down dfsCluster", e);
+      }
{code}
Is this related, or just incidental test cleanup?

{code:title=StopableIndexingThread.java}
-  public StopableIndexingThread(SolrClient controlClient, SolrClient cloudClient, String id, boolean doDeletes, int numCycles) {
{code}
No reason to remove this constructor, I think.

{code:title=ChaosMonkeySafeLeaderTest.java}
+    if (!pauseBetweenUpdates) {
+      maxUpdates = 10000 + random().nextInt(1000);
+    } else {
+      maxUpdates = 15000;
+    }
{code}
Why do the two cases use different limits (10000 plus a random 0-999 without 
pauses, a fixed 15000 with pauses)?

{code:title=SnapPuller.java}
+          LOG.info("Reloading SolrCore");
{code}
Possibly worth logging _which_ core?

> Replication can still cause index corruption.
> ---------------------------------------------
>
>                 Key: SOLR-7134
>                 URL: https://issues.apache.org/jira/browse/SOLR-7134
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: Trunk, 5.1
>
>         Attachments: SOLR-7134.patch, SOLR-7134.patch
>
>
> While we have plugged most of these holes, there appears to be another that 
> is fairly rare.
> I've seen it play out a couple of ways in tests, but it looks like part of the 
> problem is that even if we decide we need a file and download it, we don't 
> care if we then cannot move it into place because it already exists.
> I'm working with a fix that does two things:
> * Fail a replication attempt if we cannot move a file into place because it 
> already exists.
> * If a replication attempt during recovery fails, on the next attempt force a 
> full replication to a new directory.
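
The two points in the description can be sketched roughly as follows. This is 
an illustration under stated assumptions, not the SnapPuller code: the class, 
field and method names are hypothetical, a single file stands in for the 
whole set of downloaded files, and plain java.nio.file calls stand in for 
Solr's Directory abstraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReplicationSketch {
  // Set when an attempt fails; the next attempt then copies into a new directory.
  private boolean forceFullNextTime = false;

  /** Moves a downloaded file into the index directory; an existing target is an error. */
  static void moveIntoPlace(Path downloaded, Path indexDir) throws IOException {
    // Without REPLACE_EXISTING, Files.move fails with an IOException
    // (typically FileAlreadyExistsException) if the target already exists.
    Files.move(downloaded, indexDir.resolve(downloaded.getFileName()));
  }

  /**
   * Runs one replication attempt and returns the directory the file ended up
   * in, or null if the attempt failed.
   */
  public Path replicate(Path downloaded, Path currentIndexDir, Path parent) {
    try {
      // After a failed attempt, do not reuse the existing index directory.
      Path target = forceFullNextTime
          ? Files.createTempDirectory(parent, "index.")
          : currentIndexDir;
      moveIntoPlace(downloaded, target);
      forceFullNextTime = false;
      return target;
    } catch (IOException e) {
      // Fail this attempt and remember to force a full replication next time.
      forceFullNextTime = true;
      return null;
    }
  }
}
```

In this sketch a name collision fails the first attempt instead of being 
ignored, and the retry writes into a freshly created directory, so it cannot 
collide with files left over from the earlier index.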



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

