[ 
https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346771#comment-17346771
 ] 

Noble Paul commented on SOLR-14245:
-----------------------------------

{quote}I strongly disagree - let's not revert, instead fix the bug
{quote}
So, you are going to fix "*this bug*" and assume you have fixed all other 
bugs as well. What if there is another bug? Oh, don't worry, we "the 
developers" get to uncover bugs by ruining the lives of our users. Do you have 
any idea about the damage this caused when a 5000 node cluster was totally 
down without any recourse? Do you have any idea about the effort involved in 
bringing that cluster back up without even knowing what the fix is?

Let me explain how your logic is flawed.

If there is data validation, it should be in a place where corrective 
action is possible. In this case, the validation (or fail fast) must be done 
where the wrong data is created. It should never be done at a place where the 
consumer of the data just fails and no recourse is possible.
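
To make the distinction concrete, here is a rough sketch of what I mean. It is not the actual Solr code: {{ReplicaRecord}}, {{create}}, {{read}} and {{loadAll}} are hypothetical names, and the stored state is modeled as plain maps. The write path rejects an incomplete replica before it is persisted; the read path records a warning and skips a malformed entry instead of failing the whole load.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ReplicaValidationSketch {

    /** Hypothetical stand-in for a replica record; not Solr's Replica class. */
    static final class ReplicaRecord {
        final String name;
        final String nodeName;

        private ReplicaRecord(String name, String nodeName) {
            this.name = name;
            this.nodeName = nodeName;
        }
    }

    /** Producer side: strict. Fails before bad data can be written anywhere. */
    static ReplicaRecord create(String name, String nodeName) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("replica name is required");
        }
        if (nodeName == null || nodeName.isEmpty()) {
            throw new IllegalArgumentException("node_name is required for replica " + name);
        }
        return new ReplicaRecord(name, nodeName);
    }

    /** Consumer side: lenient. A malformed stored entry is reported and skipped. */
    static Optional<ReplicaRecord> read(Map<String, String> stored, List<String> warnings) {
        try {
            return Optional.of(create(stored.get("name"), stored.get("node_name")));
        } catch (IllegalArgumentException e) {
            warnings.add("skipping malformed replica entry: " + e.getMessage());
            return Optional.empty();
        }
    }

    /** Loads every readable entry; one bad entry does not take down the rest. */
    static List<ReplicaRecord> loadAll(List<Map<String, String>> storedEntries,
                                       List<String> warnings) {
        List<ReplicaRecord> result = new ArrayList<>();
        for (Map<String, String> entry : storedEntries) {
            read(entry, warnings).ifPresent(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> good = new HashMap<>();
        good.put("name", "core_node1");
        good.put("node_name", "host1:8983_solr");

        Map<String, String> bad = new HashMap<>();
        bad.put("name", "core_node2");
        bad.put("node_name", null); // the kind of incomplete entry described in this issue

        List<String> warnings = new ArrayList<>();
        List<ReplicaRecord> loaded = loadAll(List.of(good, bad), warnings);

        System.out.println("loaded replicas: " + loaded.size());
        warnings.forEach(System.out::println);
    }
}
```

With this split, an operator who finds a bad entry in the stored state still has a running node and a warning that points at the entry to repair.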

The common practice is {{"be strict in what you produce and be lenient with 
what you consume"}}. In this case you have chosen to do the exact opposite, 
ruined the weekend of at least 5 people, and caused downtime for the critical 
infrastructure of a company. 
With this kind of attitude, who would trust our development practice and be 
willing to upgrade to a newer version of our software? Everyone should wait for 
somebody else to pay the price.

> Validate Replica / ReplicaInfo on creation
> ------------------------------------------
>
>                 Key: SOLR-14245
>                 URL: https://issues.apache.org/jira/browse/SOLR-14245
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Minor
>             Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be 
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or 
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to 
> ZK incomplete replica infos (eg. node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to 
> prevent writing such corrupted replica info to ZK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
