[ https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346771#comment-17346771 ]
Noble Paul commented on SOLR-14245:
-----------------------------------

{quote}I strongly disagree - let's not revert, instead fix the bug {quote}
So, you are going to fix "*this bug*" and assume you have fixed all other bugs as well. What if there is another bug? Oh, don't worry, we "the developers" get to uncover bugs by ruining the lives of our users.

Do you have any idea about the damage this has caused when a 5000 node cluster is totally down without any recourse? Do you have any idea about the effort involved in bringing that cluster back up without even knowing what the fix is?

Let me explain how your logic is flawed.

If there is data validation, it should be in a place where a corrective action is possible. In this case, the validation (or fail fast) must be done where the wrong data is created. It should never be done at a place where the consumer of the data just fails and there is no recourse possible.

The common practice is {{"be strict in what you produce and be lenient with what you consume"}}. In this case you have chosen to do the exact opposite and ruined the weekend of at least 5 people and caused downtime for critical infrastructure of a company.

With this kind of attitude, who would trust our development practice and be willing to upgrade to a newer version of our software? Everyone would wait for somebody else to pay the price.

> Validate Replica / ReplicaInfo on creation
> ------------------------------------------
>
> Key: SOLR-14245
> URL: https://issues.apache.org/jira/browse/SOLR-14245
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Minor
> Fix For: 8.5
>
>
> Replica / ReplicaInfo should be immutable and their fields should be
> validated on creation.
> Some users reported that very rarely during a failed collection CREATE or
> DELETE, or when the Overseer task queue becomes corrupted, Solr may write to
> ZK incomplete replica infos (eg.
> node_name = null).
> This problem is difficult to reproduce but we should add safeguards anyway to
> prevent writing such corrupted replica info to ZK.
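
For readers following the thread, the "be strict in what you produce and be lenient with what you consume" argument from the comment can be illustrated with a minimal sketch. It is not taken from the Solr code base: {{ReplicaRecord}}, {{ClusterStateReader}}, the {{skipped}} list, and the map layout of the stored state are all hypothetical names chosen for illustration. The producer refuses to create a record with a null {{node_name}}, while the consumer tolerates a bad entry that is already stored instead of failing the whole load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a replica record; NOT Solr's actual Replica class.
final class ReplicaRecord {
    final String name;
    final String nodeName;

    private ReplicaRecord(String name, String nodeName) {
        this.name = name;
        this.nodeName = nodeName;
    }

    // Producer side: strict. Incomplete data is rejected at the point where
    // it is created, so the caller can still take corrective action.
    static ReplicaRecord create(String name, String nodeName) {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(nodeName, "node_name must not be null");
        return new ReplicaRecord(name, nodeName);
    }
}

// Hypothetical reader of already-persisted state (for example, data read back
// from a coordination store). The map layout is an assumption for this sketch.
final class ClusterStateReader {
    final List<ReplicaRecord> replicas = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    // Consumer side: lenient. A malformed entry is recorded and skipped
    // rather than aborting the load of every other entry.
    static ClusterStateReader load(Map<String, Map<String, String>> stored) {
        ClusterStateReader result = new ClusterStateReader();
        for (Map.Entry<String, Map<String, String>> entry : stored.entrySet()) {
            Map<String, String> props = entry.getValue();
            String nodeName = (props == null) ? null : props.get("node_name");
            if (nodeName == null) {
                result.skipped.add(entry.getKey());
                continue;
            }
            result.replicas.add(ReplicaRecord.create(entry.getKey(), nodeName));
        }
        return result;
    }
}
```

In this sketch a stored entry with {{node_name = null}} ends up in {{skipped}}, where it could be logged or repaired, and the remaining entries still load. Whether skipping is the right recovery for a real cluster depends on the deployment; the point is only that the hard failure sits on the producing side.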