[ https://issues.apache.org/jira/browse/SOLR-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714353#comment-17714353 ]
Chris M. Hostetter commented on SOLR-16753:
-------------------------------------------

FWIW, I found that on my machine, running {{stress --cpu 8}} concurrently with

{{./gradlew -p solr/core beast -Dtests.failfast=true -Dtests.dups=20 --tests SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull -Dtests.seed=7DDC80A84C7DDB0E -Dtests.locale=gsw-CH -Dtests.timezone=America/Adak -Dtests.asserts=true -Dtests.file.encoding=UTF-8}}

would pretty reliably cause a failure against {{main/HEAD}} (currently {{b2f7f4ddb48c7086ad639e5b263d17fd4335ec19}}), but did not fail against {{c9d656a8aa7bc2f711d4a9007fc590faa9853fcf}}.

So I did some manual {{git bisect}}'ing and AFAICT the commit that introduced the problem was in fact what I initially flagged in SOLR-16751:

{noformat}
2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d is the first bad commit
commit 2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d
Author: Noble Paul <noble.p...@gmail.com>
Date:   Wed Apr 12 22:07:03 2023 +1000

    Avoid unnecessary map creation while serializing DocCollection

:040000 040000 d8b81cae08b995c464c98b9a18496c9b5f5b81b3 6d03a51dbc4f8b183059195ecc50bdea7dc1da6e M solr
{noformat}

Reviewing the change, again, I don't have a concrete explanation for _why_ this commit introduced the problem, but beasting the test before and after it seems pretty conclusive.

My best theory:

* Somewhere in the code we have a (pre-existing) situation where {{DocCollection.write(JSONWriter)}} can be called concurrently with modifying the properties of the {{DocCollection}}.
* Prior to this commit, this concurrency issue was not (as much of) a problem, because the very first thing that {{DocCollection.write(JSONWriter)}} did was duplicate the Map.
* After this commit, there is a longer window of time during the {{DocCollection.write(JSONWriter)}} call in which concurrent manipulations of {{DocCollection.getProperties()}} will impact the data being written by the {{DocCollection.write(JSONWriter)}} call in a way that may cause inconsistencies.

...but this is purely a theory.

> SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull failures
> -----------------------------------------------------------------------
>
>                 Key: SOLR-16753
>                 URL: https://issues.apache.org/jira/browse/SOLR-16753
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: SOLR-16753.txt
>
>
> {{SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull}} was
> added on 2023-03-13, but somewhere between 2023-04-02 and 2023-04-09 it
> started failing 15-20% of the time on Jenkins jobs, with seeds that don't
> reliably reproduce.
> At first, this seemed like it might be related to SOLR-16751, but even with
> that fix failures are still happening.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
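
To make the theory in the comment above concrete, here is a minimal, deterministic sketch of the suspected effect. It is not Solr code: the class {{SnapshotWriteSketch}}, its methods, and the {{version}}/{{state}} keys are all hypothetical, and a {{Runnable}} hook stands in for a second thread that updates the properties in the middle of a write. It only illustrates why writing from the live map can emit a combination of values that never existed, while copying first does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Solr.
public class SnapshotWriteSketch {

    // Writes two properties straight from the given map. The hook runs
    // between the two reads and stands in for a concurrent updater.
    static String writeLive(Map<String, Object> props, Runnable concurrentUpdate) {
        StringBuilder out = new StringBuilder();
        out.append("version=").append(props.get("version"));
        concurrentUpdate.run();
        out.append(",state=").append(props.get("state"));
        return out.toString();
    }

    // Copies the map first and writes from the copy, so updates made to the
    // original map during the write are not visible in the output.
    static String writeSnapshot(Map<String, Object> props, Runnable concurrentUpdate) {
        Map<String, Object> copy = new HashMap<>(props);
        return writeLive(copy, concurrentUpdate);
    }

    // Starting state: version 1 is "active".
    static Map<String, Object> initialProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("version", 1);
        props.put("state", "active");
        return props;
    }

    // The simulated concurrent change: version 2 is "inactive".
    static void update(Map<String, Object> props) {
        props.put("version", 2);
        props.put("state", "inactive");
    }

    public static void main(String[] args) {
        Map<String, Object> a = initialProps();
        // Mixes old and new values: version=1,state=inactive
        System.out.println(writeLive(a, () -> update(a)));

        Map<String, Object> b = initialProps();
        // Consistent with the state at copy time: version=1,state=active
        System.out.println(writeSnapshot(b, () -> update(b)));
    }
}
```

Note that with real threads the copy itself is not atomic either, so an up-front copy would only shrink the window rather than close it, which fits the "not (as much of) a problem" wording above.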