Hello Calvin, Thank you for this wonderful issue write-up!
You mention upgrading from Solr 6 to 9.5 versions and I wonder if it might be practical or insightful to assess for some versions in between too e.g. 9.5/9.4/9.3 going backwards or 6/7/8/9.0 going forward or some sort of binary search variant. Best wishes, Christine From: users@solr.apache.org At: 02/16/24 17:59:59 UTCTo: solr-u...@lucene.apache.org Subject: Partial update slowness with a stored="false" dynamic field and lots of distinct field names Hi solr users, While tracking down a severe performance regression doing partial updates when upgrading from Solr 6 to solr 9.5.0, I discovered the following unexpected behavior. In my schema.xml file, I have the following fields (among many others): <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="_version_" type="long" indexed="false" stored="false"/> <field name="name" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="playlist_index_*" type="int" .../> <!-- The int field type has docValues="true" --> The unexpected impact on partial update performance depends on whether the dynamic field above is stored (with `docValues="true"`). The index in question contains about 30 million documents and is 15GB in size, and there are a large number of distinct `playlist_index_*`` field names (more than 100K). The purpose of that dynamic field is to support sorting, with queries sometimes specifying a sort like `playlist_index_3141 asc` to sort the results by that field. For all the cases below, I added to the existing index the following 25 new documents: [ {"id": "12345678901234567891", "name": "."}, {"id": "12345678901234567892", "name": "."}, {"id": "12345678901234567893", "name": "."}, {"id": "12345678901234567894", "name": "."}, {"id": "12345678901234567895", "name": "."}, {"id": "12345678901234567896", "name": "."}, {"id": "12345678901234567897", "name": "."}, {"id": "12345678901234567898", "name": "."}, {"id": "12345678901234567899", "name": "."}, {"id": "12345678901234567900", "name": "."}, {"id": "12345678901234567901", "name": "."}, {"id": "12345678901234567902", "name": "."}, {"id": "12345678901234567903", "name": "."}, {"id": "12345678901234567904", "name": "."}, {"id": "12345678901234567995", "name": "."}, {"id": "12345678901234567996", "name": "."}, {"id": "12345678901234567997", "name": "."}, {"id": "12345678901234567998", "name": "."}, {"id": "12345678901234567999", "name": "."}, {"id": "12345678901234568000", "name": "."}, {"id": "12345678901234568001", "name": "."}, {"id": "12345678901234568002", "name": "."}, {"id": "12345678901234568003", "name": "."}, {"id": "12345678901234568004", "name": "."}, {"id": "12345678901234568005", "name": "."} ] and did both a soft and hard commit before continuing. Then I did a partial update with `commitWithin=600000` to update each of those documents: [ {"id": "12345678901234567891", "name": {"set": "1"}}, {"id": "12345678901234567892", "name": {"set": "2"}}, {"id": "12345678901234567893", "name": {"set": "3"}}, {"id": "12345678901234567894", "name": {"set": "4"}}, {"id": "12345678901234567895", "name": {"set": "5"}}, {"id": "12345678901234567896", "name": {"set": "6"}}, {"id": "12345678901234567897", "name": {"set": "7"}}, {"id": "12345678901234567898", "name": {"set": "8"}}, {"id": "12345678901234567899", "name": {"set": "9"}}, {"id": "12345678901234567900", "name": {"set": "10"}}, {"id": "12345678901234567901", "name": {"set": "11"}}, {"id": "12345678901234567902", "name": {"set": "12"}}, {"id": "12345678901234567903", "name": {"set": "13"}}, {"id": "12345678901234567904", "name": {"set": "14"}}, {"id": "12345678901234567995", "name": {"set": "15"}}, {"id": "12345678901234567996", "name": {"set": "16"}}, {"id": "12345678901234567997", "name": {"set": "17"}}, {"id": "12345678901234567998", "name": {"set": "18"}}, {"id": "12345678901234567999", "name": {"set": "19"}}, {"id": "12345678901234568000", "name": {"set": "20"}}, {"id": "12345678901234568001", "name": {"set": "21"}}, {"id": "12345678901234568002", "name": {"set": "22"}}, {"id": "12345678901234568003", "name": {"set": "23"}}, {"id": "12345678901234568004", "name": {"set": "24"}}, {"id": "12345678901234568005", "name": {"set": "25"}} ] The time it takes to perform the update varies drastically depending on whether the `playlist_index_*` dynamic field is stored: - 0.017s: <dynamicField name="playlist_index_*" type="int" indexed="true" stored="true" docValues="true"/> - 0.016s: <dynamicField name="playlist_index_*" type="int" indexed="false" stored="true" docValues="true"/> - 8.850s: <dynamicField name="playlist_index_*" type="int" indexed="false" stored="false" docValues="true"/> - 8.867s: <dynamicField name="playlist_index_*" type="int" indexed="true" stored="false" docValues="true"/> The surprise is the poor performance of the last two, when stored is false and so the partial update may need to use the docValues to generate the new doc, compared to the first two when stored is true. When I profiled the code for the last of the four settings above, I saw that the majority of the time was spent in `org.apache.solr.search.SolrDocumentFetcher.decorateDocValueFields(SolrDocumentB ase, int, Set, DocValuesIteratorCache)` and further down in the callees under that hot spot there is a call to `org.apache.solr.search.DocValuesIteratorCache.newEntry(String)` that I added some print statements to in order to see which fields `newEntry` was being called for. There were many thousands of calls to the `playlist_index_*` fields, with each call being a different field name, and no other fields. My schema does have other dynamic fields, but none of them resulted in calls to `newEntry`. What is distinct about this dynamic field is that it's the only one that is a numeric field type and the only one that has more than a few hundred or so distinct field names. Does this seem like expected behavior that solr has to make so many `newEntry` calls for that dynamic field to perform the update that it seriously impacts update performance? Thanks for your time, Calvin