Another important finding is that Lucene90DocValuesProducer is used
differently in Solr's main sort and in collapse sort. The main sort in Solr
queries uses two-phase comparison: ordinals first if both values are in the
same segment, and only materializes the string value via lookupOrd() if
they reside in different segments. This is not the case for collapse
sort. During collapse with a collapse sort on a string field, Solr compares
documents against the current group winner via SortFieldsCompare, which
always calls copy() on the comparator for every document - regardless of
whether the document and the current group winner are from the same segment
or different segments. The same copy() call triggers lookupOrd() and LZ4
decompression, and also stores the string value as the new group winner if
the document wins the comparison. There is no ordinal-only shortcut.
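
To make the difference concrete, here is a minimal, self-contained Java
sketch. It is a toy model under stated assumptions, not Lucene or Solr code:
Segment, DocValue, winnerAlwaysMaterialize() and winnerOrdinalFirst() are
hypothetical names, the lookups counter stands in for the cost of
lookupOrd() (block decompression in Lucene 9), and plain String comparison
replaces Lucene's byte-wise term order.

```java
import java.util.Arrays;
import java.util.List;

public class CollapseCompareSketch {

    /** Hypothetical stand-in for one segment's sorted term dictionary. */
    static final class Segment {
        private final List<String> sortedTerms; // ordinal == index in sorted order
        int lookups = 0;                        // counts simulated lookupOrd() calls

        Segment(String... sortedTerms) {
            this.sortedTerms = Arrays.asList(sortedTerms);
        }

        /** Models the expensive step: materializing a string from its ordinal. */
        String lookupOrd(int ord) {
            lookups++;
            return sortedTerms.get(ord);
        }
    }

    /** A document's sort value: its segment plus its ordinal in that segment. */
    static final class DocValue {
        final Segment segment;
        final int ord;

        DocValue(Segment segment, int ord) {
            this.segment = segment;
            this.ord = ord;
        }
    }

    /** Models the collapse path as described above: every document is materialized. */
    static String winnerAlwaysMaterialize(List<DocValue> docs) {
        String winner = null;
        for (DocValue doc : docs) {
            String value = doc.segment.lookupOrd(doc.ord);
            if (winner == null || value.compareTo(winner) < 0) {
                winner = value;
            }
        }
        return winner;
    }

    /** Ordinal-first variant: strings are looked up only across segments. */
    static String winnerOrdinalFirst(List<DocValue> docs) {
        DocValue winner = null;
        String winnerValue = null; // materialized lazily, reset when the winner changes
        for (DocValue doc : docs) {
            if (winner == null) {
                winner = doc;
            } else if (doc.segment == winner.segment) {
                // Same segment: ordinals sort in the same order as the strings.
                if (doc.ord < winner.ord) {
                    winner = doc;
                    winnerValue = null;
                }
            } else {
                if (winnerValue == null) {
                    winnerValue = winner.segment.lookupOrd(winner.ord);
                }
                String value = doc.segment.lookupOrd(doc.ord);
                if (value.compareTo(winnerValue) < 0) {
                    winner = doc;
                    winnerValue = value;
                }
            }
        }
        if (winner == null) {
            return null;
        }
        return winnerValue != null ? winnerValue : winner.segment.lookupOrd(winner.ord);
    }

    public static void main(String[] args) {
        Segment a = new Segment("a", "c", "e");
        Segment b = new Segment("b", "d");
        List<DocValue> docs = Arrays.asList(
            new DocValue(a, 2), new DocValue(a, 1), new DocValue(a, 0), new DocValue(b, 0));

        System.out.println(winnerAlwaysMaterialize(docs)
            + " after " + (a.lookups + b.lookups) + " lookups");
        a.lookups = 0;
        b.lookups = 0;
        System.out.println(winnerOrdinalFirst(docs)
            + " after " + (a.lookups + b.lookups) + " lookups");
    }
}
```

In this example, with three candidates in one segment and one in another,
both paths pick the same winner, but the always-materialize path performs
four lookups while the ordinal-first path performs two. The gap grows with
the number of same-segment comparisons.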

Kind regards,
Bartosz Fidrysiak

On Thu, Jun 25, 2026 at 9:16 AM Bartosz Fidrysiak <[email protected]>
wrote:

> We operate a generic search tool whose primary use case involves
> collapsing and expanding documents based on user-provided keywords and
> filter queries. Since requests use Solr cursors, iterating through large
> result sets is a common pattern. We do not control how the
> user-provided keywords and filter queries narrow the dataset, which means
> many requests end up applying collapse to millions of documents. This was
> acceptable in Solr 8, but has become a serious issue after migrating to
> Solr 9, where the same queries are 2–3x slower.
>
> We have considered two potential workarounds. First, splitting the index
> into two — one for raw documents and one storing pre-collapsed results.
> Second, replacing the string sort field (document id) in collapse with a
> numeric field (hash from the string document id). Both options introduce
> breaking changes: the index split requires a significant redesign, and
> changing the collapse sort field type would invalidate all existing cursors
> stored by clients of the generic tool and require reindexing. Neither is a
> viable short-term fix.
>
> *The goal of this message is to understand what options exist in the
> current situation without committing to a large engineering effort. *
>
> We would also like to raise a broader concern: introducing mandatory LZ4
> compression for SortedDocValues term dictionaries in Lucene 9 reduced index
> size on disk, but introduced a significant performance regression for
> workloads that rely heavily on string sort fields in collapse queries.
> Whether this trade-off was intentional, and whether there is a plan to make
> compression configurable, would be valuable to know.
>
> Kind regards,
> Bartosz
>
> On Wed, Jun 24, 2026 at 10:32 PM Rob Audenaerde <[email protected]>
> wrote:
>
>> I don't know a direct answer to your questions, but some context of why
>> you are running a collapse query on 7m documents could help provide
>> insight? What are you trying to achieve? Are the results to be paged in a
>> ui? Is it an analytics workload?
>>
>>
>>
>> On Wed, Jun 24, 2026, 21:46 Bartosz Fidrysiak <[email protected]>
>> wrote:
>>
>>> We identified a 2–3x performance regression in Solr 9.10.1 compared to
>>> Solr 8.11.2 for collapse
>>> queries that use a string field as a collapse sort field.
>>>
>>>
>>> Test setup
>>> ----------
>>>
>>> To measure the regression under real production conditions, we
>>> configured both clusters to receive identical traffic simultaneously —
>>> every Solr request is sent to both instances at the same time, making the
>>> comparison direct and unbiased. Both clusters have the same number of
>>> nodes, documents, shards, and shard ranges. The data is sharded by tenant
>>> ID, so each request is served by a single shard with no cross-shard
>>> overhead. Solr schema is the same for both clusters.
>>>
>>> We tested six query variants covering different combinations of collapse
>>> sort fields: no collapse, collapse with date sort, date+long sort,
>>> date+string sort, and string-only sort (see attachments). The results show
>>> that queries with a string field in the collapse sort are consistently and
>>> significantly slower in Solr 9, while queries using only numeric or date
>>> sort fields show no regression. Notably, the string field used in the
>>> collapse sort has very high cardinality, and the worst-case queries process
>>> millions of documents.
>>>
>>>
>>> [image: image.png]
>>> [image: image.png]
>>> [image: image.png]
>>>
>>> Root cause
>>> ----------
>>>
>>> JFR profiling of the worst-case query (sort="modified_date desc,
>>> document_id asc", ~7M documents) confirmed the root cause.
>>> [image: image.png]
>>>
>>> Lucene 9 changed the internal format for SortedDocValues
>>> (Lucene90DocValuesProducer). The term dictionary (TermsDict) now stores
>>> string values in LZ4-compressed blocks. In Lucene 8, the same data was held
>>> uncompressed in direct memory — reads were instant. In Lucene 9, every time
>>> the collapse logic needs to materialize a string value for comparison or to
>>> record a new group winner, it must decompress an LZ4 block. For ~7M
>>> documents, this decompression is triggered on nearly every document via the
>>> following call chain:
>>>
>>>   SortFieldsCompare
>>>     -> TermOrdValLeafComparator.copy()
>>>     -> lookupOrd()
>>>     -> TermsDict.decompressBlock()
>>>     -> LZ4.decompress()
>>>
>>> LZ4 decompression accounts for almost 40% of CPU time in the
>>> query-serving thread in Solr 9,
>>> versus near zero in Solr 8.
>>>
>>> Similar concerns were raised in
>>> https://github.com/apache/lucene/issues/11485
>>>
>>> Questions
>>> ---------
>>>
>>> Q1: What are your recommendations for improving the performance of
>>> collapse queries that use a string field as a sort tiebreaker in Solr 9?
>>>
>>> Q2: Is it possible to disable LZ4 compression for SortedDocValues term
>>> dictionaries — either via a configuration property or a docValuesFormat
>>> option — or is this something that could be planned for a future release?
>>>
>>> Q3: Would it be feasible to lazily materialize string field values in
>>> CollapsingQParserPlugin for group winners, so that lookupOrd() is only
>>> called when a cross-segment comparison is actually needed? This could
>>> improve performance for queries where most groups contain only one document.
>>>
>>> Kind regards,
>>> Bartosz
>>>
>>
>>