janhoy opened a new pull request, #3924:
URL: https://github.com/apache/solr/pull/3924
…that failed for some seeds, e.g.

```
gradle test --tests DistributedFacetSimpleRefinementLongTailTest.test -Dtests.seed=A747120FD7BE8EB6 -Dtests.locale=ne -Dtests.timezone=Africa/El_Aaiun -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```

https://issues.apache.org/jira/browse/SOLR-18012

Done in collaboration with Claude Code.

Explanation:

## Root Cause Analysis and Solution

What causes the occasional large variance?

The test passes more than 90% of the time with a tight tolerance but fails for some seeds. Several factors compound:

1. T-Digest Merging Error: The percentile implementation uses `AVLTreeDigest` with `compression=100`. When the results from the 3 shards are merged (`PercentileAgg.java:488`, `digest.add(subDigest)`), the approximation errors of the per-shard digests compound. This is inherent to the algorithm.
2. 90th Percentile: The test requests the 90th percentile (line 68: `STAT_FIELD + ",90"`). T-digest deliberately keeps centroids smaller toward the tails (near 0 or 100) than around the median, so extreme percentiles are generally not less accurate than the median. However, the 90th-percentile estimate is still interpolated between neighboring centroids, so it can shift when the merge changes which centroids surround that rank.
3. Random Field Selection: Line 65 randomly chooses between `stat_i` (single-valued) and `stat_is` (multivalued). The two take different code paths with slightly different merging characteristics.
4. Random Codec/Structure: Randomized test parameters such as `maxPointsInLeafNode=1867` affect the BKD tree structure, which influences iteration order and potentially the order in which floating-point values are accumulated.
5. Data Distribution: The 300 docs for `aaa0` are distributed across 3 shards with values from formulas like `j*13-i`, `j*3+i` and `i*7+j`, so the exact centroid placements in the t-digest vary with processing order.
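Item 4 mentions floating-point accumulation order. As a minimal, generic illustration (not Solr or t-digest code; the class `AccumulationOrderDemo` and its `sumInOrder` helper are hypothetical), the sketch below shows that floating-point addition is not associative, so summing the same values in a different order can yield a different `double`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical demo: the order of accumulation changes a floating-point sum. */
public class AccumulationOrderDemo {

    /** Adds the values left to right, like a simple accumulator loop. */
    static double sumInOrder(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same three values, two iteration orders.
        double forward = sumInOrder(List.of(0.1, 0.2, 0.3));  // 0.6000000000000001
        double backward = sumInOrder(List.of(0.3, 0.2, 0.1)); // 0.6
        System.out.println(forward + " vs " + backward);

        // 300 values visited in two orders (stand-ins for different shard or
        // segment iteration orders); the totals may differ in the last bits.
        List<Double> values = new ArrayList<>();
        Random random = new Random(42); // fixed seed, analogous to -Dtests.seed
        for (int i = 0; i < 300; i++) {
            values.add(random.nextDouble() * 1000.0);
        }
        List<Double> shuffled = new ArrayList<>(values);
        Collections.shuffle(shuffled, new Random(7));
        double a = sumInOrder(values);
        double b = sumInOrder(shuffled);
        System.out.println(a + " vs " + b + " (difference " + Math.abs(a - b) + ")");
    }
}
```

On their own, differences of a few ulps are tiny; the sketch only shows that a result can depend on processing order, which is one reason the merged digest, and therefore the reported percentile, is not guaranteed to be identical across seeds.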
