[ 
https://issues.apache.org/jira/browse/SOLR-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17942:
----------------------------------
    Labels: pull-request-available  (was: )

> Raising the hardcoded limit of Lucene parameter ramPerThreadHardLimitMB using 
> reflection
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17942
>                 URL: https://issues.apache.org/jira/browse/SOLR-17942
>             Project: Solr
>          Issue Type: Task
>            Reporter: Puneet Ahuja
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The parameter ramPerThreadHardLimitMB is capped at 2GB in Lucene, which 
> means a single indexing thread cannot flush a segment larger than 2GB.
> See: 
> https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/index/IndexWriterConfig.html#setRAMPerThreadHardLimitMB(int)
> This issue proposes making the parameter configurable above the 2GB limit, 
> so that each thread can write a bigger segment. I plan to use reflection to 
> bypass the hard-coded limit in Lucene.
>  
> When indexing high-dimensional vector data, each segment has its own HNSW 
> graph, so more segments mean more graphs to search per shard and more graph 
> rebuild work during merges. With this change, a single indexing thread can 
> flush fewer, larger segments, which is generally more resource-efficient 
> for vector-heavy workloads.
> Lucene issue: https://github.com/apache/lucene/issues/15296
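
For readers unfamiliar with the technique, the reflection approach described above can be sketched roughly as follows. This is only an illustration of the general idea, not the code from the pull request: LimitedConfig is a hypothetical stand-in for a config class with a hard-coded bound, and the field name, default value, and 2048 MB bound are assumptions chosen to mirror the description. No Lucene classes are used.

```java
import java.lang.reflect.Field;

public class ReflectionLimitSketch {

    /** Hypothetical stand-in for a config class with a hard-coded upper bound. */
    static class LimitedConfig {
        static final int MAX_MB = 2048;          // assumed bound, mirrors the 2GB limit above
        private int perThreadHardLimitMB = 1945; // hypothetical field name and default

        void setRAMPerThreadHardLimitMB(int mb) {
            // The setter validates its argument, so large values are rejected here.
            if (mb <= 0 || mb >= MAX_MB) {
                throw new IllegalArgumentException(
                    "limit must be > 0 and < " + MAX_MB + " MB");
            }
            this.perThreadHardLimitMB = mb;
        }

        int getRAMPerThreadHardLimitMB() {
            return perThreadHardLimitMB;
        }
    }

    /** Writes the private field directly, skipping the setter's range check. */
    static void forceLimit(LimitedConfig config, int mb)
            throws ReflectiveOperationException {
        Field field = LimitedConfig.class.getDeclaredField("perThreadHardLimitMB");
        field.setAccessible(true); // allow access to the private field
        field.setInt(config, mb);
    }

    public static void main(String[] args) throws Exception {
        LimitedConfig config = new LimitedConfig();
        try {
            config.setRAMPerThreadHardLimitMB(4096);
        } catch (IllegalArgumentException e) {
            System.out.println("setter rejected 4096: " + e.getMessage());
        }
        forceLimit(config, 4096);
        System.out.println("after reflection: " + config.getRAMPerThreadHardLimitMB());
    }
}
```

Writing the field this way only sidesteps the argument check. Whether the code that later reads the value behaves correctly above the original bound is a separate question that this sketch does not address.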



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
