[ 
https://issues.apache.org/jira/browse/SOLR-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047353#comment-18047353
 ] 

Puneet Ahuja commented on SOLR-17942:
-------------------------------------

Hi [~janhoy],

Thanks for catching that. My branch name contained a "/" (like 
`puneet/branch-name`), so when `writeChangelog` used the branch name to 
create the YAML file, the file ended up in a subdirectory, 
`/changelog/unreleased/puneet/`, instead of directly under 
`/changelog/unreleased/`. I'll move the file to the correct location and 
raise a PR.
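
A minimal sketch of the failure mode, not the actual `writeChangelog` code. 
It assumes the file name is simply `<branch>.yml`; the function names and the 
`.yml` extension are hypothetical and only illustrate why a "/" in the branch 
name produces a nested directory, plus one possible way to keep the file flat.

```python
from pathlib import PurePosixPath

# Directory named in the comment above.
UNRELEASED_DIR = PurePosixPath("changelog/unreleased")


def naive_changelog_path(branch: str) -> PurePosixPath:
    # Assumed behavior: the branch name is used verbatim as the file stem,
    # so any "/" in it becomes a directory separator.
    return UNRELEASED_DIR / f"{branch}.yml"


def flat_changelog_path(branch: str) -> PurePosixPath:
    # Hypothetical fix: replace path separators so the file always lands
    # directly under changelog/unreleased/.
    safe = branch.replace("/", "-").replace("\\", "-")
    return UNRELEASED_DIR / f"{safe}.yml"


print(naive_changelog_path("puneet/branch-name"))
print(flat_changelog_path("puneet/branch-name"))
```

With the branch `puneet/branch-name`, the first function yields a path inside 
`changelog/unreleased/puneet/`, while the second keeps the file in 
`changelog/unreleased/`.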

I noticed the "Check changelog entry" workflow passed on GitHub, so the 
validation could be extended to catch this in the future. The gotcha with 
branch names containing "/" should probably be fixed, or at least mentioned 
in the documentation for the new changelog process.
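
One way such a check could look, as a sketch only. I have not looked at how 
the "Check changelog entry" workflow is implemented, so `misplaced_entries` 
and its input (a list of changed file paths relative to the repository root) 
are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Directory that changelog entries are expected to sit in directly.
UNRELEASED_DIR = PurePosixPath("changelog/unreleased")


def misplaced_entries(changed_files):
    """Return changed files nested below changelog/unreleased/
    instead of sitting directly inside it."""
    misplaced = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Under the unreleased directory, but not an immediate child of it.
        if UNRELEASED_DIR in path.parents and path.parent != UNRELEASED_DIR:
            misplaced.append(name)
    return misplaced


changed = [
    "changelog/unreleased/puneet/branch-name.yml",
    "changelog/unreleased/some-entry.yml",
    "solr/core/src/java/Example.java",
]
print(misplaced_entries(changed))
```

A validation step could fail the build whenever this list is non-empty.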

CC: [~ichattopadhyaya] 

> Raising the hardcoded limit of lucene parameter ramPerThreadHardLimitMB using 
> reflection
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17942
>                 URL: https://issues.apache.org/jira/browse/SOLR-17942
>             Project: Solr
>          Issue Type: Task
>            Reporter: Puneet Ahuja
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The parameter ramPerThreadHardLimitMB cannot be larger than 2GB in Lucene, 
> which means a single thread cannot write segments larger than 2GB.
> Refer: 
> [https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/index/IndexWriterConfig.html#setRAMPerThreadHardLimitMB(int)]
> This issue proposes to make this parameter configurable above the 2GB limit, 
> so that each thread can write a bigger segment. I plan to use reflection to 
> bypass this hard-coded limit in Lucene.
>  
> When indexing high dimensional vector data, each segment has its own HNSW 
> graph. So more segments mean more graphs to search per shard and more graph 
> rebuild work during merges. With this change, a single indexing thread can 
> flush fewer, larger segments, which is generally more resource-efficient 
> for vector-heavy workloads.
> Lucene issue: https://github.com/apache/lucene/issues/15296



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

