[jira] [Created] (SOLR-16360) Atomic update on boolean fields doesn't reflect when value starts with "1", "t" or "T"
Rahul Goswami created SOLR-16360:

Summary: Atomic update on boolean fields doesn't reflect when value starts with "1", "t" or "T"
Key: SOLR-16360
URL: https://issues.apache.org/jira/browse/SOLR-16360
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.11
Reporter: Rahul Goswami

I am running Solr 8.11. As per the Solr documentation, any value starting with "1", "t" or "T" for a boolean field is interpreted as true. [https://solr.apache.org/guide/8_11/field-types-included-with-solr.html#recommended-field-types]

However, I hit a potential Solr bug where if the String value "1", "t" or "T" is passed in an atomic update, it is treated as false.

E.g., the document below is indexed first => query returns "inStock" as true (as expected):
{code:java}
{
  "id":"test",
  "inStock":"true"
}
{code}
Follow the above with the atomic update below and commit => inStock becomes false in the query result:
{code:java}
{
  "id":"test",
  "inStock":{"set":"1"}
}
{code}
This doesn't happen, though, if the value "1" is passed in a regular update. E.g., the update below reflects the value of inStock as true when queried:
{code:java}
{
  "id":"test",
  "inStock":"1"
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
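For reference, the documented first-character rule can be sketched as below. This is an illustrative stand-in, not Solr's actual BoolField implementation; the method name parseSolrBool is made up for the example.

```java
// Illustrative sketch of the documented rule for boolean fields:
// a value is true iff its first character is '1', 't' or 'T'.
// This is NOT Solr's actual BoolField code.
public class BoolParseSketch {

    static boolean parseSolrBool(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        char c = value.charAt(0);
        return c == '1' || c == 't' || c == 'T';
    }

    public static void main(String[] args) {
        // Per the documented rule all of these are true; the reported bug is
        // that an atomic update {"set":"1"} ends up stored as false.
        System.out.println(parseSolrBool("true")); // true
        System.out.println(parseSolrBool("1"));    // true
        System.out.println(parseSolrBool("T"));    // true
        System.out.println(parseSolrBool("0"));    // false
    }
}
```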
[jira] [Commented] (SOLR-17038) /admin/segments handler: Expose the term count
[ https://issues.apache.org/jira/browse/SOLR-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812137#comment-17812137 ]

Rahul Goswami commented on SOLR-17038:
--
I am working on this.

> /admin/segments handler: Expose the term count
> --
>
> Key: SOLR-17038
> URL: https://issues.apache.org/jira/browse/SOLR-17038
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Priority: Minor
> Labels: newdev
>
> The term count for a field is not exposed for diagnostic purposes. Strangely
> enough, more obscure statistics like sumDocFreq and sumTotalTermFreq are.
> Just need to add a line like:
> {quote}fieldFlags.add("termCount", terms.size());{quote}
> to SegmentsInfoRequestHandler next to [where those other stats are
> gathered|https://github.com/apache/solr/blob/releases/solr/9.4.1/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L371-L372].
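As a rough illustration of how small the proposed change is, the sketch below uses a plain map as a stand-in for Solr's NamedList and Lucene's Terms; the stat values are dummies and the class/method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the segments handler already reports per-field stats such
// as sumDocFreq and sumTotalTermFreq; the proposal adds one more entry,
// "termCount". A plain map stands in for Solr's NamedList here.
public class TermCountSketch {

    static Map<String, Long> segmentFieldStats(long sumDocFreq, long sumTotalTermFreq, long termCount) {
        Map<String, Long> fieldFlags = new LinkedHashMap<>();
        fieldFlags.put("sumDocFreq", sumDocFreq);
        fieldFlags.put("sumTotalTermFreq", sumTotalTermFreq);
        // The proposed one-line addition, analogous to
        // fieldFlags.add("termCount", terms.size()) in SegmentsInfoRequestHandler:
        fieldFlags.put("termCount", termCount);
        return fieldFlags;
    }

    public static void main(String[] args) {
        // Dummy numbers, purely to show the shape of the per-field response.
        System.out.println(segmentFieldStats(42, 57, 13));
    }
}
```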
[jira] [Comment Edited] (SOLR-17038) /admin/segments handler: Expose the term count
[ https://issues.apache.org/jira/browse/SOLR-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812137#comment-17812137 ]

Rahul Goswami edited comment on SOLR-17038 at 1/30/24 4:23 AM:
--
I am working on this. Any other stats apart from "termCount" that could be useful?

was (Author: rahul196...@gmail.com): I am working on this.
[jira] [Created] (SOLR-17186) Streaming query breaks if token contains backtick
Rahul Goswami created SOLR-17186:

Summary: Streaming query breaks if token contains backtick
Key: SOLR-17186
URL: https://issues.apache.org/jira/browse/SOLR-17186
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: streaming expressions
Affects Versions: 8.5
Reporter: Rahul Goswami

Streaming searches break when the data contains the backtick character (`). E.g.:

http://host-name:8983/solr/MyCollection/stream?expr=search(MyCollection,q="My_Field:Foto`s",fl="field1",qt="/export")

The same search works fine if called directly with /export or /select.
[jira] [Commented] (SOLR-17186) Streaming query breaks if token contains backtick
[ https://issues.apache.org/jira/browse/SOLR-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821399#comment-17821399 ]

Rahul Goswami commented on SOLR-17186:
--
Root cause seems to be the replacement of ` with " in StreamExpressionParser, introduced in Solr 8.5 (https://issues.apache.org/jira/browse/SOLR-14139):
https://github.com/apache/solr/blob/main/solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionParser.java#L138

Will submit a PR.
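A minimal sketch of the failure mode (simplified; the real StreamExpressionParser logic is more involved, and rewriteBackticks is just a stand-in name for the substitution at the linked line):

```java
// Simplified sketch of the suspected root cause: substituting backticks with
// double quotes corrupts a legitimate backtick inside a query token.
public class BacktickSketch {

    static String rewriteBackticks(String expr) {
        // Stand-in for the backtick-to-quote substitution introduced in SOLR-14139.
        return expr.replace('`', '"');
    }

    public static void main(String[] args) {
        String expr = "search(MyCollection,q=\"My_Field:Foto`s\",fl=\"field1\")";
        // The backtick inside Foto`s becomes a double quote, unbalancing the
        // expression's quoting and breaking the streaming query.
        System.out.println(rewriteBackticks(expr));
    }
}
```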
[jira] [Commented] (SOLR-16703) Clearing all documents of an index should delete traces of a previous Lucene version
[ https://issues.apache.org/jira/browse/SOLR-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845135#comment-17845135 ]

Rahul Goswami commented on SOLR-16703:
--
I have done some work in this area and am happy to take this up. Tied up for the next month, but will get to this by end of June/early July 2024.

> Clearing all documents of an index should delete traces of a previous Lucene version
>
> Key: SOLR-16703
> URL: https://issues.apache.org/jira/browse/SOLR-16703
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 7.6, 8.11.2, 9.1.1
> Reporter: Gaël Jourdan
> Priority: Major
>
> _This is a ticket following a discussion on Slack with_ [~elyograg] _and_ [~wunder] _especially._
> h1. High level scenario
> Assume you're starting from a current Solr server in version 7.x and want to upgrade to 8.x, then 9.x.
> Upgrading from 7.x to 8.x works fine. Indexes of 7.x can still be read with Solr 8.x.
> On a regular basis, you clear* the index to start fresh, assuming this will recreate the index in version 8.x.
> This runs nicely for some time. Then you want to upgrade to 9.x. When starting, you get an error saying that the index is still 7.x and cannot be read by 9.x.
>
> *This is surprising because you'd expect that starting from a fresh index in 8.x would have removed any trace of 7.x.*
>
> _* : when I say "clear", I mean "delete by query {{*:*}} (all docs)" and then commit + optionally optimize._
> h1. What I'd like to see
> Clearing an index when running Solr version N should delete any trace of Lucene version N-1.
> Otherwise this forces users to delete an index (core / collection) and recreate it rather than just clearing it.
> h1. Detailed scenario to reproduce
> The following steps reproduce the issue with a standalone Solr instance running in Docker, but I experienced the issue in SolrCloud mode running on VMs and/or bare metal.
>
> Also note that for personal troubleshooting I used the tool "luceneupgrader" available at [https://github.com/hakanai/luceneupgrader], but it's not necessary to reproduce the issue.
>
> 1. Create a directory for data
> {code:java}
> $ mkdir solrdata
> $ chmod -R a+rwx solrdata {code}
>
> 2. Start a Solr 7.x server, create a core and push some docs
> {code:java}
> $ docker run -d -v "$PWD/solrdata:/opt/solr/server/solr/mycores:rw" -p 8983:8983 --name my_solr_7 solr:7.6.0 solr-precreate gettingstarted
> $ docker exec -it my_solr_7 post -c gettingstarted example/exampledocs/manufacturers.xml
> $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq .response.numFound
> 11{code}
>
> 3. Look at the index files and check version
> {code:java}
> $ ll solrdata/gettingstarted/data/index
> total 40K
> -rw-r--r--. 1 8983 8983 718 16 mars 17:37 _0.fdt
> -rw-r--r--. 1 8983 8983 84 16 mars 17:37 _0.fdx
> -rw-r--r--. 1 8983 8983 656 16 mars 17:37 _0.fnm
> -rw-r--r--. 1 8983 8983 112 16 mars 17:37 _0_Lucene50_0.doc
> -rw-r--r--. 1 8983 8983 1,1K 16 mars 17:37 _0_Lucene50_0.tim
> -rw-r--r--. 1 8983 8983 145 16 mars 17:37 _0_Lucene50_0.tip
> -rw-r--r--. 1 8983 8983 767 16 mars 17:37 _0_Lucene70_0.dvd
> -rw-r--r--. 1 8983 8983 730 16 mars 17:37 _0_Lucene70_0.dvm
> -rw-r--r--. 1 8983 8983 478 16 mars 17:37 _0.si
> -rw-r--r--. 1 8983 8983 203 16 mars 17:37 segments_2
> -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock
> $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index
> Lucene index version: 7
> {code}
>
> 4. Stop Solr 7, update solrconfig.xml for Solr 8 and start a Solr 8 server
> {code:java}
> $ docker stop my_solr_7
> $ vim solrdata/gettingstarted/conf/solrconfig.xml
> $ cat solrdata/gettingstarted/conf/solrconfig.xml | grep luceneMatchVersion
> 8.11.2
> $ docker run -d -v "$PWD/solrdata:/var/solr/data:rw" -p 8983:8983 --name my_solr_8 solr:8.11.2{code}
>
> 5. Check index is loaded ok and docs are still there
> {code:java}
> $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq .response.numFound
> 11 {code}
>
> 6. Clear the index and check index files / version
> {code:java}
> $ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/gettingstarted/update?commit=true' -d '{ "delete": {"query":"*:*"} }'
> $ ll solrdata/gettingstarted/data/index
> total 4,0K
> -rw-r--r--. 1 8983 8983 135 16 mars 17:45 segments_5
> -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock
> $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index
> Lucene index version: 7
> $ curl 'http://localhost:8983/solr/gettingstarted/update?optimize=true' >
[jira] [Comment Edited] (SOLR-16703) Clearing all documents of an index should delete traces of a previous Lucene version
[ https://issues.apache.org/jira/browse/SOLR-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845135#comment-17845135 ]

Rahul Goswami edited comment on SOLR-16703 at 5/9/24 10:01 PM:
--
I have done a good bit of work in this area and am happy to take this up. Tied up for the next month, but will get to this by end of June/early July 2024.

was (Author: rahul196...@gmail.com): I have done some work in this area and happy to take this up. Tied up for the next one month, but will get to this by end of June/early July 2024.
[jira] [Created] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
Rahul Goswami created SOLR-16838:

Summary: Atomic updates too slow in Solr 8 vs Solr 7
Key: SOLR-16838
URL: https://issues.apache.org/jira/browse/SOLR-16838
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SearchComponents - other
Affects Versions: 8.11.1
Reporter: Rahul Goswami

Started experiencing slowness with updates in production after upgrading from Solr 7.7.2 to 8.11.1. Upon comparing the performance, it turns out that indexing 20 million docs via atomic updates through the same client program (running 15 parallel threads indexing in batches of 1000) takes the following times:

Solr 7: 78 mins
Solr 8: 370 mins

Environment details:
- Java 11 on Windows server
- Xms1536m Xmx3072m
- Indexing client code running 15 parallel threads indexing in batches of 1000
- Using SimpleFSDirectoryFactory (since MMap doesn't quite work well on Windows for our index sizes, which commonly run north of 1 TB)

Looking at the thread dump, the bottleneck seems to be RealTimeGet, and I can see that Solr 7 takes a different code path than Solr 8. Note that the performance of regular updates (non-atomic) is still pretty good on Solr 8, completing in < 1 hour for the same 20 million doc data set.

Sharing the indexing code, solrconfig, schema and thread dumps in the link below:
https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729551#comment-17729551 ]

Rahul Goswami commented on SOLR-16838:
--
I ran the test to index 5 million docs (batches of 1000 docs in 15 parallel threads). To eliminate the network overhead and get as accurate a benchmark as possible, I used an AtomicLong to measure the time around the RTG call in DistributedUpdateProcessor across all calls ([https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416]). Did this for both Solr 7.7.2 and Solr 8.11.1 and built the solr-core.jar to replace it in the solr webapp lib. RTG in Solr 8.x is ~10x slower. Here are the numbers (times are in milliseconds):

*+Solr 7.7.2+*:
2023-06-01 15:39:48.272 WARN (qtp1034094674-24) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *+Total rtg time: 7293486+*

*+Solr 8.11.1+*:
2023-06-01 04:46:24.758 WARN (qtp391506011-71) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *+Total rtg time: 72029877+*
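The measurement approach described in the comment above can be sketched as follows. This is illustrative only, not actual Solr code; the class and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the benchmarking approach: an AtomicLong accumulates time spent
// in a hot call across many indexing threads, so a single running total can
// be logged instead of per-call timings.
public class RtgTimingSketch {

    static final AtomicLong totalRtgNanos = new AtomicLong();

    static void timedLookup(Runnable realTimeGetCall) {
        long start = System.nanoTime();
        realTimeGetCall.run(); // stand-in for the RTG / getFirstMatch() call
        totalRtgNanos.addAndGet(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    timedLookup(() -> { });
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("Total rtg time (ns): " + totalRtgNanos.get());
    }
}
```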
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729558#comment-17729558 ]

Rahul Goswami commented on SOLR-16838:
--
Running further benchmarks reveals that the slowness is in the searcher.getFirstMatch() call inside getInputDocument(). The call eventually ends up in Lucene's SegmentTermsEnum.seekExact(), which is where the regression seems to be.

*+Solr 7.7.2+*
2023-06-01 21:17:34.492 WARN (qtp1034094674-41) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 508053 ; *searcherFetchTime: 3229011*

*+Solr 8+*
2023-06-01 20:43:31.767 WARN (qtp391506011-56) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 410873 ; *searcherFetchTime: 33296008*
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729558#comment-17729558 ]

Rahul Goswami edited comment on SOLR-16838 at 6/6/23 3:51 AM:
--
Running further benchmarks (this time for 3 million docs) reveals that the slowness is in the searcher.getFirstMatch() call inside getInputDocument(). The call eventually ends up in Lucene's SegmentTermsEnum.seekExact(), which is where the regression seems to be.

*+Solr 7.7.2+*
2023-06-01 21:17:34.492 WARN (qtp1034094674-41) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 508053 ; *searcherFetchTime: 3229011*

*+Solr 8+*
2023-06-01 20:43:31.767 WARN (qtp391506011-56) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime: 410873 ; *searcherFetchTime: 33296008*

was (Author: rahul196...@gmail.com): Running further benchmarks reveals that the slowness is in the searcher.getFirstMatch() call inside getInputDocument(). The call eventually ends up in Lucene's SegmentTermsEnum.seekExact() which is where the regression seems to be. (Same timing logs as above.)
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729921#comment-17729921 ]

Rahul Goswami commented on SOLR-16838:
--
Yes, it has always been commented out. For reproducing the issue, the index has also been deleted multiple times and rebuilt against the same schema. Also made sure the dynamic field "*" doesn't exist either, to eliminate the possibility of \_root\_ being treated as a dynamic field.
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729921#comment-17729921 ]

Rahul Goswami edited comment on SOLR-16838 at 6/7/23 4:04 AM:
--
Yes, it has always been commented out. For reproducing the issue, the index has also been deleted multiple times and rebuilt against the same schema. Also made sure the dynamic field "*" doesn't exist either to eliminate the possibility of _root_ being treated as a dynamic field.

was (Author: rahul196...@gmail.com): Yes, it has always been commented out. For reproducing the issue, the index has also been deleted multiple times and rebuilt against the same schema. Also made sure the dynamic field "*" doesn't exist either to eliminate the possibility of _{_}root_{_} being treated as a dynamic field.
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729921#comment-17729921 ]

Rahul Goswami edited comment on SOLR-16838 at 6/7/23 4:05 AM:
--
Yes, it has always been commented out. For reproducing the issue, the index has also been deleted multiple times and rebuilt against the same schema. Also made sure the dynamic field "*" doesn't exist either to eliminate the possibility of _root_ being treated as a dynamic field.

was (Author: rahul196...@gmail.com): Yes, it has always been commented out. For reproducing the issue, the index has also been deleted multiple times and rebuilt against the same schema. Also made sure the dynamic field "*" doesn't exist either to eliminate the possibility of _root_ being treated as a dynamic field.
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730247#comment-17730247 ] Rahul Goswami commented on SOLR-16838: -- The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every single byte . The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. 
Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
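The regression described above hinges on Lucene's FSTs being read backwards through their byte array, which defeats a forward-filling read buffer once the terms index lives off-heap. A minimal plain-Java sketch of that access pattern (this is illustrative only, not Lucene source; the 1 KB buffer, the refill counter as a stand-in for disk accesses, and all names are assumptions):

```java
// Illustrative sketch: a forward-filling 1 KB buffer serving byte reads.
// A forward scan mostly hits the buffer; a backward scan (as in the FST
// traversal discussed above) misses on every byte, forcing a refill
// ("disk access") per byte. Not Lucene code.
public class BackwardReadDemo {
    static final int BUF = 1024;
    final byte[] file;          // simulated on-disk bytes
    final byte[] buf = new byte[BUF];
    long bufStart = -1;         // file offset the buffer currently covers
    int refills = 0;            // proxy for disk accesses

    BackwardReadDemo(int size) { file = new byte[size]; }

    byte readByte(long pos) {
        if (bufStart < 0 || pos < bufStart || pos >= bufStart + BUF) {
            bufStart = pos;     // buffer fills forward starting at pos
            int len = (int) Math.min((long) BUF, file.length - pos);
            System.arraycopy(file, (int) pos, buf, 0, len);
            refills++;
        }
        return buf[(int) (pos - bufStart)];
    }

    public static void main(String[] args) {
        BackwardReadDemo fwd = new BackwardReadDemo(8192);
        for (long p = 0; p < 8192; p++) fwd.readByte(p);   // forward scan
        BackwardReadDemo bwd = new BackwardReadDemo(8192);
        for (long p = 8191; p >= 0; p--) bwd.readByte(p);  // backward scan
        System.out.println("forward refills=" + fwd.refills
                + " backward refills=" + bwd.refills);     // 8 vs 8192
    }
}
```

With a forward-filling buffer the backward scan pays one refill per byte (8192 refills for 8 KB) versus 8 for the forward scan, which is the shape of the slowdown the linked Lucene issues (12355/12356) discuss.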
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730247#comment-17730247 ] Rahul Goswami edited comment on SOLR-16838 at 6/7/23 6:41 PM: -- The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every 1kB of buffer. The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. was (Author: rahul196...@gmail.com): The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every single byte . The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730247#comment-17730247 ] Rahul Goswami edited comment on SOLR-16838 at 6/7/23 6:42 PM: -- The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every single byte read. The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. was (Author: rahul196...@gmail.com): The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every 1kB of buffer. The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Comment Edited] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730247#comment-17730247 ] Rahul Goswami edited comment on SOLR-16838 at 6/7/23 9:32 PM: -- The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list (https://lists.apache.org/thread/1fskhmz84pp60o41txsxj2193vt9txod): " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every single byte read. The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. was (Author: rahul196...@gmail.com): The regression seems to be in the Lucene layer. Quoting the discussion on this issue on the Lucene list: " - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. [https://github.com/apache/lucene/issues/9681] - Then in 8.6 the FST was moved off-heap all the time. [https://github.com/apache/lucene/issues/10297]"; So now the terms index is off-heap, and due to Lucene's FST reading bytes backwards readByte() call causes disk access for every single byte read. The below tickets have been opened by Adrien Grand on the issue for further discussion: [https://github.com/apache/lucene/issues/12355] and [https://github.com/apache/lucene/issues/12356]. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730680#comment-17730680 ] Rahul Goswami commented on SOLR-16838: -- Not sure about Mmap, but NIOFSDirectory also has similar regression. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742498#comment-17742498 ] Rahul Goswami commented on SOLR-16838: -- Missed the last couple of comments, sorry! [~janhoy] I backported the Lucene fix to 8.11.1 which is the version I have been testing on and found a dramatic improvement in performance. For a 20 million dataset, indexing in 15 parallel threads in batches of 1000, here are the before and after fix times: Before fix: 370 mins After fix: 65 mins Note that this performance on an average is still tad slower than 7.7.2 across multiple runs, but I guess that can be attributed to the fact that the terms index is no longer loaded on-heap as of Lucene 8.6 (https://github.com/apache/lucene/issues/10297). > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. 
Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-16838) Atomic updates too slow in Solr 8 vs Solr 7
[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742504#comment-17742504 ] Rahul Goswami commented on SOLR-16838: -- [~elyograg] In some scenarios, the problem with Mmap becomes more operational than technical. For a deployment in a customer setting, the customer hits cost concerns with providing enough RAM for MMap (on multiple nodes) to work effectively. With SimpleFS/NIOFs with sufficient optimizations, we are able to run multiple TB indexes effectively on a 64 GB box with 31 GB heap. Even though I agree that MMap works more efficiently on Linux than Windows, it would still not work efficiently under similar memory constraints. > Atomic updates too slow in Solr 8 vs Solr 7 > --- > > Key: SOLR-16838 > URL: https://issues.apache.org/jira/browse/SOLR-16838 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SearchComponents - other >Affects Versions: 8.11.1 >Reporter: Rahul Goswami >Priority: Major > Labels: RTG, RealTimeGet, atomicupdate > > Started experiencing slowness with updates in production after upgrading from > Solr 7.7.2 to 8.11.1. Upon comparing the performance it turns out that > indexing 20 million docs via atomic updates through the same client program > (running 15 parallel threads indexing in batches of 1000) takes below time: > > Solr 7 : 78 mins > Solr 8: 370 mins > > Environment details: > - Java 11 on Windows server > - Xms1536m Xmx3072m > - Indexing client code running 15 parallel threads indexing in batches of 1000 > - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on > Windows for our index sizes which commonly run north of 1 TB) > > Looking at the thread dump, the bottleneck seems to be RealTimeGet and I can > see that Solr 7 takes a different code path than Solr 8. 
Note that the > performance of regular updates (non-atomic) is still pretty good on Solr 8 > completing in < 1 hour for the same 20 million data set. > > Sharing the indexing code, solrconfig, schema and thread dumps in the link > below: > [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-16360) Atomic update on boolean fields doesn't reflect when value starts with "1", "t" or "T"
[ https://issues.apache.org/jira/browse/SOLR-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750535#comment-17750535 ] Rahul Goswami commented on SOLR-16360: -- RCA: during a regular (non-atomic) update, the toInternal() method gets called to interpret the value and hence the documented behavior is observed. However during atomic update, the toNativeType() method gets called which doesn't check for the first character of value, thereby breaking the behavior. > Atomic update on boolean fields doesn't reflect when value starts with "1", > "t" or "T" > -- > > Key: SOLR-16360 > URL: https://issues.apache.org/jira/browse/SOLR-16360 > Project: Solr > Issue Type: Bug >Affects Versions: 8.11 >Reporter: Rahul Goswami >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I am running Solr 8.11. As per the Solr documentation, any value starting > with "1","t" or "T" for a boolean field is interpreted as true. > > [https://solr.apache.org/guide/8_11/field-types-included-with-solr.html#recommended-field-types] > > However, I hit a potential Solr bug where if the String value "1","t" or "T" > is passed in an atomic update, it is treated as false. > > //Eg:Below document is indexed first => query returns "inStock" as true (as > expected) > { > "id":"test", > "inStock":"true" > } > > //Follow above update with below atomic update and commit. => inStock becomes > false in query result > { > "id":"test", > "inStock":\{"set":"1"} > } > > This doesn't happen though if value "1" is passed in a regular update. > Eg:Below update reflects the value of inStock as true when queried. > { > "id":"test", > "inStock":"1" > } -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
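The RCA above can be illustrated with simplified stand-ins for the two code paths (these are not the actual org.apache.solr.schema.BoolField sources; the method names and logic here are assumptions modeling the described behavior): a toInternal()-style rule that inspects only the first character, versus a toNativeType()-style path that defers to Boolean.parseBoolean(), which accepts only a case-insensitive "true" and therefore maps "1", "t", and "T" to false.

```java
// Simplified stand-ins for the two parsing paths described in the RCA
// (illustrative only; not the real BoolField implementation).
public class BoolParseDemo {
    // Regular-update style: documented rule, first char '1'/'t'/'T' => true
    static boolean firstCharRule(String val) {
        char c = val.isEmpty() ? ' ' : val.charAt(0);
        return c == '1' || c == 't' || c == 'T';
    }

    // Atomic-update style: defers to Boolean.parseBoolean, which returns
    // true only for a case-insensitive "true"
    static boolean parseBooleanRule(String val) {
        return Boolean.parseBoolean(val);
    }

    public static void main(String[] args) {
        System.out.println("firstCharRule(\"1\")    = " + firstCharRule("1"));    // true
        System.out.println("parseBooleanRule(\"1\") = " + parseBooleanRule("1")); // false
    }
}
```

The divergence for the value "1" matches the reported symptom: a regular update indexes inStock as true while the atomic {"set":"1"} flips it to false.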
[jira] [Commented] (SOLR-17359) Make SolrCLI handle arg parsing of zk sub commands
[ https://issues.apache.org/jira/browse/SOLR-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873592#comment-17873592 ] Rahul Goswami commented on SOLR-17359: -- Thanks for your work on this Eric. This was a tedious effort! > Make SolrCLI handle arg parsing of zk sub commands > -- > > Key: SOLR-17359 > URL: https://issues.apache.org/jira/browse/SOLR-17359 > Project: Solr > Issue Type: Sub-task > Components: scripts and tools >Reporter: Jan Høydahl >Priority: Major > Labels: pull-request-available > Fix For: 9.7 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Both bin/solr and bin/solr.cmd have lots of shell code to parse the zk sub > commands, and to print the usage text. We have both a short zk uage text and > the full one. > {code:java} > Usage: solr zk upconfig|downconfig -d -n [-z zkHost] > [-s solrUrl]" >solr zk cp [-r] [-z zkHost] [-s solrUrl]" >solr zk rm [-r] [-z zkHost] [-s solrUrl]" >solr zk mv [-z zkHost] [-s solrUrl]" >solr zk ls [-r] [-z zkHost] [-s solrUrl]" >solr zk mkroot [-z zkHost] [-s solrUrl]" >solr zk linkconfig --conf-name -c [-z zkHost] > [-s solrUrl]" >solr zk updateacls [-z zkHost] [-s solrUrl]" {code} > Extend SolrCLI and tools API to handle sub commands more natively so that > doing {{solr zk -h}} shows a list of sub commands, while `solr zk cp -h` > shows usage for that sub command. > I think commons-cli does not have native subcommand support like e.g. > picocli, but it should be possible to implement.. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905719#comment-17905719 ] Rahul Goswami commented on SOLR-7962: - I’ll try this and report back > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Updated] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Goswami updated SOLR-17725: - Description: Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version. This reindexing usually comes with added downtime and/or cost. Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell. We at Commvault have developed a way which achieves this reindexing in-place on the same index. Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention. It comes with the following limitations: i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true. ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine. For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes. was: Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version. This reindexing usually comes with added downtime and/or cost. 
Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell. I have developed a way which achieves this reindexing in-place on the same index. Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention. It comes with the following limitations: i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true. ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine. For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes. > Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. 
> We at Commvault have developed a way which achieves this reindexing in-place > on the same index. Also, the process automatically keeps "upgrading" the > indexes over multiple subsequent Solr upgrades without needing manual > intervention. > It comes with the following limitations: > i) All _source_ fields need to be either stored=true or docValues=true. Any > copyField destination fields can be stored=false of course, just that the > source fields (or more precisely, the source fields you care about > preserving) should be either stored or docValues true. > ii) The datatype of an existing field in schema.xml shouldn't change upon > Solr upgrade. Introducing new fields is fine. > For indexes where this limitation is not a problem (it wasn't for us!), the > tool can reindex in-place on the same core with zero downtime and > legitimately "upgrade" the index. This can remove a lot of operational > headaches, especially in environments with hundreds/thousands of very large > indexes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Updated] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Goswami updated SOLR-17725: - Attachment: High Level Design.png > Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. > I, on behalf of my employer, Commvault, have developed a way which achieves > this reindexing in-place on the same index. Also, the process automatically > keeps "upgrading" the indexes over multiple subsequent Solr upgrades without > needing manual intervention. > It comes with the following limitations: > i) All _source_ fields need to be either stored=true or docValues=true. Any > copyField destination fields can be stored=false of course, just that the > source fields (or more precisely, the source fields you care about > preserving) should be either stored or docValues true. > ii) The datatype of an existing field in schema.xml shouldn't change upon > Solr upgrade. Introducing new fields is fine. > For indexes where this limitation is not a problem (it wasn't for us!), the > tool can reindex in-place on the same core with zero downtime and > legitimately "upgrade" the index. 
This can remove a lot of operational > headaches, especially in environments with hundreds/thousands of very large > indexes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
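Limitation (i) above can be illustrated with a hypothetical schema.xml fragment (the field names here are invented for illustration only): source fields keep stored=true or docValues=true so their values can be recovered for reindexing, while copyField destinations may drop stored since they are rebuilt during the in-place reindex.

```xml
<!-- Hypothetical fragment illustrating limitation (i); field names are invented. -->
<!-- Source fields: values must be recoverable, so stored=true or docValues=true. -->
<field name="title" type="string"  indexed="true" stored="true"/>
<field name="price" type="pdouble" indexed="true" stored="false" docValues="true"/>

<!-- copyField destination: stored="false" is fine, it is repopulated on reindex. -->
<field name="all_text" type="text_general" indexed="true" stored="false"/>
<copyField source="title" dest="all_text"/>
```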
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941691#comment-17941691 ] Rahul Goswami commented on SOLR-17725: -- [~ab] For those running SolrCloud AND having enough capacity in terms of infrastructure and budget, the REINDEXCOLLECTION command is a good option. I see that it reindexes onto a parallel collection, so for clusters with hundreds/thousands of large indexes that cost can be substantial. Also, the source collection is put in read-only mode while the reindexing happens, so it can be a point of contention in environments that are more update-heavy than search-heavy (e.g., for us at Commvault). By means of this Jira I am attempting to overcome the Lucene limitation which forces you to reindex from source when you really don't HAVE to. At least I would like to offer that option to users who are more cost-sensitive or operationally sensitive (e.g., solutions which package Solr as part of the application and are installed/deployed on customer sites; it can be awkward to reason with customers as to why a solution upgrade may need a downtime if it involves a Solr upgrade). The proposed solution reindexes into the same core, can be easily adapted to work with both standalone Solr and SolrCloud, and allows both updates and searches to be served while doing so. This also helps remove additional operational overhead, since users can focus on just the Solr upgrade without having to worry about index compatibility.
[jira] [Created] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
Rahul Goswami created SOLR-17725: Summary: Automatically upgrade Solr indexes without needing to reindex from source Key: SOLR-17725 URL: https://issues.apache.org/jira/browse/SOLR-17725 Project: Solr Issue Type: Improvement Reporter: Rahul Goswami Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version. This reindexing usually comes with added downtime and/or cost. Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell. I have developed a way which achieves this reindexing in-place on the same index. Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention. It comes with the following limitations: i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true. ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine. For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes.
[jira] [Updated] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Goswami updated SOLR-17725: - Description: Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version. This reindexing usually comes with added downtime and/or cost. Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell. I, on behalf of my employer, Commvault, have developed a way which achieves this reindexing in-place on the same index. Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention. It comes with the following limitations: i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true. ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine. For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes. was: Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version. This reindexing usually comes with added downtime and/or cost. 
Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell. We at Commvault have developed a way which achieves this reindexing in-place on the same index. Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention. It comes with the following limitations: i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true. ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine. For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes.
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940243#comment-17940243 ] Rahul Goswami commented on SOLR-17725: -- The attached document outlines an example where the upgrade tool works on an index originally created in Solr 7.x, AFTER an upgrade to Solr 8.x. Key points: 1) Lucene version X can read an index created in version X-1. New segments are written with the latest version's codec. 2) When a segment merge happens, the merged segment maintains a version stamp, "minVersion", which is the lowest version among the segments participating in the merge. 3) The segments_* file in a Lucene index records the Lucene version in which the index was first created. The design doc outlines the process of converting all segments to the new version. It's sort of a pull model where you first upgrade and then "pull" the index to the current version. By the end of the process outlined in the doc, all segments are converted to the new version and the index is in all respects an "upgraded" index. The only missing piece is updating the index creation version in the commit point. I did this by exposing a method in Lucene's CommitInfos which validates the version of all segments and updates the creation version stamp in the commit point (we might need to request an API from Lucene here). When this index is later opened in Solr 9.x, it can be read (thanks to point #1) and the same process repeats to make the index ready for Solr 10.x.
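The version-"pull" model described in points 1-3 can be sketched as a toy simulation. This is purely illustrative Python, not Solr or Lucene code; every name here (Segment, Index, the functions) is invented for the sketch, and the compatibility rule is the one stated in point 1.

```python
# Toy model of the in-place upgrade loop: all names are invented for
# illustration; this is NOT Lucene or Solr code.
from dataclasses import dataclass

@dataclass
class Segment:
    version: int       # major Lucene version whose codec wrote the segment
    min_version: int   # lowest version among segments merged into it

@dataclass
class Index:
    created_major: int # creation-version stamp kept in the segments_* file
    segments: list

def can_open(index: Index, lucene_major: int) -> bool:
    # Point 1: Lucene only guarantees reading indexes created in the
    # immediately previous major version.
    return lucene_major - index.created_major <= 1

def upgrade_in_place(index: Index, lucene_major: int) -> Index:
    if not can_open(index, lucene_major):
        raise ValueError("must upgrade through intermediate major versions")
    # Reindex every segment so it is rewritten with the current codec ...
    index.segments = [Segment(lucene_major, lucene_major) for _ in index.segments]
    # ... and only once EVERY segment is fully at the new version, bump the
    # creation stamp in the commit point (the step that needs a Lucene API).
    if all(s.version == s.min_version == lucene_major for s in index.segments):
        index.created_major = lucene_major
    return index
```

Repeating `upgrade_in_place` at each major version keeps the index perpetually one hop behind the newest Lucene, which is exactly what makes the X → X+1 → X+2 path automatic.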
[jira] [Comment Edited] (SOLR-16703) Clearing all documents of an index should delete traces of a previous Lucene version
[ https://issues.apache.org/jira/browse/SOLR-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941685#comment-17941685 ] Rahul Goswami edited comment on SOLR-16703 at 4/7/25 6:43 PM: -- [~gjourdan] The effort is underway as part of https://issues.apache.org/jira/browse/SOLR-17725. The solution for the specific requirement in this Jira requires a change from Lucene folks to update the version in CommitInfos. We'll request an API to that effect as part of the above mentioned JIRA. was (Author: rahul196...@gmail.com): [~gjourdan] The effort is underway on https://issues.apache.org/jira/browse/SOLR-17725 > Clearing all documents of an index should delete traces of a previous Lucene > version > > > Key: SOLR-16703 > URL: https://issues.apache.org/jira/browse/SOLR-16703 > Project: Solr > Issue Type: Improvement >Affects Versions: 7.6, 8.11.2, 9.1.1 >Reporter: Gaël Jourdan >Priority: Major > > _This is a ticket following a discussion on Slack with_ [~elyograg] _and_ > [~wunder] _especially._ > h1. High level scenario > Assume you're starting from a current Solr server in version 7.x and want to > upgrade to 8.x then 9.x. > Upgrading from 7.x to 8.x works fine. Indexes of 7.x can still be read with > Solr 8.x. > On a regular basis, you clear* the index to start fresh, assuming this will > recreate the index in version 8.x. > This ran nicely for some time. Then you want to upgrade to 9.x. When > starting, you get an error saying that the index is still 7.x and cannot be > read by 9.x. > > *This is surprising because you'd expect that starting from a fresh index in > 8.x would have removed any trace of 7.x.* > > _* : when I say "clear", I mean "delete by query {{*:*}} (all docs)" and > then commit + optionally optimize._ > h1. What I'd like to see > Clearing an index when running Solr version N should delete any trace of > Lucene version N-1. 
> Otherwise this forces users to delete an index (core / collection) and > recreate it rather than just clearing it. > h1. Detailed scenario to reproduce > The following steps reproduces the issue with a standalone Solr instance > running in Docker but I experienced the issue in SolrCloud mode running on > VMs and/or bare-metal. > > Also note that for personal troubleshooting I used the tool "luceneupgrader" > available at [https://github.com/hakanai/luceneupgrader] but it's not > necessary to reproduce the issue. > > 1. Create a directory for data > {code:java} > $ mkdir solrdata > $ chmod -R a+rwx solrdata {code} > > 2. Start a Solr 7.x server, create a core and push some docs > {code:java} > $ docker run -d -v "$PWD/solrdata:/opt/solr/server/solr/mycores:rw" -p > 8983:8983 --name my_solr_7 solr:7.6.0 solr-precreate gettingstarted > $ docker exec -it my_solr_7 post -c gettingstarted > example/exampledocs/manufacturers.xml > $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq > .response.numFound > 11{code} > > 3. Look at the index files and check version > {code:java} > $ ll solrdata/gettingstarted/data/index > > total 40K > -rw-r--r--. 1 8983 8983 718 16 mars 17:37 _0.fdt > -rw-r--r--. 1 8983 8983 84 16 mars 17:37 _0.fdx > -rw-r--r--. 1 8983 8983 656 16 mars 17:37 _0.fnm > -rw-r--r--. 1 8983 8983 112 16 mars 17:37 _0_Lucene50_0.doc > -rw-r--r--. 1 8983 8983 1,1K 16 mars 17:37 _0_Lucene50_0.tim > -rw-r--r--. 1 8983 8983 145 16 mars 17:37 _0_Lucene50_0.tip > -rw-r--r--. 1 8983 8983 767 16 mars 17:37 _0_Lucene70_0.dvd > -rw-r--r--. 1 8983 8983 730 16 mars 17:37 _0_Lucene70_0.dvm > -rw-r--r--. 1 8983 8983 478 16 mars 17:37 _0.si > -rw-r--r--. 1 8983 8983 203 16 mars 17:37 segments_2 > -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock > $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index > Lucene index version: 7 > {code} > > 4. 
Stop Solr 7, update solrconfig.xml for Solr 8 and start a Solr 8 server > {code:java} > $ docker stop my_solr_7 > $ vim solrdata/gettingstarted/conf/solrconfig.xml > $ cat solrdata/gettingstarted/conf/solrconfig.xml | grep luceneMatchVersion > 8.11.2 > $ docker run -d -v "$PWD/solrdata:/var/solr/data:rw" -p 8983:8983 --name > my_solr_8 solr:8.11.2{code} > > 5. Check index is loaded ok and docs are still there > {code:java} > $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq > .response.numFound > 11 {code} > > 6. Clear the index and check index files / version > {code:java} > $ curl -X POST -H 'Content-Type: application/json' > 'http://localhost:8983/solr/gettingstarted/update?commit=true' -d '{ > "delete": {"query":"*:*"} }' > $ ll solrdata/gettingstarted/data/index
[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941691#comment-17941691 ] Rahul Goswami edited comment on SOLR-17725 at 4/7/25 7:18 PM: -- [~ab] For those running SolrCloud AND having enough capacity in terms of infrastructure and budget, the REINDEXCOLLECTION command is a good option. I see that it reindexes onto a parallel collection, so for clusters with hundreds/thousands of large indexes that cost can be substantial. Also, the source collection is put in read-only mode while the reindexing happens, so it can be a point of contention in environments that are more update-heavy than search-heavy (e.g., for us at Commvault). By means of this Jira I am attempting to overcome the Lucene limitation which forces you to reindex from source when you really don't HAVE to. At least I would like to offer that option to users who are more cost-sensitive or operationally sensitive (e.g., solutions which package Solr as part of the application and are installed/deployed on customer sites; it can be awkward to reason with customers as to why a solution upgrade may need a downtime/additional infra capacity if it involves a Solr upgrade). The proposed solution reindexes into the same core, can be easily adapted to work with both standalone Solr and SolrCloud, and allows both updates and searches to be served while doing so. This also helps remove additional operational overhead, since users can focus on just the Solr upgrade without having to worry about index compatibility.
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941704#comment-17941704 ] Rahul Goswami commented on SOLR-17725: -- [~janhoy] Thanks for taking the time to review the JIRA. Please find my thoughts on your questions below: 1) Do you intend for this to be a new Solr API, if so what is the proposed API? Or a CLI utility tool to run on a cold index folder? > The implementation needs to run on a hot index for it to be lossless. Indexing calls happen using Solr APIs, so Solr will need to be running. In our custom implementation I have hooked the process into SolrDispatchFilter load() so that the process can start upon server start for the least operational overhead. As a generic solution I am thinking we can expose it as an action (/solr/admin/cores?action=UPGRADEINDEXES) with an "async" option for trackability. This way users can hook the command into their shell/cmd scripts after Solr starts. Open to suggestions here. 2) Is one of your design goals to avoid the need for 2-3x disk space during the reindex, since you work on segment level and do merges? > Reducing infrastructure costs is a major design goal here. Also removing the operational overhead of index upgrade during a Solr upgrade when possible. 3) Requiring a Lucene API change is a potential blocker; I'd not be surprised if the Lucene project rejects making the "created-version" property writable, so such a discussion with them would come early. > I agree. I am hopeful(!!) this will not be rejected though, since they can implement guardrails around changing the "created-version" property for added security. In my implementation I added the change in CommitInfos to check all the segments in a commit and ensure they are at the new version in every aspect before setting the created-version property. This already happens in a synchronized block, so in my (limited) opinion it should be safe. The API they give us can do all required internal validations and fail gracefully without any harm to the index. I can get a discussion started with the Lucene folks once we agree on the basics of this implementation. Or do you suggest I do that right away? 4) Obviously a new Solr API needs to play well with SolrCloud as well as other features such as shard split / move etc. Have you thought about locking / conflicts? > SolrCloud challenges are not factored into the current implementation. But given that the process works at the core level, agnostic of the mode, I am optimistic we can adapt the solution for SolrCloud through PR discussions. We might have to block certain operations like splitshard while this process is underway on a collection. 5) A reindex-collection API is probably wanted, however it could be acceptable to implement a "core-level" API first and later add a "collection-level" API on top of it. > Agreed 6) Challenge the assumption that "in-place" segment level is the best choice for this feature. Re-indexing into a new collection due to major schema changes is also a common use case that this will not address. > I would refer to my answer to your second question in defense of the "in-place" implementation. Segment-level processing gives us the ability to limit pollution of the index due to merges as we reindex, and also restartability. Agreed this is not a substitute for when a field data type changes. This is intended to be a substitute for the index upgrade when you upgrade Solr, so as to overcome the X --> X+1 --> X+2 version upgrade path limitation which exists today despite no schema changes. Of course, users are free to add new fields and should still be able to use this utility. 
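The core-admin invocation proposed in point 1) might look like the sketch below. To be clear: neither the UPGRADEINDEXES action nor these parameter names exist in Solr today; everything here is an assumption taken from the proposal in the comment above.

```python
# Hypothetical sketch of the PROPOSED UPGRADEINDEXES core-admin action.
# The action and parameter names are assumptions from the Jira comment,
# not an existing Solr API.
from typing import Optional
from urllib.parse import urlencode

def upgrade_indexes_url(base: str, core: str, async_id: Optional[str] = None) -> str:
    """Build the core-admin URL for the proposed in-place index upgrade."""
    params = {"action": "UPGRADEINDEXES", "core": core}
    if async_id is not None:
        # An "async" request id would let callers poll for progress,
        # as with other asynchronous admin commands.
        params["async"] = async_id
    return f"{base}/solr/admin/cores?{urlencode(params)}"

print(upgrade_indexes_url("http://localhost:8983", "techproducts", "upg-1"))
```

A wrapper like this is the sort of thing users could drop into their post-startup shell/cmd scripts, per the comment's suggestion.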
> Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. > I, on behalf of my employer, Commvault, have developed a way which achieves > this reindexing in-place on the same index. Also, the process automatically > keeps "upgrading" the indexes over multipl
[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941704#comment-17941704 ] Rahul Goswami edited comment on SOLR-17725 at 4/7/25 8:11 PM: -- [~janhoy] Thanks for taking the time to review the JIRA. Please find my thoughts on your questions below: 1) Do you intend for this to be a new Solr API, if so what is the proposed API? or a CLI utility tool to run on a cold index folder? > The implementation needs to run on a hot index for it to be lossless. > Indexing calls happen using Solr APIs so Solr will need to be running. In our > custom implementation I have hooked the process into SolrDispatchFilter > load() so that the process can start upon server start for least operational > overhead. As a generic solution I am thinking we can expose it as an action > (/solr/admin/cores?action=UPGRADEINDEXES) with an "async" option for > trackability. This way users can hook up the command into their shell/cmd > scripts after Solr starts. Open to suggestions here, 2) Is one of your design goals to avoid the need for 2-3x disk space during the reindex, since you work on segment level and do merges? > Reducing infrastructure costs is a major design goal here. Also removing the > operational overhead of index uprgade during Solr uprgade when possible. 3) Requring Lucene API change is a potential blocker, I'd not be surprised if the Lucene project rejects making the "created-version" property writable, so such a discussion with them would come early > I agree. I am hopeful(!!) this will not be rejected though since they can > implement guardrails around changing the "created-version" property for added > security. In my implementation I added the change in CommitInfos to check for > all the segments in a commit and ensure they are the new version in every > aspect before setting the created-version property. 
This already happens in a synchronized block upon commit, so in my (limited) opinion it should be safe. The API they give us can do all required internal validations and fail gracefully without any harm to the index. I can get a discussion started with the Lucene folks once we agree on the basics of this implementation. Or do you suggest I do that right away?

4) Obviously a new Solr API needs to play well with SolrCloud as well as other features such as shard split / move etc. Have you thought about locking / conflicts?

> SolrCloud challenges are not factored into the current implementation. But given that the process works at the core level and is agnostic of the mode, I am optimistic we can adapt the solution for SolrCloud through PR discussions. We might have to block certain operations like splitshard while this process is underway on a collection.

5) A reindex-collection API is probably wanted; however, it could be acceptable to implement a "core-level" API first and later add a "collection-level" API on top of it.

> Agreed.

6) Challenge the assumption that "in-place" segment level is the best choice for this feature. Re-indexing into a new collection due to major schema changes is also a common use case that this will not address.

> I would refer back to my answer to your second question in defense of the "in-place" implementation. Segment-level processing gives us the ability to restrict pollution of the index due to merges as we reindex, and also restartability. Agreed, this is not a substitute for when a field data type changes. This is intended to be a substitute for index upgrade when you upgrade Solr, so as to overcome the X --> X+1 --> X+2 version upgrade path limitation which exists today despite no schema changes. Of course, users are free to add new fields and should still be able to use this utility.
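The X --> X+1 --> X+2 limitation discussed in answer 6 follows from Lucene's read-compatibility guarantee: a major version can only open indexes created in the current or the immediately preceding major version. A toy sketch of that rule (illustrative helper, not a Lucene API):

```python
def can_open(index_created_major: int, lucene_major: int) -> bool:
    """Lucene guarantees read compatibility only for indexes created in
    the current or the immediately preceding major version."""
    return lucene_major - 1 <= index_created_major <= lucene_major

# An index created on 7.x survives the upgrade to 8.x...
assert can_open(7, 8)
# ...but 9.x refuses it, despite no schema changes.
assert not can_open(7, 9)
# Once the in-place reindex stamps the index as 8.x, 9.x can open it.
assert can_open(8, 9)
```

This is why the process has to be repeated at each major upgrade rather than jumping versions.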
[jira] [Commented] (SOLR-16703) Clearing all documents of an index should delete traces of a previous Lucene version
[ https://issues.apache.org/jira/browse/SOLR-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945586#comment-17945586 ] Rahul Goswami commented on SOLR-16703: -- [~gjourdan] Just curious. Since you are ok with reindexing from source, what prevents you from physically deleting the "index" directory for each core/replica instead? That way reindexing will again populate the index without any trace of the previous Solr/Lucene version, and without you having to recreate the collection. The fix for your exact issue requires an API from Lucene which I am going to request anyway, but I expect them to ask the same question.

> Clearing all documents of an index should delete traces of a previous Lucene version
> ------------------------------------------------------------------------------------
>
> Key: SOLR-16703
> URL: https://issues.apache.org/jira/browse/SOLR-16703
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 7.6, 8.11.2, 9.1.1
> Reporter: Gaël Jourdan
> Priority: Major
>
> _This is a ticket following a discussion on Slack with_ [~elyograg] _and_ [~wunder] _especially._
> h1. High level scenario
> Assume you're starting from a current Solr server in version 7.x and want to upgrade to 8.x then 9.x.
> Upgrading from 7.x to 8.x works fine. Indexes of 7.x can still be read with Solr 8.x.
> On a regular basis, you clear* the index to start fresh, assuming this will recreate the index in version 8.x.
> This ran nicely for some time. Then you want to upgrade to 9.x. When starting, you get an error saying that the index is still 7.x and cannot be read by 9.x.
>
> *This is surprising because you'd expect that starting from a fresh index in 8.x would have removed any trace of 7.x.*
>
> _* : when I say "clear", I mean "delete by query {{*:*}} (all docs)" and then commit + optionally optimize._
> h1. What I'd like to see
> Clearing an index when running Solr version N should delete any trace of Lucene version N-1.
> Otherwise this forces users to delete an index (core / collection) and recreate it rather than just clearing it.
> h1. Detailed scenario to reproduce
> The following steps reproduce the issue with a standalone Solr instance running in Docker, but I experienced the issue in SolrCloud mode running on VMs and/or bare-metal.
>
> Also note that for personal troubleshooting I used the tool "luceneupgrader" available at [https://github.com/hakanai/luceneupgrader] but it's not necessary to reproduce the issue.
>
> 1. Create a directory for data
> {code:java}
> $ mkdir solrdata
> $ chmod -R a+rwx solrdata {code}
>
> 2. Start a Solr 7.x server, create a core and push some docs
> {code:java}
> $ docker run -d -v "$PWD/solrdata:/opt/solr/server/solr/mycores:rw" -p 8983:8983 --name my_solr_7 solr:7.6.0 solr-precreate gettingstarted
> $ docker exec -it my_solr_7 post -c gettingstarted example/exampledocs/manufacturers.xml
> $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq .response.numFound
> 11{code}
>
> 3. Look at the index files and check version
> {code:java}
> $ ll solrdata/gettingstarted/data/index
> total 40K
> -rw-r--r--. 1 8983 8983 718 16 mars 17:37 _0.fdt
> -rw-r--r--. 1 8983 8983 84 16 mars 17:37 _0.fdx
> -rw-r--r--. 1 8983 8983 656 16 mars 17:37 _0.fnm
> -rw-r--r--. 1 8983 8983 112 16 mars 17:37 _0_Lucene50_0.doc
> -rw-r--r--. 1 8983 8983 1,1K 16 mars 17:37 _0_Lucene50_0.tim
> -rw-r--r--. 1 8983 8983 145 16 mars 17:37 _0_Lucene50_0.tip
> -rw-r--r--. 1 8983 8983 767 16 mars 17:37 _0_Lucene70_0.dvd
> -rw-r--r--. 1 8983 8983 730 16 mars 17:37 _0_Lucene70_0.dvm
> -rw-r--r--. 1 8983 8983 478 16 mars 17:37 _0.si
> -rw-r--r--. 1 8983 8983 203 16 mars 17:37 segments_2
> -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock
> $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index
> Lucene index version: 7
> {code}
>
> 4. Stop Solr 7, update solrconfig.xml for Solr 8 and start a Solr 8 server
> {code:java}
> $ docker stop my_solr_7
> $ vim solrdata/gettingstarted/conf/solrconfig.xml
> $ cat solrdata/gettingstarted/conf/solrconfig.xml | grep luceneMatchVersion
> 8.11.2
> $ docker run -d -v "$PWD/solrdata:/var/solr/data:rw" -p 8983:8983 --name my_solr_8 solr:8.11.2{code}
>
> 5. Check index is loaded ok and docs are still there
> {code:java}
> $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq .response.numFound
> 11 {code}
>
> 6. Clear the index and check index files / version
> {code:java}
> $ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/gettingstarted/update?commit=true' -d '{ "delete": {"query":"*:*"} }'
> $ ll solrdata/gettingstarted/data/index
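The surprise in the report above boils down to one fact: a delete-by-query empties the documents but leaves the index metadata, including the version the index was first created in, untouched; only physically recreating the index directory resets it. A toy model of that behavior (plain Python, not Lucene code):

```python
class ToyIndex:
    """Toy stand-in for a Lucene index directory: documents plus the
    created-version stamp kept in the segments_N file."""
    def __init__(self, created_major: int):
        self.created_major = created_major
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)

    def delete_all_and_commit(self):
        # Mirrors "delete by query *:* + commit": docs go away, but the
        # created-version stamp in the index metadata does not change.
        self.docs.clear()

def recreate(current_major: int) -> ToyIndex:
    # Mirrors deleting the "index" directory and reindexing: the brand-new
    # index is stamped with the running version.
    return ToyIndex(created_major=current_major)

idx = ToyIndex(created_major=7)        # index first created on 7.x
idx.add({"id": "1"})
idx.delete_all_and_commit()            # "fresh" index while running 8.x...
assert idx.docs == []
assert idx.created_major == 7          # ...still carries the 7.x stamp
idx = recreate(current_major=8)
assert idx.created_major == 8          # physical recreation resets it
```

This is why step 6 above leaves an index that 9.x still refuses, and why deleting the index directory sidesteps the problem.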
[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940243#comment-17940243 ] Rahul Goswami edited comment on SOLR-17725 at 4/19/25 12:00 AM: The attached document outlines an example where the upgrade tool works on an index originally created in Solr 7.x, AFTER an upgrade to Solr 8.x. Key points:

1) Lucene version X can read an index created in version X-1. Writing of new segments happens with the latest version codec.

2) When a segment merge happens, the merged segment maintains a version stamp "minVersion", which is the least version among the segments participating in the merge.

3) The segments_* file in a Lucene index maintains the Lucene version where the index was first created.

The design doc outlines the process of converting all segments to the new version. It's sort of a pull model where you first upgrade and then "pull" the index to the current version. By the end of the process outlined in the doc, all segments get converted to the new version and the index is in all respects an "upgraded" index. The only missing piece is to update the index creation version in the commit point. I did this by exposing a method in Lucene's IndexWriter which validates the version of all segments and updates the creation version stamp in the commit point (we might need to request an API from Lucene here). When this index is opened in Solr 9.x, Solr 9.x can read it (thanks to point #1), and the same process repeats to make the index ready for Solr 10.x.

> Automatically upgrade Solr indexes without needing to reindex from source
> -------------------------------------------------------------------------
>
> Key: SOLR-17725
> URL: https://issues.apache.org/jira/browse/SOLR-17725
> Project: Solr
> Issue Type: Improvement
> Reporter: Rahul Goswami
> Priority: Major
> Attachments: High Level Design.png
>
> Today upgrading from Solr version X to X+2 requires complete reingestion of data from source. This comes from Lucene's constraint which only guarantees index compatibility between the version the index was created in and the immediate next version.
> This reindexing usually comes with added downtime and/or cost. Especially in case of deployments which are in customer environments and not completely in control of the vendor, this proposition of having to completely reindex the data can become a hard sell.
> I, on behalf of my employer, Commvault, have developed a way which achieves this reindexing in-place on the same index.
Also, the process automatically keeps "upgrading" the indexes over multiple subsequent Solr upgrades without needing manual intervention.
> It comes with the following limitations:
> i) All _source_ fields need to be either stored=true or docValues=true. Any copyField destination fields can be stored=false of course, just that the source fields (or more precisely, the source fields you care about preserving) should be either stored or docValues true.
> ii) The datatype of an existing field in schema.xml shouldn't change upon Solr upgrade. Introducing new fields is fine.
> For indexes where this limitation is not a problem (it wasn't for us!), the tool can reindex in-place on the same core with zero downtime and legitimately "upgrade" the index. This can remove a lot of operational headaches, especially in environments with hundreds/thousands of very large indexes.
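Key points 1-3 above can be modeled compactly: each segment carries a minVersion, a merge keeps the minimum across its inputs, and the index-level created-version lives in the segments_* file and may only be moved forward once nothing older remains. A rough simulation (plain Python, not Lucene's actual classes; the guarded stamping method is the proposed API, not an existing one):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    min_version: int  # lowest Lucene major that wrote into this segment

def merge(segments):
    # Point 2: a merged segment's minVersion is the least version among
    # the segments participating in the merge.
    return Segment(min_version=min(s.min_version for s in segments))

@dataclass
class Commit:
    created_version: int   # point 3: kept in the segments_* file
    segments: list

def try_stamp_created_version(commit: Commit, target: int) -> bool:
    # The guarded API sketched above: only move created-version forward
    # once every live segment is fully on the new version.
    if all(s.min_version >= target for s in commit.segments):
        commit.created_version = target
        return True
    return False

commit = Commit(created_version=7, segments=[Segment(7), Segment(8)])
merged = merge(commit.segments)
assert merged.min_version == 7                   # old data taints the merge
commit.segments = [merged]
assert not try_stamp_created_version(commit, 8)  # validation refuses
commit.segments = [Segment(8)]                   # after the in-place reindex
assert try_stamp_created_version(commit, 8)
assert commit.created_version == 8
```

The refusal branch is what makes the "pull model" safe: the stamp can never run ahead of the segment-by-segment conversion.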
[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944396#comment-17944396 ] Rahul Goswami edited comment on SOLR-17725 at 4/14/25 3:44 PM: --- Will do [~dsmiley], thanks. [~gus] As far as I can see, the current implementation doesn't run the risk of corruption. The status is maintained in two ways:

1) At the core level -> to keep track of which core was being processed when the service went down / was killed. A file autoupgrade_status.csv is maintained, which is written each time a core is picked up for processing and a status is set for it. Each time the process resumes, it picks up the core with status "REINDEXING_ACTIVE", if any. For SolrCloud, this file can be housed in Zookeeper. This is an implementation detail I am happy to discuss further, but in our (Commvault's) implementation we recognize the following statuses: DEFAULT, REINDEXING_ACTIVE, REINDEXING_PAUSED, PROCESSED, ERROR, CORRECTVERSION.

2) At the segment level -> This is where we piggyback on Lucene's design, and it's beautiful! As we iterate over each segment, we read the live docs out of the segment, create a SolrInputDocument out of each and reindex using Solr's API. This helps achieve two things:

i) A reindexed doc helps mark an existing (old) doc as deleted (when auto-commit kicks in). This way if the service goes down, we don't need to reprocess the already processed docs of the segment. And if the service goes down before a commit could be processed, the small penalty is reprocessing the docs of only that segment.

ii) When a segment is fully processed, Lucene's DeletionPolicy deletes it, reclaiming space in the process. Hence we never process the same segment again.

Note that as we do this, we are in no way interfering with Lucene's index structure directly and are only interacting by means of APIs.
A combination of these factors helps maintain continuity in the processing of a core despite failures, without running the risk of corruption.
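The core-level bookkeeping in point 1 amounts to a tiny status file plus a resume rule. The file name and status values below follow the comment; everything else (the CSV layout and helper names) is illustrative:

```python
import csv
import io

# Statuses recognized by the described implementation.
STATUSES = ("DEFAULT", "REINDEXING_ACTIVE", "REINDEXING_PAUSED",
            "PROCESSED", "ERROR", "CORRECTVERSION")

def write_status(buf, rows):
    # autoupgrade_status.csv: one (core, status) row per core, rewritten
    # each time a core is picked up for processing.
    w = csv.writer(buf)
    w.writerows(rows)

def core_to_resume(buf):
    # On restart, pick up the core that was mid-flight, if any.
    buf.seek(0)
    for core, status in csv.reader(buf):
        if status == "REINDEXING_ACTIVE":
            return core
    return None

buf = io.StringIO()  # stands in for the file (or a Zookeeper node in SolrCloud)
write_status(buf, [("core1", "PROCESSED"),
                   ("core2", "REINDEXING_ACTIVE"),
                   ("core3", "DEFAULT")])
assert core_to_resume(buf) == "core2"
```

The segment-level half of the scheme needs no bookkeeping file at all, since a fully reindexed segment simply disappears via the DeletionPolicy.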
> Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not compl
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943738#comment-17943738 ] Rahul Goswami commented on SOLR-17725: -- [~janhoy] How do you recommend we proceed here? If you need me to elaborate on any part of the design, I am happy to do so (either here or a discussion over video chat or whatever is the norm with a new feature). If we need a wider audience to take a look at this, I am also happy to float this on the dev list. > Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. > I, on behalf of my employer, Commvault, have developed a way which achieves > this reindexing in-place on the same index. Also, the process automatically > keeps "upgrading" the indexes over multiple subsequent Solr upgrades without > needing manual intervention. > It comes with the following limitations: > i) All _source_ fields need to be either stored=true or docValues=true. Any > copyField destination fields can be stored=false of course, just that the > source fields (or more precisely, the source fields you care about > preserving) should be either stored or docValues true. 
> ii) The datatype of an existing field in schema.xml shouldn't change upon > Solr upgrade. Introducing new fields is fine. > For indexes where this limitation is not a problem (it wasn't for us!), the > tool can reindex in-place on the same core with zero downtime and > legitimately "upgrade" the index. This can remove a lot of operational > headaches, especially in environments with hundreds/thousands of very large > indexes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Comment Edited] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944396#comment-17944396 ] Rahul Goswami edited comment on SOLR-17725 at 4/25/25 5:36 AM: --- Will do [~dsmiley] Thanks. [~gus] As far as I can see, the current implementation doesn't run the risk of corruption. The status is maintained in two ways: 1) At the core level -> to keep track of which core was being processed when the service went down or was killed. A file autoupgrade_status.csv is maintained and written each time a core is picked up for processing, with a status set for that core. Each time the process resumes, it picks up the core with status "REINDEXING_ACTIVE", if any. For SolrCloud, this file can be housed in ZooKeeper. This is an implementation detail I am happy to discuss further, but in our (Commvault's) implementation we recognize the following statuses: DEFAULT, REINDEXING_ACTIVE, REINDEXING_PAUSED, PROCESSED, ERROR, CORRECTVERSION. 2) At the segment level -> This is where we piggyback on Lucene's design, and it's beautiful! As we iterate over each segment, we read the live docs out of the segment, create a SolrInputDocument from each, and reindex using Solr's API. This helps achieve two things: i) A reindexed doc marks the existing (old) doc as deleted (when auto-commit kicks in). This way, if the service goes down, we don't need to reprocess the already processed docs of the segment. And if the service goes down before a commit could be processed, the small penalty is reprocessing the docs of only that segment. ii) When a segment is fully processed, Lucene's DeletionPolicy deletes it, reclaiming space in the process. Hence we never process the same segment again. Note that as we do this, we are in no way interfering with Lucene's index structure directly; we only interact by means of APIs. 
A combination of these factors helps maintain continuity in the processing of a core despite failures, without running the risk of corruption. 
> Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely
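The core-level resume logic described in the comment above lends itself to a small sketch. The following is an illustrative Python sketch only, not the actual Commvault or Solr code: the file name autoupgrade_status.csv and the status names come from the comment, while the two-column CSV layout and the helper pick_next_core are hypothetical.

```python
import csv
import io

# Statuses named in the comment above.
STATUSES = {"DEFAULT", "REINDEXING_ACTIVE", "REINDEXING_PAUSED",
            "PROCESSED", "ERROR", "CORRECTVERSION"}

def pick_next_core(status_csv_text):
    """Return the core to process on (re)start.

    A core left in REINDEXING_ACTIVE (the service died mid-core) is
    resumed first; otherwise the first DEFAULT core is picked.
    """
    rows = list(csv.reader(io.StringIO(status_csv_text)))
    for core, status in rows:
        if status not in STATUSES:
            raise ValueError(f"unknown status for {core}: {status}")
    for core, status in rows:
        if status == "REINDEXING_ACTIVE":
            return core  # resume the interrupted core
    for core, status in rows:
        if status == "DEFAULT":
            return core  # nothing interrupted: start the next unprocessed core
    return None  # every core is in a terminal state
```

Under these assumptions, a crash while core2 was active means core2 is picked up again on restart, ahead of any unprocessed cores.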
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947233#comment-17947233 ] Rahul Goswami commented on SOLR-17725: -- Requested the API from Lucene a few days back and the discussion is underway at [https://lists.apache.org/thread/gk3kwplon73llz356szz1mn3myn3nnm3] . Was trying to avoid cross posting , but now thinking it might be ok to copy d...@solr.apache.org on the discussion(?) > Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. > I, on behalf of my employer, Commvault, have developed a way which achieves > this reindexing in-place on the same index. Also, the process automatically > keeps "upgrading" the indexes over multiple subsequent Solr upgrades without > needing manual intervention. > It comes with the following limitations: > i) All _source_ fields need to be either stored=true or docValues=true. Any > copyField destination fields can be stored=false of course, just that the > source fields (or more precisely, the source fields you care about > preserving) should be either stored or docValues true. > ii) The datatype of an existing field in schema.xml shouldn't change upon > Solr upgrade. Introducing new fields is fine. 
> For indexes where this limitation is not a problem (it wasn't for us!), the > tool can reindex in-place on the same core with zero downtime and > legitimately "upgrade" the index. This can remove a lot of operational > headaches, especially in environments with hundreds/thousands of very large > indexes.
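The segment-level mechanism described in the comments (read the live docs of each segment, rebuild a SolrInputDocument, re-add it through the normal update API so the old copy is marked deleted, and let the deletion policy reclaim fully processed segments) can be simulated in miniature. This Python sketch is purely illustrative: Segment is an in-memory stand-in for a Lucene segment and reindex_in_place a hypothetical helper, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    docs: dict                        # id -> stored fields (live docs only)
    deleted: set = field(default_factory=set)

def reindex_in_place(segments, add_document):
    """Re-add every live doc of every old segment via the normal API.

    Re-adding a doc with the same id supersedes (deletes) the old copy,
    so a crash mid-segment only costs reprocessing that one segment.
    """
    fully_processed = []
    for seg in segments:
        for doc_id, fields in seg.docs.items():
            if doc_id in seg.deleted:
                continue                            # already superseded
            add_document({"id": doc_id, **fields})  # normal update API
            seg.deleted.add(doc_id)                 # old copy is now a delete
        if set(seg.docs) <= seg.deleted:
            fully_processed.append(seg)             # reclaimable segment
    return fully_processed
```

A fully processed segment (all docs deleted) is exactly the state in which Lucene's deletion policy can drop it, which is why the same segment is never revisited.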
[jira] [Commented] (SOLR-16703) Clearing all documents of an index should delete traces of a previous Lucene version
[ https://issues.apache.org/jira/browse/SOLR-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941685#comment-17941685 ] Rahul Goswami commented on SOLR-16703: -- [~gjourdan] The effort is underway on https://issues.apache.org/jira/browse/SOLR-17725 > Clearing all documents of an index should delete traces of a previous Lucene > version > > > Key: SOLR-16703 > URL: https://issues.apache.org/jira/browse/SOLR-16703 > Project: Solr > Issue Type: Improvement >Affects Versions: 7.6, 8.11.2, 9.1.1 >Reporter: Gaël Jourdan >Priority: Major > > _This is a ticket following a discussion on Slack with_ [~elyograg] _and_ > [~wunder] _especially._ > h1. High level scenario > Assume you're starting from a current Solr server in version 7.x and want to > upgrade to 8.x then 9.x. > Upgrading from 7.x to 8.x works fine. Indexes of 7.x can still be read with > Solr 8.x. > On a regular basis, you clear* the index to start fresh, assuming this will > recreate the index in version 8.x. > This runs nicely for some time. Then you want to upgrade to 9.x. When > starting, you get an error saying that the index is still 7.x and cannot be > read by 9.x. > > *This is surprising because you'd expect that starting from a fresh index in > 8.x would have removed any trace of 7.x.* > > _* : when I say "clear", I mean "delete by query \{{* : * }}all docs" and > then commit + optionally optimize._ > h1. What I'd like to see > Clearing an index when running Solr version N should delete any trace of > Lucene version N-1. > Otherwise this forces users to delete an index (core / collection) and > recreate it rather than just clearing it. > h1. Detailed scenario to reproduce > The following steps reproduce the issue with a standalone Solr instance > running in Docker but I experienced the issue in SolrCloud mode running on > VMs and/or bare-metal. 
> > Also note that for personal troubleshooting I used the tool "luceneupgrader" > available at [https://github.com/hakanai/luceneupgrader] but it's not > necessary to reproduce the issue. > > 1. Create a directory for data > {code:java} > $ mkdir solrdata > $ chmod -R a+rwx solrdata {code} > > 2. Start a Solr 7.x server, create a core and push some docs > {code:java} > $ docker run -d -v "$PWD/solrdata:/opt/solr/server/solr/mycores:rw" -p > 8983:8983 --name my_solr_7 solr:7.6.0 solr-precreate gettingstarted > $ docker exec -it my_solr_7 post -c gettingstarted > example/exampledocs/manufacturers.xml > $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq > .response.numFound > 11{code} > > 3. Look at the index files and check version > {code:java} > $ ll solrdata/gettingstarted/data/index > > total 40K > -rw-r--r--. 1 8983 8983 718 16 mars 17:37 _0.fdt > -rw-r--r--. 1 8983 8983 84 16 mars 17:37 _0.fdx > -rw-r--r--. 1 8983 8983 656 16 mars 17:37 _0.fnm > -rw-r--r--. 1 8983 8983 112 16 mars 17:37 _0_Lucene50_0.doc > -rw-r--r--. 1 8983 8983 1,1K 16 mars 17:37 _0_Lucene50_0.tim > -rw-r--r--. 1 8983 8983 145 16 mars 17:37 _0_Lucene50_0.tip > -rw-r--r--. 1 8983 8983 767 16 mars 17:37 _0_Lucene70_0.dvd > -rw-r--r--. 1 8983 8983 730 16 mars 17:37 _0_Lucene70_0.dvm > -rw-r--r--. 1 8983 8983 478 16 mars 17:37 _0.si > -rw-r--r--. 1 8983 8983 203 16 mars 17:37 segments_2 > -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock > $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index > Lucene index version: 7 > {code} > > 4. Stop Solr 7, update solrconfig.xml for Solr 8 and start a Solr 8 server > {code:java} > $ docker stop my_solr_7 > $ vim solrdata/gettingstarted/conf/solrconfig.xml > $ cat solrdata/gettingstarted/conf/solrconfig.xml | grep luceneMatchVersion > 8.11.2 > $ docker run -d -v "$PWD/solrdata:/var/solr/data:rw" -p 8983:8983 --name > my_solr_8 solr:8.11.2{code} > > 5. 
Check index is loaded ok and docs are still there > {code:java} > $ curl -s 'http://localhost:8983/solr/gettingstarted/select?q=*:*' | jq > .response.numFound > 11 {code} > > 6. Clear the index and check index files / version > {code:java} > $ curl -X POST -H 'Content-Type: application/json' > 'http://localhost:8983/solr/gettingstarted/update?commit=true' -d '{ > "delete": {"query":"*:*"} }' > $ ll solrdata/gettingstarted/data/index > total 4,0K > -rw-r--r--. 1 8983 8983 135 16 mars 17:45 segments_5 > -rw-r--r--. 1 8983 8983 0 16 mars 17:36 write.lock > $ java -jar luceneupgrader-0.6.0.jar info solrdata/gettingstarted/data/index > Lucene index version: 7 > $ curl 'http://localhost:8983/solr/gettingstarted/update?optimize=true' > $ ll solrdata/gettingstarted/data/index
[jira] [Commented] (SOLR-17725) Automatically upgrade Solr indexes without needing to reindex from source
[ https://issues.apache.org/jira/browse/SOLR-17725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949234#comment-17949234 ] Rahul Goswami commented on SOLR-17725: -- Submitted pull request for the Lucene API change. Fingers crossed! [https://github.com/apache/lucene/pull/14607] > Automatically upgrade Solr indexes without needing to reindex from source > - > > Key: SOLR-17725 > URL: https://issues.apache.org/jira/browse/SOLR-17725 > Project: Solr > Issue Type: Improvement >Reporter: Rahul Goswami >Priority: Major > Attachments: High Level Design.png > > > Today upgrading from Solr version X to X+2 requires complete reingestion of > data from source. This comes from Lucene's constraint which only guarantees > index compatibility between the version the index was created in and the > immediate next version. > This reindexing usually comes with added downtime and/or cost. Especially in > case of deployments which are in customer environments and not completely in > control of the vendor, this proposition of having to completely reindex the > data can become a hard sell. > I, on behalf of my employer, Commvault, have developed a way which achieves > this reindexing in-place on the same index. Also, the process automatically > keeps "upgrading" the indexes over multiple subsequent Solr upgrades without > needing manual intervention. > It comes with the following limitations: > i) All _source_ fields need to be either stored=true or docValues=true. Any > copyField destination fields can be stored=false of course, just that the > source fields (or more precisely, the source fields you care about > preserving) should be either stored or docValues true. > ii) The datatype of an existing field in schema.xml shouldn't change upon > Solr upgrade. Introducing new fields is fine. 
> For indexes where this limitation is not a problem (it wasn't for us!), the > tool can reindex in-place on the same core with zero downtime and > legitimately "upgrade" the index. This can remove a lot of operational > headaches, especially in environments with hundreds/thousands of very large > indexes.
[jira] [Commented] (SOLR-17758) NumFieldLimiting URP "warnOnly" mode broken
[ https://issues.apache.org/jira/browse/SOLR-17758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950600#comment-17950600 ] Rahul Goswami commented on SOLR-17758: -- Thanks for creating the JIRA, Jason. Although I do see that, for the reason you mentioned, the chain would get terminated irrespective of whether warnOnly is true or false, since the user complained of getting a 400 error (SolrException.ErrorCode.BAD_REQUEST), the real culprit here seems to be this '>' check in init(). I believe it should be ">=". https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java#L72 > NumFieldLimiting URP "warnOnly" mode broken > --- > > Key: SOLR-17758 > URL: https://issues.apache.org/jira/browse/SOLR-17758 > Project: Solr > Issue Type: Bug > Components: UpdateRequestProcessors >Affects Versions: 9.8.1 >Reporter: Jason Gerlowski >Priority: Minor > > NumFieldLimitingUpdateProcessorFactory (introduced in SOLR-17192) aims to > offer a "warnOnly" mode that logs a warning when the maximum number of fields > is exceeded. > But the "warnOnly" code path doesn't trigger any subsequent processors in the > chain. So in effect, both modes will prevent new documents from being added > once the limit has been exceeded. > We should rework this logic so that the warnOnly=true codepath allows > documents to be indexed as expected.
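The '>' versus '>=' question raised in the comment is a pure boundary condition, which a small hedged sketch can illustrate (illustrative names only, not the actual factory code): the two checks disagree only when the field count sits exactly at the configured limit.

```python
def exceeds(field_count, max_fields, inclusive):
    """Boundary check: '>' and '>=' differ only when field_count == max_fields."""
    return field_count >= max_fields if inclusive else field_count > max_fields
```

So a core sitting exactly at the limit is treated as over the limit by '>=' but not by '>', which is the kind of off-by-one the comment points at.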
[jira] [Commented] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950433#comment-17950433 ] Rahul Goswami commented on SOLR-7962: - Sorry for dropping the ball on this. I am able to reproduce this on Windows. Passed --jvm-opts "-Dsolr.somerandomproperty=true" and I don't see it in the Java properties in Solr Admin UI. Same with --jvm-opts "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983". Working on a fix and a PR. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950433#comment-17950433 ] Rahul Goswami edited comment on SOLR-7962 at 5/9/25 4:52 AM: - Sorry for dropping the ball on this. I am able to reproduce this on Windows. Tried solr start -e techproducts --jvm-opts "-Dsolr.somerandomproperty=true" and I don't see it in the Java properties in Solr Admin UI. Same with solr start -e techproducts --jvm-opts "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983". Working on a fix and a PR. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950745#comment-17950745 ] Rahul Goswami edited comment on SOLR-7962 at 5/11/25 7:19 AM: -- Thanks for offering to review [~epugh]. The pull request is ready for review. Will add the tests next. "-e" now works with "--jvm-opts" on Windows. Also fixed an edge case issue where passing a -D system property with --jvm-opts would break parsing. So now passing something like --jvm-opts "-Dsolr.myprops.custom=hello" works. Also tested passing multiple args like --jvm-opts "-Dsolr.myprops.custom=hello -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983" and that works too. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950745#comment-17950745 ] Rahul Goswami edited comment on SOLR-7962 at 5/11/25 7:20 AM: -- Thanks for offering to review [~epugh]. The pull request is ready for review. Will add the tests next. "-e" now works with "--jvm-opts" on Windows. Also fixed an edge case issue where passing a -D system property with --jvm-opts would break parsing. So now passing something like --jvm-opts "-Dsolr.myprops.custom=hello" works. Also tested passing multiple args like --jvm-opts "-Dsolr.myprops.custom=hello -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983" and that works too. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Commented] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950745#comment-17950745 ] Rahul Goswami commented on SOLR-7962: - Thanks for offering to review [~epugh]. The pull request is ready for review. Will add the tests next. "-e" now works with "--jvm-opts" on Windows. Also fixed an edge case issue where passing a -D system property with --jvm-opts would break parsing. So now passing something like --jvm-opts "-Dsolr.myprops.custom=hello" works. Also tested passing multiple args like --jvm-opts "-Dsolr.myprops.custom=hello -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983" and that works too. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950745#comment-17950745 ] Rahul Goswami edited comment on SOLR-7962 at 5/11/25 7:32 AM: -- Thanks for offering to review [~epugh]. The pull request is ready for review. Will add the tests next. "-e" now works with "--jvm-opts" on Windows. For the specific case of remote debug config (-agentlib:jdwp=transport=...), cmd.exe was not playing well with commons-exec's default way of parsing, passing incorrect/incomplete values to start.cmd. Also fixed an edge case issue where passing a -D system property as a value for --jvm-opts would cause the command to bail. So now passing something like --jvm-opts "-Dsolr.myprops.custom=hello" works. Also tested passing multiple args like --jvm-opts "-Dsolr.myprops.custom=hello -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983" and that works too. 
> Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
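For context on the parsing being discussed above: splitting one quoted --jvm-opts value into individual JVM arguments, without mangling the '=' , ':' and ',' characters inside each argument, can be sketched with Python's shlex. This is only an illustration of quote-aware splitting; SolrCLI actually goes through commons-exec, and cmd.exe quoting adds further wrinkles, so split_jvm_opts is a hypothetical helper, not the Solr code.

```python
import shlex

def split_jvm_opts(raw):
    """Split one --jvm-opts string into separate JVM arguments,
    keeping '=', ':' and ',' intact inside each argument."""
    return shlex.split(raw)

opts = split_jvm_opts(
    "-Dsolr.myprops.custom=hello "
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983"
)
```

Splitting on raw whitespace would give the same result here; shlex matters once a single JVM argument itself contains quoted spaces.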
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950745#comment-17950745 ] Rahul Goswami edited comment on SOLR-7962 at 5/11/25 7:30 AM: -- Thanks for offering to review [~epugh]. The pull request is ready for review. Will add the tests next. "-e" now works with "--jvm-opts" on Windows. For the specific case of remote debug config (-agentlib:jdwp=transport=...), cmd.exe was not playing well with commons-exec's default way of parsing, passing incorrect/incomplete values to start.cmd. Also fixed an edge case issue where passing a -D system property as a value for --jvm-opts would cause the command to bail. So now passing something like --jvm-opts "-Dsolr.myprops.custom=hello" works. Also tested passing multiple args like --jvm-opts "-Dsolr.myprops.custom=hello -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983" and that works too. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951086#comment-17951086 ] Rahul Goswami edited comment on SOLR-7962 at 5/13/25 1:42 PM: -- Interesting find while running (main branch) on Linux. Passing multiple args in --jvm-opts as " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" doesn't work. Regression? Note that this now works on Windows with this fix. Might look into making this work for Linux when I get a chance. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Commented] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951086#comment-17951086 ] Rahul Goswami commented on SOLR-7962: - Interesting find while running (main branch) on Linux. Passing multiple args in --jvm-opts as " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" doesn't work. Regression? Note that this now works on Windows with this fix. Might look into making this work for Linux when I get a chance. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Comment Edited] (SOLR-7962) Passing additional arguments to solr.cmd using "-a" does not work on Windows
[ https://issues.apache.org/jira/browse/SOLR-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951086#comment-17951086 ] Rahul Goswami edited comment on SOLR-7962 at 5/14/25 1:55 PM: -- Interesting find while running (main branch) on Linux. Passing multiple args in --jvm-opts as " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" doesn't work. Maybe a regression introduced at some point in the past? Note that this now works on Windows with this fix. Might look into making this work for Linux when I get a chance. > Passing additional arguments to solr.cmd using "-a" does not work on Windows > > > Key: SOLR-7962 > URL: https://issues.apache.org/jira/browse/SOLR-7962 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 >Reporter: Dawid Weiss >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h >
[jira] [Created] (SOLR-17772) Tests for examples failing on Windows
Rahul Goswami created SOLR-17772: Summary: Tests for examples failing on Windows Key: SOLR-17772 URL: https://issues.apache.org/jira/browse/SOLR-17772 Project: Solr Issue Type: Bug Components: cli Reporter: Rahul Goswami This change only impacts _*tests*_ on Windows. After the fix for jvm-opts, command-line execution runs fine. The start flow via solr.cmd passes a "--script" parameter (which our tests don't) and uses a different executor inside RunExampleTool from what the tests use (RunExampleExecutor). Prior to the recently merged fix for jvm-opts, for these reasons, the tests on Windows would also try to prepare a command line with bin/solr (instead of bin/solr.cmd). Hence those tests would pass by getting into the "if" block in this PR, although in an unintended way.
[jira] [Commented] (SOLR-17746) bin/solr always fails if you attempt to use --jettyconfig (aka "-j")
[ https://issues.apache.org/jira/browse/SOLR-17746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986145#comment-17986145 ] Rahul Goswami commented on SOLR-17746: -- [~hossman] FWIW passing multiple space-separated args in --jvm-opts as shown below does work on Windows post the fix in https://issues.apache.org/jira/browse/SOLR-7962 --jvm-opts " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" I remember it not working on Linux since the parsing in SolrCLI is different, but might need to check again. > bin/solr always fails if you attempt to use --jettyconfig (aka "-j") > > > Key: SOLR-17746 > URL: https://issues.apache.org/jira/browse/SOLR-17746 > Project: Solr > Issue Type: Bug >Reporter: Chris M. Hostetter >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The "-jettyconfig" option (aka "-j") is documented as... > {noformat} > -j Additional parameters to pass to Jetty when starting > Solr. > For example, to add a configuration folder that jetty > should read > you could pass: -j > "--include-jetty-dir=/etc/jetty/custom/server/" > In most cases, you should wrap the additional > parameters in double quotes. > {noformat} > ...but if you actually attempt to use that example option, you will get > an error... > {noformat} > ./bin/solr start ... -j "--include-jetty-dir=/etc/jetty/custom/server/" > ERROR: Jetty config is required when using the -j option! > {noformat} > IIUC this is because the bash code for parsing this option requires that it > not start with a "{{\-}}" character; but by definition any option you want to > pass to jetty will start with "{{\--}}". > Attempting to work around this problem by using two sets of quotes doesn't > seem to work -- the inner quotes are passed verbatim to jetty which seems to > prevent jetty from recognizing it as a valid option. 
> A workaround that *does* seem to work (in my limited testing) is to include a > leading space character _inside_ the quotes... > {noformat} > ./bin/solr start ... -j " --include-jetty-dir=/etc/jetty/custom/server/" > {noformat} > ...because for some reason that does *NOT* seem to be passed verbatim.
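The diagnosis quoted above (the bash launcher rejects any -j value that starts with "-", while a leading space sneaks past the check) can be sketched with a hypothetical stand-in for that check; parse_jetty_config is illustrative, not the actual bin/solr code:

```shell
# Hypothetical stand-in for the value check the report describes: any value
# beginning with '-' is treated as a missing jetty config.
parse_jetty_config() {
  case "$1" in
    -*) echo "ERROR: Jetty config is required when using the -j option!" ;;
    *)  echo "jetty config: $1" ;;
  esac
}

parse_jetty_config "--include-jetty-dir=/etc/jetty/custom/server/"   # rejected: starts with '-'
parse_jetty_config " --include-jetty-dir=/etc/jetty/custom/server/"  # accepted: leading space defeats the check
```

This also explains why the leading-space workaround behaves differently: the value no longer begins with "-", so the guard never fires.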
[jira] [Comment Edited] (SOLR-17746) bin/solr always fails if you attempt to use --jettyconfig (aka "-j")
[ https://issues.apache.org/jira/browse/SOLR-17746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986145#comment-17986145 ] Rahul Goswami edited comment on SOLR-17746 at 6/25/25 1:57 PM: --- [~hossman] FWIW passing multiple space-separated args in --jvm-opts as shown below **does** work on Windows post the fix in https://issues.apache.org/jira/browse/SOLR-7962 --jvm-opts " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" I remember it not working on Linux since the parsing in SolrCLI is different, but might need to check again. was (Author: rahul196...@gmail.com): [~hossman] FWIW passing multiple space-separated args in --jvm-opts as shown below does work on Windows post the fix in https://issues.apache.org/jira/browse/SOLR-7962 --jvm-opts " -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:18983 -Dsolr.myprops.custom=hello" I remember it not working on Linux since the parsing in SolrCLI is different, but might need to check again. > bin/solr always fails if you attempt to use --jettyconfig (aka "-j") > > > Key: SOLR-17746 > URL: https://issues.apache.org/jira/browse/SOLR-17746 > Project: Solr > Issue Type: Bug >Reporter: Chris M. Hostetter >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The "-jettyconfig" option (aka "-j") is documented as... > {noformat} > -j Additional parameters to pass to Jetty when starting > Solr. > For example, to add a configuration folder that jetty > should read > you could pass: -j > "--include-jetty-dir=/etc/jetty/custom/server/" > In most cases, you should wrap the additional > parameters in double quotes. > {noformat} > ...but if you actually attempt to use that example option, you will get > an error... > {noformat} > ./bin/solr start ... -j "--include-jetty-dir=/etc/jetty/custom/server/" > ERROR: Jetty config is required when using the -j option! 
> {noformat} > IIUC this is because the bash code for parsing this option requires that it > not start with a "{{\-}}" character; but by definition any option you want to > pass to jetty will start with "{{\--}}". > Attempting to work around this problem by using two sets of quotes doesn't > seem to work -- the inner quotes are passed verbatim to jetty which seems to > prevent jetty from recognizing it as a valid option. > A workaround that *does* seem to work (in my limited testing) is to include a > leading space character _inside_ the quotes... > {noformat} > ./bin/solr start ... -j " --include-jetty-dir=/etc/jetty/custom/server/" > {noformat} > ...because for some reason that does *NOT* seem to be passed verbatim.
[jira] [Commented] (SOLR-17772) Tests for examples failing on Windows
[ https://issues.apache.org/jira/browse/SOLR-17772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17966778#comment-17966778 ] Rahul Goswami commented on SOLR-17772: -- [~dsmiley] Yes, this can be marked Resolved for 10. > Tests for examples failing on Windows > - > > Key: SOLR-17772 > URL: https://issues.apache.org/jira/browse/SOLR-17772 > Project: Solr > Issue Type: Bug > Components: cli >Reporter: Rahul Goswami >Priority: Minor > Labels: pull-request-available, windows > Time Spent: 10m > Remaining Estimate: 0h > > This change only impacts _*tests*_ on Windows. After the fix for jvm-opts, > command-line execution runs fine. > The start flow via solr.cmd passes a "--script" parameter (which our tests > don't) and uses a different executor inside RunExampleTool from what the > tests use (RunExampleExecutor). Prior to the recently merged fix for jvm-opts, > for these reasons, the tests on Windows would also try to prepare a > command line with bin/solr (instead of bin/solr.cmd). Hence those tests would > pass by getting into the "if" block in this PR, although in an unintended way.
[jira] [Commented] (SOLR-17813) Add support for SeededKnnVectorQuery
[ https://issues.apache.org/jira/browse/SOLR-17813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009044#comment-18009044 ] Rahul Goswami commented on SOLR-17813: -- I am working on this. Thanks for the initial draft [~cpoerschke]. I have built a good understanding of HNSW and am poking around the current Solr KnnQParser and Lucene SeededKnnVectorQuery to continue this effort. > Add support for SeededKnnVectorQuery > > > Key: SOLR-17813 > URL: https://issues.apache.org/jira/browse/SOLR-17813 > Project: Solr > Issue Type: New Feature > Components: vector-search >Reporter: Alessandro Benedetti >Priority: Major > > Apache Lucene implemented a version of knn vector query that provides a query > seed to initiate the vector search (entry points in the HNSW graph > exploration). > See "Lexically-Accelerated Dense Retrieval" (Hrishikesh Kulkarni, Sean > MacAvaney, Nazli Goharian, Ophir Frieder). > From SIGIR '23: https://arxiv.org/abs/2307.16779 > With this task, we aim to add to Solr this new query, probably as an > additional parameter of the current KNN query parser. > The only relevant parameter is the query seed. > The Weight seedWeight is added when rewriting the query, so no special > care should be needed there (see > org.apache.lucene.search.SeededKnnVectorQuery#rewrite and > org.apache.lucene.search.SeededKnnVectorQuery#createSeedWeight)
[jira] [Comment Edited] (SOLR-17813) Add support for SeededKnnVectorQuery
[ https://issues.apache.org/jira/browse/SOLR-17813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009044#comment-18009044 ] Rahul Goswami edited comment on SOLR-17813 at 7/22/25 4:43 PM: --- I am working on this. Thanks for the initial draft [~cpoerschke]. I have built a good understanding of HNSW and LADR, and am poking around the current Solr KnnQParser and Lucene SeededKnnVectorQuery to continue this effort. was (Author: rahul196...@gmail.com): I am working on this. Thanks for the initial draft [~cpoerschke]. I have built a good understanding of HNSW and am poking around the current Solr KnnQParser and Lucene SeededKnnVectorQuery to continue this effort. > Add support for SeededKnnVectorQuery > > > Key: SOLR-17813 > URL: https://issues.apache.org/jira/browse/SOLR-17813 > Project: Solr > Issue Type: New Feature > Components: vector-search >Reporter: Alessandro Benedetti >Priority: Major > > Apache Lucene implemented a version of knn vector query that provides a query > seed to initiate the vector search (entry points in the HNSW graph > exploration). > See "Lexically-Accelerated Dense Retrieval" (Hrishikesh Kulkarni, Sean > MacAvaney, Nazli Goharian, Ophir Frieder). > From SIGIR '23: https://arxiv.org/abs/2307.16779 > With this task, we aim to add to Solr this new query, probably as an > additional parameter of the current KNN query parser. > The only relevant parameter is the query seed. > The Weight seedWeight is added when rewriting the query, so no special > care should be needed there (see > org.apache.lucene.search.SeededKnnVectorQuery#rewrite and > org.apache.lucene.search.SeededKnnVectorQuery#createSeedWeight)
[jira] [Commented] (SOLR-17813) Add support for SeededKnnVectorQuery
[ https://issues.apache.org/jira/browse/SOLR-17813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010082#comment-18010082 ] Rahul Goswami commented on SOLR-17813: -- Waiting for Solr main to upgrade to Lucene 10.2.x, where support for SeededKnnVectorQuery was introduced. > Add support for SeededKnnVectorQuery > > > Key: SOLR-17813 > URL: https://issues.apache.org/jira/browse/SOLR-17813 > Project: Solr > Issue Type: New Feature > Components: vector-search >Reporter: Alessandro Benedetti >Priority: Major > > Apache Lucene implemented a version of knn vector query that provides a query > seed to initiate the vector search (entry points in the HNSW graph > exploration). > See "Lexically-Accelerated Dense Retrieval" (Hrishikesh Kulkarni, Sean > MacAvaney, Nazli Goharian, Ophir Frieder). > From SIGIR '23: https://arxiv.org/abs/2307.16779 > With this task, we aim to add to Solr this new query, probably as an > additional parameter of the current KNN query parser. > The only relevant parameter is the query seed. > The Weight seedWeight is added when rewriting the query, so no special > care should be needed there (see > org.apache.lucene.search.SeededKnnVectorQuery#rewrite and > org.apache.lucene.search.SeededKnnVectorQuery#createSeedWeight)
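For context, the shape of today's (unseeded) Solr knn query parser string is sketched below; the field name, topK, and vector values are illustrative, and the seeded variant proposed in this issue would presumably add some form of seed-query parameter (name and shape TBD):

```shell
# Build today's (unseeded) knn query string; "vector" and the values are
# illustrative, not from this issue.
knn_query='{!knn f=vector topK=10}[1.0,2.0,3.0,4.0]'
printf '%s\n' "$knn_query"
# Against a live Solr instance it would be sent as, e.g.:
#   curl "http://localhost:8983/solr/my_vectors/select" --data-urlencode "q=$knn_query"
```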