Re: [PR] SOLR-17631: Upgrade main to Lucene 10.x [solr]

2025-02-23 Thread via GitHub


epugh commented on PR #3053:
URL: https://github.com/apache/solr/pull/3053#issuecomment-2676871185

   Looking forward to this landing.  I was just looking at upgrading OpenNLP 
and seeing that it requires Lucene 10...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17023) Use Modern NLP Models from Apache OpenNLP with Solr

2025-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17023:
--
Labels: pull-request-available  (was: )

> Use Modern NLP Models from Apache OpenNLP with Solr
> ---
>
> Key: SOLR-17023
> URL: https://issues.apache.org/jira/browse/SOLR-17023
> Project: Solr
>  Issue Type: New Feature
>Reporter: Eric Pugh
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> During the 2023 Halifax Community over Code event we had a hackathon.   
> [~jzemerick] and I experimented with code he wrote a year ago and blogged 
> about at 
> https://opensourceconnections.com/blog/2022/06/27/using-modern-nlp-models-from-apache-opennlp-with-solr/.
> This is to experiment a bit more with this and start getting some feedback 
> from the community on ideas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17023: Use Modern NLP Models via ONNX and Apache OpenNLP with Solr [solr]

2025-02-23 Thread via GitHub


epugh commented on PR #1999:
URL: https://github.com/apache/solr/pull/1999#issuecomment-2676875213

   I updated this PR with main, to see what happened, and we're closer.
Things that I think are still holding us back:
   
   1) Solr main is NOT on Lucene 10 yet, which means we would have to override the 
version of OpenNLP that Lucene uses (somehow?).  Once #3053 gets merged, 
that should take care of it.
   2) Need to write a JUnit test for 
`DocumentCategorizerUpdateProcessorFactory` (oops!)
   
   
   





Re: [I] elegant way to enable cross origin resource sharing? [solr-operator]

2025-02-23 Thread via GitHub


ibraheemalayan commented on issue #513:
URL: https://github.com/apache/solr-operator/issues/513#issuecomment-2676883924

   For anyone interested in a guide.
   
   1. Create a custom `web.xml`
   
   - Get the default web.xml from Solr's repository or from your container 
at `/opt/solr-9.7.0/server/solr-webapp/webapp/WEB-INF/web.xml` (make sure the 
path matches your Solr version)
   - Follow https://laurenthinoul.com/how-to-enable-cors-in-solr/ to make the 
required changes

   2. Create a ConfigMap that contains the new web.xml file. You can do this by 
running the following command:
   
   ```bash
   kubectl create configmap custom-web-xml --from-file=web.xml={{path/to/your/web.xml on machine running kubectl}}
   ```
   
   3. Attach a volume to solr containers
   
   In SolrCloud resource file, create a new volume under 
`SolrCloud.spec.customSolrKubeOptions.podOptions.volumes` as follows:
   
   ```yaml
   # specs: https://apache.github.io/solr-operator/docs/solr-cloud/solr-cloud-crd.html
   apiVersion: solr.apache.org/v1beta1
   kind: SolrCloud
   metadata:
     name: search-cluster
   spec:
     # 
     customSolrKubeOptions:
       podOptions:
         # override /opt/solr-9.7.0/server/solr-webapp/webapp/WEB-INF/web.xml with custom web.xml to enable CORS
         volumes:
           - name: custom-web-xml-volume
             source:
               configMap:
                 name: custom-web-xml
                 items:
                   - key: web.xml
                     path: web.xml
             defaultContainerMount:
               name: custom-web-xml-volume
               mountPath: /opt/solr-9.7.0/server/solr-webapp/webapp/WEB-INF/web.xml # ! match solr version
               subPath: web.xml
               readOnly: true
     # 
     # rest of the file 
     replicas: 2
     solrJavaMem: -Xms512M -Xmx1G
     solrImage:
       tag: 9.7.0
   ```
   
   4. Apply changes
   5. Verify
   
   Get the name of one of your pods and verify that it has the new web.xml file:
   ```bash
   kubectl exec -it <pod-name> -- cat /opt/solr-9.7.0/server/solr-webapp/webapp/WEB-INF/web.xml
   ```
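
Beyond checking the file contents, the effect of the new web.xml can be probed over HTTP. The following is a minimal sketch, not part of the original guide: it assumes the filter added to web.xml answers requests carrying an `Origin` header with an `Access-Control-Allow-Origin` header, and the function names, base URL, and origin are hypothetical placeholders.

```python
from urllib.request import Request, urlopen


def cors_allows(headers, origin):
    """Return True if the response headers permit the given origin."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    return allowed in ("*", origin)


def check_solr_cors(base_url, origin):
    """Send a GET with an Origin header and inspect the CORS response header.

    base_url is a placeholder such as "http://localhost:8983/solr";
    adjust it to wherever your SolrCloud service is reachable.
    """
    request = Request(base_url + "/admin/info/system", headers={"Origin": origin})
    with urlopen(request, timeout=10) as response:
        return cors_allows(response.headers, origin)
```

For example, after port-forwarding the Solr service, `check_solr_cors("http://localhost:8983/solr", "http://localhost:3000")` should return `True` if that origin is allowed by the configuration you added.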
   
   





Re: [I] elegant way to enable cross origin resource sharing? [solr-operator]

2025-02-23 Thread via GitHub


epugh commented on issue #513:
URL: https://github.com/apache/solr-operator/issues/513#issuecomment-2676886251

   We *really* need a better way to support CORS than hacking web.xml.  We 
need an environment variable, a solr.xml setting, or something similar!





Re: [PR] SOLR-16903: Switch CoreContainer#getSolrHome to return Path instead of String [solr]

2025-02-23 Thread via GitHub


AndreyBozhko commented on PR #3204:
URL: https://github.com/apache/solr/pull/3204#issuecomment-2676985912

   Sounds good, I updated the PR.





Re: [PR] SOLR-17635: unmarshalling Map to SimpleOrderedMap if key i of type St… [solr]

2025-02-23 Thread via GitHub


dsmiley commented on code in PR #3163:
URL: https://github.com/apache/solr/pull/3163#discussion_r1966955662


##
solr/solrj/src/java/org/apache/solr/common/util/Utils.java:
##
@@ -290,7 +290,7 @@ public static Object fromJSON(byte[] utf8) {
   }
 
   public static Object fromJSON(byte[] utf8, int offset, int length) {
-return fromJSON(utf8, offset, length, STANDARDOBJBUILDER);
+return fromJSON(utf8, offset, length, SIMPLEORDEREDMAPOBJBUILDER);

Review Comment:
   This must be changed back; I don't think it makes sense to deserialize JSON 
and get a SimpleOrderedMap in there.  I debugged and the root of the problem is 
that SimpleOrderedMap does not fully implement the contract of a Map 
with regard to equals & hashCode.  I'll push some changes very soon.  This 
part should be extracted to its own PR against the same issue of 
SimpleOrderedMap implementing a Map.
   






[jira] [Updated] (SOLR-17669) SolrJ getBeans Field with wildcard: reduce memory

2025-02-23 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-17669:

Summary: SolrJ getBeans Field with wildcard: reduce memory  (was: Reduce 
Memory Consumption when using Dynamic Fields)

> SolrJ getBeans Field with wildcard: reduce memory
> -
>
> Key: SOLR-17669
> URL: https://issues.apache.org/jira/browse/SOLR-17669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Affects Versions: 9.8
>Reporter: Peter Kroiss
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.8.1
>
> Attachments: Screenshot 2025-02-11 at 11.00.15.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Replace the Pattern.matcher for DynamicFields with standard String operations.
> In our environment, that fix reduced the memory overhead when mapping the 
> objects by 80-90% (see screenshot).
> We will open a pull request. It would be great to have this fixed in Solr 9.8.1.
>  
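
The idea of replacing a regex matcher with plain string operations can be sketched as follows. This is an illustration only, not the SolrJ implementation: it assumes a dynamic-field pattern carries a single `*` wildcard at its start or end, and `matches_dynamic_field` is a hypothetical helper name.

```python
def matches_dynamic_field(pattern, field_name):
    """Match a field name against a dynamic-field pattern without a regex.

    Assumes the pattern has at most one '*' and that it is the first or
    the last character, e.g. "*_s" or "attr_*".
    """
    if pattern.startswith("*"):
        # Leading wildcard: compare the fixed suffix.
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Trailing wildcard: compare the fixed prefix.
        return field_name.startswith(pattern[:-1])
    # No wildcard: the names must be identical.
    return pattern == field_name
```

Prefix and suffix comparisons like these allocate no matcher objects per field, which is the kind of saving the issue describes.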






[PR] JavaBinUpdateRequestCodec: Dead code? [solr]

2025-02-23 Thread via GitHub


dsmiley opened a new pull request, #3207:
URL: https://github.com/apache/solr/pull/3207

   While working on something nearby, I noticed that 
`JavaBinUpdateRequestCodec.readOuterMostDocIterator` (server side, processes an 
update request that's JavaBin formatted) reads the outermost object, checking 
for a variety of possibilities.  I'm suspicious of them, so I added some asserts 
and tracked down when some of them were added.
   https://issues.apache.org/jira/browse/SOLR-2904 weird
   https://issues.apache.org/jira/browse/SOLR-13731 this one is actually tested, 
but not in a way that reflects how users actually use javabin.
   
   Even if we just end up leaving some comments, it'd be helpful.





[PR] SOLR-17644: Fix missing auth listener [solr]

2025-02-23 Thread via GitHub


iamsanjay opened a new pull request, #3208:
URL: https://github.com/apache/solr/pull/3208

   https://issues.apache.org/jira/browse/SOLR-17644
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended, not available for 
branches on forks living under an organisation)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   





[jira] [Updated] (SOLR-17644) Collection creation fails when replica placement plugin and basic auth are enabled

2025-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17644:
--
Labels: pull-request-available  (was: )

> Collection creation fails when replica placement plugin and basic auth are 
> enabled
> --
>
> Key: SOLR-17644
> URL: https://issues.apache.org/jira/browse/SOLR-17644
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: main (10.0), 9.8
>Reporter: Colvin Cowie
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Solr 9.8 and latest, collection creation fails when the replica placement 
> plugin and authentication are both configured. See 
> [https://the-asf.slack.com/archives/CEKUCUNE9/p1738152813677179] for more info
>  
> Stacktrace:
> {quote}2025-01-29 11:42:40.638 WARN (OverseerThreadFactory-22-thread-1) 
> [c:main_index s: r: x: t:] o.a.s.c.s.i.SolrClientNodeStateProvider could not 
> get tags from node localhost:8983_solr => 
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: 
> Error from server at
> http://localhost:8983/solr/admin/metrics
> : Expected mime type in [application/octet-stream, 
> application/vnd.apache.solr.javabin] but got text/html. 
> 
> 
> Error 401 require authentication
> 
> HTTP ERROR 401 require authentication
> 
> URI:/solr/admin/metrics
> STATUS:401
> MESSAGE:require authentication
> SERVLET:default
> 
> 
> 
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClientBase.checkContentType(HttpSolrClientBase.java:341)
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: 
> Error from server at
> http://localhost:8983/solr/admin/metrics
> : Expected mime type in [application/octet-stream, 
> application/vnd.apache.solr.javabin] but got text/html. 
> 
> 
> Error 401 require authentication
> 
> HTTP ERROR 401 require authentication
> 
> URI:/solr/admin/metrics
> STATUS:401
> MESSAGE:require authentication
> SERVLET:default
> 
> 
> 
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClientBase.checkContentType(HttpSolrClientBase.java:341)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClientBase.processErrorsAndResponse(HttpSolrClientBase.java:227)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:621)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:542)
>  ~[?:?]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:279) 
> ~[?:?]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:295) 
> ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.Http2SolrClient.requestWithBaseUrl(Http2SolrClient.java:604)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$RemoteCallCtx.invoke(SolrClientNodeStateProvider.java:292)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$RemoteCallCtx.invokeWithRetry(SolrClientNodeStateProvider.java:255)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchReplicaMetrics(SolrClientNodeStateProvider.java:190)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.NodeValueFetcher.getRemotePropertiesAndMetrics(NodeValueFetcher.java:125)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.NodeValueFetcher.getTags(NodeValueFetcher.java:187)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:114)
>  ~[?:?]
> at 
> org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:106)
>  ~[?:?]
> at 
> org.apache.solr.cluster.placement.impl.AttributeFetcherImpl.fetchAttributes(AttributeFetcherImpl.java:149)
>  ~[?:?]
> at 
> org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory$AffinityPlacementPlugin.getBaseWeightedNodes(AffinityPlacementFactory.java:284)
>  ~[?:?]
> at 
> org.apache.solr.cluster.placement.plugins.OrderedNodePlacementPlugin.getWeightedNodes(OrderedNodePlacementPlugin.java:316)
>  ~[?:?]
> at 
> org.apache.solr.cluster.placement.plugins.OrderedNodePlacementPlugin.computePlacements(OrderedNodePlacementPlugin.java:88)
>  ~[?:?]
> at 
> org.apache.solr.cluster.placement.impl.PlacementPluginAssignStrategy.assign(PlacementPluginAssignStrategy.java:84)
>  ~[?:?]
> at 
> org.apache.solr.cloud.api.collections.Assign$AssignStrategy.assign(Assign.java:432)
>  ~[?:?]
> at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:537)
>  ~[?:?]
> at 
> org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:234)
>  ~[?:?]
>

Re: [PR] SOLR-17644: Fix missing auth listener [solr]

2025-02-23 Thread via GitHub


iamsanjay commented on PR #3208:
URL: https://github.com/apache/solr/pull/3208#issuecomment-2677546452

   This is not the final solution. I created this to initiate discussion on the 
best possible approach to solving this.





Re: [PR] SOLR-17644: Fix missing auth listener [solr]

2025-02-23 Thread via GitHub


iamsanjay commented on PR #3208:
URL: https://github.com/apache/solr/pull/3208#issuecomment-2677547561

   Test case!
   
   ```
   @Test
   public void testAuth() throws Exception {
     // This succeeds with auth enabled
     CollectionAdminRequest.createCollection("c123", "conf", 1, 1)
         .setBasicAuthCredentials(SecurityJson.USER, SecurityJson.PASS)
         .process(cluster.getSolrClient());
     cluster.waitForActiveCollection("c123", 1, 1);
   
     // Configure placement plugin
     PluginMeta plugin = new PluginMeta();
     plugin.name = PlacementPluginFactory.PLUGIN_NAME;
     plugin.klass = MinimizeCoresPlacementFactory.class.getName();
     V2Request v2Request =
         new V2Request.Builder("/cluster/plugin")
             .forceV2(true)
             .POST()
             .withPayload(singletonMap("add", plugin))
             .build();
     v2Request.setBasicAuthCredentials(SecurityJson.USER, SecurityJson.PASS);
     v2Request.process(cluster.getSolrClient());
   
     // Now this will fail with a 401 !!
     CollectionAdminRequest.createCollection("c456", "conf", 1, 1)
         .setBasicAuthCredentials(SecurityJson.USER, SecurityJson.PASS)
         .process(cluster.getSolrClient());
     cluster.waitForActiveCollection("c456", 1, 1);
     /// /
   }
   ```





Re: [PR] Optimize Gradle tasks and configurations [solr]

2025-02-23 Thread via GitHub


dsmiley commented on PR #3191:
URL: https://github.com/apache/solr/pull/3191#issuecomment-2677562902

   How much time does this save?  It'd have to be measured being mindful of 
Gradle's caching.
   Note that I hear Develocity can analyze gradle "build scans" to find various 
things to optimize.





Re: [PR] SOLR-17644: Fix missing auth listener [solr]

2025-02-23 Thread via GitHub


iamsanjay commented on PR #3208:
URL: https://github.com/apache/solr/pull/3208#issuecomment-2677628347

   [Screenshot: https://github.com/user-attachments/assets/2c5da665-2087-464d-b7b4-ae95ede9ce56]
   
   It's not about the lazy loading. The `SolrClientProvider` object is used 
inside `ZkController#getSolrCloudManager()`, but at that time the auth listener 
is not yet available. 
   
   `zkSys.initZooKeeper(this, cfg.getCloudConfig());` is what eventually triggers 
`ZkController#getSolrCloudManager()`. 
   
   But as you can see in the IF block, we only initialize the 
`pkiAuthenticationSecurityBuilder` after that.
   
   
https://github.com/apache/solr/blob/76c09a35dba42913a6bcb281b52b00f87564624a/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L868-L880
   





[jira] [Updated] (SOLR-17679) Request for Documentation/Feature Improvement on Hybrid Lexical and Vector Search with Score Breakdown and Cutoff Logic

2025-02-23 Thread Khaled Alkhouli (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khaled Alkhouli updated SOLR-17679:
---
Priority: Minor  (was: Major)

> Request for Documentation/Feature Improvement on Hybrid Lexical and Vector 
> Search with Score Breakdown and Cutoff Logic
> ---
>
> Key: SOLR-17679
> URL: https://issues.apache.org/jira/browse/SOLR-17679
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 9.6.1
>Reporter: Khaled Alkhouli
>Priority: Minor
>  Labels: hybrid-search, search, solr, vector-based-search
> Attachments: Screenshot from 2025-02-20 16-31-48.png
>
>
> Hello Apache Solr team,
> I was able to implement a hybrid search engine that combines *lexical search 
> (edismax)* and *vector search (KNN-based embeddings)* within a single 
> request. The idea is simple:
>  * *Lexical Search* retrieves results based on text relevance.
>  * *Vector Search* retrieves results based on semantic similarity.
>  * *Hybrid Scoring* sums both scores, where a missing score (if a document 
> appears in only one search) should be treated as zero.
> This approach is working, but *there is a critical lack of documentation* on 
> how to properly return individual score components of lexical search (score1) 
> vs. vector search (score2 from cosine similarity). Right now, Solr only 
> returns the final combined score, but there is no clear way to see {*}how 
> much of that score comes from lexical search vs. vector search{*}. This is 
> essential for debugging and for fine-tuning ranking strategies.
>  
> I have implemented the following logic using Python:
> {code:python}
> import numpy as np
> import requests
>
> # embed(), SOLR_URL and HEADERS are assumed to be defined elsewhere.
> def hybrid_search(query, top_k=10):
>     embedding = np.array(embed([query]), dtype=np.float32)
>     # Convert to plain Python floats so the f-string below renders a clean vector.
>     embedding = embedding[0].tolist()
>     lxq = rf"""{{!type=edismax 
>                 qf='text'
>                 q.op=OR
>                 tie=0.1
>                 bq=''
>                 bf=''
>                 boost=''
>             }}({query})"""
>     solr_query = {"params": {
>         "q": "{!bool filter=$retrievalStage must=$rankingStage}",
>         "rankingStage": "{!func}sum(query($normalisedLexicalQuery),query($vectorQuery))",
>         "retrievalStage": "{!bool should=$lexicalQuery should=$vectorQuery}",  # Union
>         "normalisedLexicalQuery": "{!func}scale(query($lexicalQuery),0,1)",
>         "lexicalQuery": lxq,
>         "vectorQuery": f"{{!knn f=all_v512 topK={top_k}}}{embedding}",
>         "fl": "text",
>         "rows": top_k,
>         "fq": [""],
>         "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3}",
>         "rqq": "{!frange l=$cutoff}query($rankingStage)",
>         "sort": "score desc",
>     }}
>     response = requests.post(SOLR_URL, headers=HEADERS, json=solr_query)
>     return response.json()
> {code}
> h3. *Issues & Missing Documentation*
>  # *No Way to Retrieve Individual Scores in a Hybrid Search*
> There is no clear documentation on how to return:
>  * 
>  ** The *lexical search score* separately.
>  ** The *vector search score* separately.
>  ** The *final combined score* (which Solr already provides).
> Right now, we’re left guessing whether the sum of these scores works as 
> expected, making debugging and tuning unnecessarily difficult.
>  # *No Clear Way to Implement Cutoff Logic in Solr*
> In a hybrid search, I need to filter out results that don’t meet a {*}minimum 
> score threshold{*}. Right now, I have to implement this in Python, {*}which 
> defeats the purpose of using Solr for ranking in the first place{*}.
>  * 
>  ** How can we enforce a {*}score-based cutoff directly in Solr{*}, without 
> external filtering?
>  ** The \{!frange} function is mentioned in the documentation but lacks 
> {*}clear examples on how to apply it to hybrid search{*}.
> h3. *Feature Request / Documentation Improvement*
>  * *Provide a way to return individual scores for lexical and vector search 
> in the response.* This should be as simple as adding fields like 
> {{{}fl=score,lexical_score,vector_score{}}}.
>  * *Clarify how to apply cutoff logic in a hybrid search.* This is an 
> essential ranking mechanism, and yet, there’s little guidance on how to do 
> this efficiently within Solr itself.
> Looking forward to a response.
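
The scoring rule described in the issue (sum both scores, treat a missing score as zero, drop results below a threshold) can be written out as a small sketch. It only illustrates the intended arithmetic on the client side, which is exactly the workaround the reporter wants to avoid; it is not a Solr feature, and `combine_scores` and its dictionary inputs are hypothetical.

```python
def combine_scores(lexical, vector, cutoff=0.0):
    """Merge per-document scores from two result sets.

    lexical and vector map document ids to scores. A document missing from
    one set contributes 0.0 for that component. Documents whose summed score
    falls below cutoff are dropped.
    """
    combined = []
    for doc_id in set(lexical) | set(vector):
        lexical_score = lexical.get(doc_id, 0.0)
        vector_score = vector.get(doc_id, 0.0)
        total = lexical_score + vector_score
        if total >= cutoff:
            combined.append({
                "id": doc_id,
                "lexical_score": lexical_score,
                "vector_score": vector_score,
                "score": total,
            })
    # Highest combined score first; ties broken by id for a stable order.
    combined.sort(key=lambda doc: (-doc["score"], doc["id"]))
    return combined
```

Keeping the two components next to the total is what makes the ranking debuggable, which is the breakdown the issue asks Solr to expose directly.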






[jira] [Commented] (SOLR-17670) Fix unnecessary memory allocation caused by a large reRankDocs param

2025-02-23 Thread JiaBaoGao (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929443#comment-17929443
 ] 

JiaBaoGao commented on SOLR-17670:
--

Thank you, David.

> Fix unnecessary memory allocation caused by a large reRankDocs param
> 
>
> Key: SOLR-17670
> URL: https://issues.apache.org/jira/browse/SOLR-17670
> Project: Solr
>  Issue Type: Bug
>Reporter: JiaBaoGao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.8.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The reRank function has a reRankDocs parameter that specifies the number of 
> documents to re-rank. I've observed that increasing this parameter to test 
> its performance impact causes queries to become progressively slower. Even 
> when the parameter value exceeds the total number of documents in the index, 
> further increases continue to slow down the query, which is counterintuitive.
>  
> Therefore, I investigated the code:
>  
> For a query containing re-ranking, such as:
> {code:java}
> {
> "start": "0",
> "rows": 10,
> "fl": "ID,score",
> "q": "*:*",
> "rq": "{!rerank reRankQuery='{!func} 100' reRankDocs=10 
> reRankWeight=2}"
> } {code}
>  
> The current execution logic is as follows:
> 1. Perform normal retrieval using the q parameter.
> 2. Re-score all documents retrieved in the q phase using the rq parameter.
>  
> During the retrieval in phase 1 (using q), a TopScoreDocCollector is created. 
> Underneath, this creates a PriorityQueue which contains an Object[]. The 
> length of this Object[] continuously increases with reRankDocs without any 
> limit. 
>  
> On my local test cluster with limited JVM memory, this can even trigger an 
> OOM, causing the Solr node to crash. I can also reproduce the OOM situation 
> using the SolrCloudTestCase unit test. 
>  
> I think limiting the length of the Object[] array using 
> searcher.getIndexReader().maxDoc() at ReRankCollector would resolve this 
> issue. This way, when reRankDocs exceeds maxDoc, memory allocation will not 
> continue to increase indefinitely. 
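
The proposed fix amounts to clamping the requested size before any queue is allocated. A minimal sketch of that clamp, assuming `max_doc` plays the role of `searcher.getIndexReader().maxDoc()`; the function name is hypothetical and the actual sizing logic in ReRankCollector may involve more than this.

```python
def effective_rerank_docs(rerank_docs, max_doc):
    """Bound the re-rank window by the number of documents in the index.

    An index with max_doc documents can never yield more than max_doc hits,
    so a queue larger than that only wastes memory.
    """
    return max(0, min(rerank_docs, max_doc))
```

With this bound, `effective_rerank_docs(1_000_000_000, 5_000)` sizes the queue for 5,000 entries instead of a billion, while small values such as `reRankDocs=10` pass through unchanged.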






Re: [PR] SOLR-16903: Switch CoreContainer#getSolrHome to return Path instead of String [solr]

2025-02-23 Thread via GitHub


dsmiley commented on code in PR #3204:
URL: https://github.com/apache/solr/pull/3204#discussion_r1966904872


##
solr/core/src/java/org/apache/solr/handler/admin/SystemInfoHandler.java:
##
@@ -145,7 +145,7 @@ public void handleRequestBody(SolrQueryRequest req, 
SolrQueryResponse rsp) throw
   rsp.add("zkHost", 
getCoreContainer(req).getZkController().getZkServerAddress());
 }
 if (cc != null) {
-  rsp.add("solr_home", cc.getSolrHome());
+  rsp.add("solr_home", cc.getSolrHome().toString());

Review Comment:
   Good eye!  I agree with removing toString here.  Whether toAbsolutePath 
should be called or not is a separate question from this PR; I'm ambivalent.






[jira] [Closed] (SOLR-15732) queries to missing collection are slow

2025-02-23 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed SOLR-15732.
---

> queries to missing collection are slow
> --
>
> Key: SOLR-15732
> URL: https://issues.apache.org/jira/browse/SOLR-15732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Queries to a missing collection try to refresh aliases and the collection list from 
> ZK and are unnecessarily slow






[jira] [Resolved] (SOLR-15732) queries to missing collection are slow

2025-02-23 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-15732.
-
Fix Version/s: 9.0
   Resolution: Fixed

One PR was merged; should be marked resolved.  Doing that now; it appears this 
landed in 9.0 without an entry in CHANGES.txt

> queries to missing collection are slow
> --
>
> Key: SOLR-15732
> URL: https://issues.apache.org/jira/browse/SOLR-15732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Queries to a missing collection try to refresh aliases and the collection list from 
> ZK and are unnecessarily slow






[jira] [Commented] (SOLR-13731) javabin must support a 1:1 mapping of the JSON update format

2025-02-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929568#comment-17929568
 ] 

David Smiley commented on SOLR-13731:
-

I don't get the point.  Unless I'm missing something, JavaBinUpdateRequestCodec 
only needs to support SolrJ (UpdateRequest). UpdateRequest basically just sends 
a SolrInputDocument (or Map.Entry with the same but I digress), so when would 
we ever actually receive a Map form of a doc?  Would users actually use the 
lower level JavaBinCodec class directly to send a Map, as TestCborDataFormat 
does?  The relevant code is 
{{org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.StreamingCodec#readOuterMostDocIterator}}

> javabin  must support a 1:1 mapping of the JSON update format
> -
>
> Key: SOLR-13731
> URL: https://issues.apache.org/jira/browse/SOLR-13731
> Project: Solr
>  Issue Type: Task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Objects like SolrInputDocument are serialized in such a way that the size is 
> known in advance. All objects should ideally support streaming-friendly types.
> This is backward compatible. Basically, javabin will continue to serialize 
> using the old format, but will accept more efficient formats as input


