Re: [PR] SOLR-17649: Fix Json faceting on multivalue number types [solr]
thomaswoeckinger commented on PR #3158:
URL: https://github.com/apache/solr/pull/3158#issuecomment-2636198561

@dsmiley @gerlowskija Do you know who is responsible for this part of the faceting code, so that we can find a reviewer?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Updated] (SOLR-17649) Multivalue facets on enum field type returns empty result when using JsonFacet API
[ https://issues.apache.org/jira/browse/SOLR-17649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Wöckinger updated SOLR-17649:
    Summary: Multivalue facets on enum field type returns empty result when using JsonFacet API (was: Multivalue facets on enum field type returns empty result when using JsonFacet)

> Multivalue facets on enum field type returns empty result when using JsonFacet API
>
> Key: SOLR-17649
> URL: https://issues.apache.org/jira/browse/SOLR-17649
> Project: Solr
> Issue Type: Bug
> Components: Facet Module, FacetComponent, faceting
> Affects Versions: 9.4, 9.5, 9.4.1, 9.6, 9.7, 9.6.1, 9.8
> Reporter: Thomas Wöckinger
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When using the JsonFacet API on a multivalued EnumFieldType with facet method 'enum', the FacetFieldProcessorByArrayDV will be used.
> At line 96 (FacetFieldProcessorByArrayDV.java) there is a check for allBuckets or missing buckets, which simply skips the collect process.
> So at the moment there is no support for multivalue faceting on facets which use the FacetFieldProcessorByArrayDV facet processor for collecting facets.
> Tested this behavior from 9.4 onwards; this feature was working on the 8.x releases.
> Multivalue faceting on EnumFieldType should be supported again.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-17649) Multivalue facets on enum field type returns empty result when using JsonFacet
[ https://issues.apache.org/jira/browse/SOLR-17649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923979#comment-17923979 ]

Thomas Wöckinger commented on SOLR-17649:

[~dsmiley] [~gerlowskija] Do you know who is responsible for this part of the faceting code, so that we can find a reviewer for it?
Re: [PR] SOLR-17351: Decompose filestore "get file" API [solr]
gerlowskija commented on code in PR #3047:
URL: https://github.com/apache/solr/pull/3047#discussion_r1943253094

## solr/core/src/java/org/apache/solr/filestore/NodeFileStore.java: ##
@@ -76,135 +66,54 @@ public SolrJerseyResponse getFile(String path, Boolean sync, String getFrom, Boo
     final var response = instantiateJerseyResponse(SolrJerseyResponse.class);
     if (Boolean.TRUE.equals(sync)) {
-      try {
-        fileStore.syncToAllNodes(path);
-        return response;
-      } catch (IOException e) {
-        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error getting file ", e);
-      }
+      ClusterFileStore.syncToAllNodes(fileStore, path);

Review Comment:
Alright, the most recent iteration removes NodeFileStore/NodeFileStoreApi. There's still some room for confusion between ClusterFileStoreApi, ClusterFileStore, and DistribFileStore, but overall this PR leaves things better than it found it in that regard 👍
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
epugh commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943278670

## solr/test-framework/src/java/org/apache/solr/util/RestTestBase.java: ##
@@ -88,13 +88,33 @@ private static void checkUpdateU(String message, String update, boolean shouldSu
         if (response != null) fail(m + "update was not successful: " + response);
       } else {
         String response = restTestHarness.validateErrorUpdate(update);
-        if (response != null) fail(m + "update succeeded, but should have failed: " + response);
+        if (response == null) fail(m + "update succeeded, but should have failed: " + response);
       }
     } catch (SAXException e) {
       throw new RuntimeException("Invalid XML", e);
     }
   }
 
+  public static void checkUpdateU(String update, String... tests) {

Review Comment:
Not specific to this change per se, but I wish we had a clearer plan for the future of RestTestBase. Are we embracing it?

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+    private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+    private final String inputField;
+    private final String outputField;
+    private final String model;
+    private SolrTextToVectorModel textToVector;
+    private ManagedTextToVectorModelStore modelStore = null;
+
+    public TextToVectorUpdateProcessor(
+            String inputField,
+            String outputField,
+            String model,
+            SolrQueryRequest req,
+            UpdateRequestProcessor next) {
+        super(next);
+        this.inputField = inputField;
+        this.outputField = outputField;
+        this.model = model;
+        this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+    }
+
+    /**
+     * @param cmd the update command in input containing the Document to process
+     * @throws IOException If there is a low-level I/O error
+     */
+    @Override
+    public void processAdd(AddUpdateCommand cmd) throws IOException {
+        this.textToVector = modelStore.getModel(model);
+        if (textToVector == null) {
+            throw new SolrException(
+                    SolrException.ErrorCode.BAD_REQUEST,
+                    "The model requested '"
+                            + model
+                            + "' can't be found in the store: "
+                            + ManagedTextToVectorModelStore.REST_END_POINT);
+        }
+
+        SolrInputDocument doc = cmd.getSolrInputDocument();
+        SolrInputField inputFieldContent = doc.get(inputField);
+        if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+            String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+            float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
I was chatting with @iamsanjay this morning, and I was expounding on the thought that a lot of folks might want to first index the document with just the core text/string/numbers, and then, since enrichment is SLOW, come back with a streaming expression and do things like vectorization and an atomic update. That way you pump your data in as fast as possible, and then enrich at your leisure... This model cons
[jira] [Commented] (SOLR-13681) make Lucene's index sorting directly configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-13681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923973#comment-17923973 ]

Christine Poerschke commented on SOLR-13681:

bq. ticket cross-referencing: SOLR-12239 details how enabling of index sorting for an existing collection is problematic. if (speculation) LUCENE-9484 perhaps solves that then that would also be beneficial for the changes here it seems.

I don't know whether enabling index sorting for an existing collection remains problematic or not. However, if it does, might "Solr 10" perhaps provide an "opportunity" somehow, e.g. to deprecate the SortingMergePolicy in 9.x and remove it in 10.x, and then to have index sorting directly configurable from 10 onwards only?

> make Lucene's index sorting directly configurable in Solr
>
> Key: SOLR-13681
> URL: https://issues.apache.org/jira/browse/SOLR-13681
> Project: Solr
> Issue Type: New Feature
> Reporter: Christine Poerschke
> Priority: Minor
> Attachments: SOLR-13681-refguide-skel.patch, SOLR-13681.patch
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> History/Background:
> * SOLR-5730 made Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr 6.0 or later.
> * LUCENE-6766 made index sorting a first-class citizen in Lucene 6.2 or later.
> Current status:
> * In Solr 8.2 use of index sorting is only available via configuration of a (top-level) merge policy that is a SortingMergePolicy and that policy's sort is then passed to the index writer config via the
> {code}
> if (mergePolicy instanceof SortingMergePolicy) {
>   Sort indexSort = ((SortingMergePolicy) mergePolicy).getSort();
>   iwc.setIndexSort(indexSort);
> }
> {code}
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.2.0/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L241-L244
> code path.
> Proposed change:
> * in-scope for this ticket: To add direct support for index sorting configuration in Solr.
> * out-of-scope for this ticket: deprecation and removal of SortingMergePolicy support
[jira] [Commented] (SOLR-17653) New DockerSolrServerTestRule that uses TestContainers/Docker
[ https://issues.apache.org/jira/browse/SOLR-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924309#comment-17924309 ]

Jan Høydahl commented on SOLR-17653:

I have some experience with JUnit and Testcontainers. In general it works like magic, but there are a few things to watch out for:
* You can start a container from a Dockerfile, but the test then of course takes ages to run because of the image building and pulling from the hub.
* When addressing the new container from a test, "localhost" may work when run locally, but in CI/CD the address may be something else.
* Solr code running in the container cannot use localhost to address another node in another container. It must use either Docker networking (preferred) or "host.docker.internal".
* Cleaning up (docker system prune or similar) as part of the test harness may avoid buildup and disks filling up with old images or volumes.

> New DockerSolrServerTestRule that uses TestContainers/Docker
>
> Key: SOLR-17653
> URL: https://issues.apache.org/jira/browse/SOLR-17653
> Project: Solr
> Issue Type: Test
> Components: SolrJ, test-framework
> Reporter: David Smiley
> Priority: Major
>
> We've got a {{SolrClientTestRule}} abstraction in our test infrastructure that makes it easy for a test to work with Solr in an abstracted sense, using a SolrClient to talk to it. This issue proposes a new implementation that uses an Http SolrClient (of configurable implementation) to talk to SolrCloud (embedded ZK) running on a single node in a Docker container, and using TestContainers to facilitate the integration. Future extensibility: the implementation could consider multiple nodes thus multiple containers. The location of this utility could be a new artifact (JAR) living in the docker module with appropriate dependencies, or just put in solr-test-framework.
> This would be extremely useful! We could then write a hello-world SolrJ SolrCloud test in which Solr is running semi-realistically. This is all we need for easily adding some backwards-compatibility tests, in either direction (SolrJ 9 to Solr 10, and SolrJ 10 to Solr 9). Both directions are useful in a rolling upgrade. Tests using this in our test suite should be demarcated / separated somehow so that running such tests is opt-in, not part of the "check" task.
> The addition of this may draw into question the validity/utility of our Docker bats tests, that could perhaps instead be using this.
[jira] [Commented] (SOLR-17649) Multivalue facets on enum field type returns empty result when using JsonFacet API
[ https://issues.apache.org/jira/browse/SOLR-17649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924310#comment-17924310 ] David Smiley commented on SOLR-17649: Might you surmise how this regression came to be?
Re: [PR] SOLR-17649: Fix Json faceting on multivalue number types [solr]
dsmiley commented on code in PR #3158:
URL: https://github.com/apache/solr/pull/3158#discussion_r1943824313

## solr/core/src/test/org/apache/solr/search/facet/TestJsonFacets.java: ##
@@ -4907,6 +4907,38 @@ public void testQueryJoinBooksAndPages() throws Exception {
             + ", books2:{ buckets:[ {val:q,count:1}, {val:w,count:1} ] }"
             + "}");
   }
+
+  @Test
+  public void testMultivalueEnumTypes() throws Exception {
+    final Client client = Client.localClient();
+
+    final SolrParams p = params("rows", "0");
+
+    client.deleteByQuery("*:*", null);
+
+    List docsToAdd = new ArrayList<>(6);

Review Comment:
Can you add them to an UpdateRequest so they are batched instead of sending them individually? Same LOC in the end but avoids an anti-pattern. Or is the separation pertinent to what's being tested?
Re: [PR] SOLR-17654: DistribFileStore._getRealPath() has issues on Windows [solr]
janhoy commented on PR #3160:
URL: https://github.com/apache/solr/pull/3160#issuecomment-2638285227

> _(thinking out loud)_ I'd love to get a JVM running in Windows in a container maybe (not possible with macOS host?) or I suppose VirtualBox if I have to. Actually, I could probably use AWS free tier and use an AMI with Java to do some temporary tinkering. Or maybe someone recommends another option.

I use UTM (https://mac.getutm.app) on an M1 Mac to spin up Win11 ARM edition. You can share your disk and test stuff.

PS: I wonder if we could tell GitHub Actions to run tests on a Windows runner for a PR? Perhaps triggered by some file being touched?
Re: [PR] SOLR-17130: edismax-matchalldocs-optimization [solr]
github-actions[bot] closed pull request #2218: SOLR-17130: edismax-matchalldocs-optimization URL: https://github.com/apache/solr/pull/2218
Re: [PR] SOLR-17130: edismax-matchalldocs-optimization [solr]
github-actions[bot] commented on PR #2218: URL: https://github.com/apache/solr/pull/2218#issuecomment-2638303460 This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
Re: [PR] SOLR-16810: Under certain situations Solr produces managed schema XML with duplicate fields [solr]
github-actions[bot] closed pull request #1654: SOLR-16810: Under certain situations Solr produces managed schema XML with duplicate fields URL: https://github.com/apache/solr/pull/1654
Re: [PR] SOLR-16810: Under certain situations Solr produces managed schema XML with duplicate fields [solr]
github-actions[bot] commented on PR #1654: URL: https://github.com/apache/solr/pull/1654#issuecomment-2638303508 This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
Re: [PR] SOLR-17131: Optimize rows=0 since score/sort isn't necessary [solr]
github-actions[bot] commented on PR #2221: URL: https://github.com/apache/solr/pull/2221#issuecomment-2638303410 This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
Re: [PR] SOLR-17131: Optimize rows=0 since score/sort isn't necessary [solr]
github-actions[bot] closed pull request #2221: SOLR-17131: Optimize rows=0 since score/sort isn't necessary URL: https://github.com/apache/solr/pull/2221
Re: [PR] SOLR-17649: Fix Json faceting on multivalue number types [solr]
thomaswoeckinger commented on PR #3158:
URL: https://github.com/apache/solr/pull/3158#issuecomment-2639051999

> It's unfortunate we can't check the docValues format, but can we at least add a comment explaining this. It's a bit confusing just looking at it without context.

FacetFieldProcessorByArrayDV does not handle DocValuesType.SORTED_NUMERIC because the method findStartAndEndOrds uses FieldUtil.getSortedSetDocValues, which cannot handle this DocValuesType. In detail, FacetFieldProcessorByArrayDV extends FacetFieldProcessorByArray, which seems to be designed for DocValuesType.SORTED_SET, because the method lookupOrd is only available for that type.

That said, you may have some suggestions for a code comment.

To me it seems this has not worked since the beginning of the 9.x branch; there was simply no test for it.
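The distinction drawn in this comment can be illustrated with a small, self-contained sketch in plain Java. It deliberately uses no Lucene or Solr classes: `OrdVersusValueCounting`, `countByOrd`, and `countByValue` are hypothetical names invented for illustration. The assumption is that SORTED_SET-style doc values give each document ordinals into a sorted term dictionary (which is what makes an array indexed by ordinal, and a later lookupOrd, possible), while SORTED_NUMERIC-style doc values give each document raw long values with no ordinals at all.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not the Solr facet processor code. */
final class OrdVersusValueCounting {
  private OrdVersusValueCounting() {}

  /** SORTED_SET-like input: each document holds ordinals into a term dictionary. */
  static int[] countByOrd(int[][] docOrds, int dictionarySize) {
    int[] counts = new int[dictionarySize];
    for (int[] ords : docOrds) {
      for (int ord : ords) {
        counts[ord]++; // the ordinal is directly usable as an array index
      }
    }
    return counts;
  }

  /** SORTED_NUMERIC-like input: each document holds raw values, so counts are keyed by value. */
  static Map<Long, Integer> countByValue(long[][] docValues) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long[] values : docValues) {
      for (long value : values) {
        counts.merge(value, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Two documents; the second ordinal-based document shares ordinal 2 with the first.
    System.out.println(Arrays.toString(countByOrd(new int[][] {{0, 2}, {2}}, 3)));
    System.out.println(countByValue(new long[][] {{5L, 7L}, {7L}}));
  }
}
```

An array-based processor fits the first shape only, which is consistent with the observation that a field stored the second way needs a different collection strategy.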
[jira] [Commented] (SOLR-17635) javabin should deserialize maps as SimpleOrderedMap
[ https://issues.apache.org/jira/browse/SOLR-17635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924383#comment-17924383 ]

Renato Haeberli commented on SOLR-17635:

So the default behavior should be to use SimpleOrderedMap, and the property is there to switch it back to NamedList?

> javabin should deserialize maps as SimpleOrderedMap
>
> Key: SOLR-17635
> URL: https://issues.apache.org/jira/browse/SOLR-17635
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Priority: Major
>
> Once SimpleOrderedMap actually implements Map (SOLR-17623), Solr's "javabin" format should deserialize all maps as a SimpleOrderedMap. This will make it easier to transition away from NamedList/SimpleOrderedMap in responses (such as to a Map or MapWriter) without worry of impacting javabin clients that still expect a NamedList.
> It may also increase deserialization performance & lower memory at the expense of any former Maps (thus were deserialized as LinkedHashMap, O(1) lookup) becoming O(N) lookup.
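The lookup trade-off mentioned in the issue description can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Solr's NamedList or SimpleOrderedMap: `PairListLookupSketch` is a hypothetical class that keeps names and values in one insertion-ordered list, so a lookup by name is a linear scan, whereas a `LinkedHashMap` keeps insertion order and still looks up by hash.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not Solr's NamedList or SimpleOrderedMap. */
final class PairListLookupSketch {
  // Names and values are stored alternately in insertion order:
  // [name0, value0, name1, value1, ...]
  private final List<Object> pairs = new ArrayList<>();

  void add(String name, Object value) {
    pairs.add(name);
    pairs.add(value);
  }

  /** Linear scan over the names: O(N) in the number of entries. */
  Object get(String name) {
    for (int i = 0; i < pairs.size(); i += 2) {
      if (name.equals(pairs.get(i))) {
        return pairs.get(i + 1);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    PairListLookupSketch listBacked = new PairListLookupSketch();
    Map<String, Object> hashBacked = new LinkedHashMap<>();
    for (int i = 0; i < 5; i++) {
      listBacked.add("key" + i, i);
      hashBacked.put("key" + i, i);
    }
    // Both return the same value; only the cost of finding it differs.
    System.out.println(listBacked.get("key4"));
    System.out.println(hashBacked.get("key4"));
  }
}
```

For the small maps typical of a response the linear scan is usually cheap and the list uses less memory; the cost only shows up for maps with many entries that are looked up by name repeatedly.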
[PR] SOLR-17654: DistribFileStore._getRealPath() has issues on Windows [solr]
mlbiscoc opened a new pull request, #3160:
URL: https://github.com/apache/solr/pull/3160

https://issues.apache.org/jira/browse/SOLR-17654

# Description

A number of tests fail on Windows because the leading slash in a path is not stripped, so the path does not become relative.

# Solution

Revert to the old logic, but use the more modern `FileSystems.getDefault().getSeparator()` instead.

# Checklist

Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [ ] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
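The approach described under "Solution" can be sketched in plain Java as follows. This is a minimal illustration, not the actual DistribFileStore code: `RealPathSketch` and `toRealPath` are hypothetical names, the base directory is made up, and the UNC check that the real method performs is left out. The idea is to convert '/' to the platform separator, trim every leading separator from the string, and only then resolve the result against the base path.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;

/** Hypothetical illustration only; not the actual DistribFileStore implementation. */
final class RealPathSketch {
  private RealPathSketch() {}

  static Path toRealPath(String path, Path base) {
    String separator = FileSystems.getDefault().getSeparator();
    // File store paths are written with '/', so convert them where the platform differs.
    if (!"/".equals(separator)) {
      path = path.replace("/", separator);
    }
    // Trim all leading separators so the remainder is treated as relative to the base.
    while (path.startsWith(separator)) {
      path = path.substring(separator.length());
    }
    return base.resolve(path);
  }

  public static void main(String[] args) {
    Path base = Path.of("solrhome", "filestore"); // assumed location, for illustration
    System.out.println(toRealPath("/mypkg/v.0.12/jar_a.jar", base));
  }
}
```

Trimming on the string before building a `Path` avoids depending on `Path.isAbsolute()`, which, as far as I know, returns false on Windows for a path such as `\mypkg\v.0.12\jar_a.jar` because it has a root but no drive.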
Re: [PR] SOLR-16903: Migrate off java.io.File to java.nio.file.Path from core files [solr]
mlbiscoc commented on code in PR #2924:
URL: https://github.com/apache/solr/pull/2924#discussion_r1943703861

## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##
@@ -80,19 +80,21 @@ public DistribFileStore(CoreContainer coreContainer) {
   }
 
   @Override
-  public Path getRealpath(String path) {
+  public Path getRealPath(String path) {
     return _getRealPath(path, solrHome);
   }
 
-  private static Path _getRealPath(String path, Path solrHome) {
-    if (File.separatorChar == '\\') {
-      path = path.replace('/', File.separatorChar);
-    }
-    SolrPaths.assertNotUnc(Path.of(path));
-    while (path.startsWith(File.separator)) { // Trim all leading slashes
-      path = path.substring(1);
+  private static Path _getRealPath(String dir, Path solrHome) {
+    Path path = Path.of(dir);
+    SolrPaths.assertNotUnc(path);
+
+    if (path.isAbsolute()) {
+      // Strip the path of from being absolute to become relative to resolve with SolrHome
+      path = path.subpath(0, path.getNameCount());
     }

Review Comment:
[PR](https://github.com/apache/solr/pull/3160) to bring back that old logic.
[jira] [Updated] (SOLR-17654) DistribFileStore._getRealPath() has issues on Windows
[ https://issues.apache.org/jira/browse/SOLR-17654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SOLR-17654:
    Labels: pull-request-available (was: )

> DistribFileStore._getRealPath() has issues on Windows
>
> Key: SOLR-17654
> URL: https://issues.apache.org/jira/browse/SOLR-17654
> Project: Solr
> Issue Type: Improvement
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On Windows, many tests that use the DistribFileStore, such as {{TestPackages}}, {{TestDistribFileStore}} and {{PackageToolTest}} are failing because of an issue in {{DistribFileStore._getRealPath()}}.
> This method tries to remove the beginning slashes from the path, and then tries to make a new path relative to the file store location. However, in the tests, it's failing and showing stuff like "Illegal path \mypkg\v.0.12\jar_a.jar". Clearly in the code, the first "\" should have been removed, so this code is having an issue with Windows.
Re: [PR] SOLR-17351: Decompose filestore "get file" API [solr]
gerlowskija commented on PR #3047: URL: https://github.com/apache/solr/pull/3047#issuecomment-2637426498 Alright - I think I've addressed the feedback so far? If I've missed anything, let me know. I've brought it up to date with 'main' and will aim to merge in the next few days pending any objections?
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943491138

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+    private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+    private final String inputField;
+    private final String outputField;
+    private final String model;
+    private SolrTextToVectorModel textToVector;
+    private ManagedTextToVectorModelStore modelStore = null;

Review Comment:
Sure!
[jira] [Created] (SOLR-17654) DistribFileStore._getRealPath() has issues on Windows
Houston Putman created SOLR-17654:
-------------------------------------

             Summary: DistribFileStore._getRealPath() has issues on Windows
                 Key: SOLR-17654
                 URL: https://issues.apache.org/jira/browse/SOLR-17654
             Project: Solr
          Issue Type: Improvement
            Reporter: Houston Putman

On Windows, many tests that use the DistribFileStore, such as {{TestPackages}}, {{TestDistribFileStore}} and {{PackageToolTest}} are failing because of an issue in {{DistribFileStore._getRealPath()}}. This method tries to remove the beginning slashes from the path, and then tries to make a new path relative to the file store location. However, in the tests, it's failing and showing stuff like "Illegal path \mypkg\v.0.12\jar_a.jar". Clearly in the code, the first "\" should have been removed, so this code is having an issue with Windows.
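The behaviour described in this issue (trim the leading separators, then resolve the remainder against the file store location) could be sketched roughly as follows. This is a minimal, hypothetical sketch and not the actual `DistribFileStore` code or the eventual fix: the class name `RealPathSketch`, the method `toRealPath`, and the choice to normalise `\` to `/` before trimming are all assumptions made for illustration.

```java
import java.nio.file.Path;

public class RealPathSketch {

  /**
   * Hypothetical helper: trims leading separators from a file store path and
   * resolves the rest against a base directory.
   */
  static Path toRealPath(String raw, Path base) {
    // Assumption: treat both separator styles alike, so "\mypkg" and "/mypkg" behave the same.
    String unified = raw.replace('\\', '/');

    // Trim all leading slashes so the remainder is always a relative path.
    int start = 0;
    while (start < unified.length() && unified.charAt(start) == '/') {
      start++;
    }

    Path normalizedBase = base.normalize();
    Path resolved = normalizedBase.resolve(unified.substring(start)).normalize();

    // Reject anything that escapes the base directory (for example via "..").
    if (!resolved.startsWith(normalizedBase)) {
      throw new IllegalArgumentException("Illegal path " + raw);
    }
    return resolved;
  }
}
```

Because the trimming happens on the string before any `Path` is built, the result does not depend on how the platform classifies a path that starts with a single separator. Note that this sketch does not replace a UNC check such as the `SolrPaths.assertNotUnc` call in the real code, and normalising `\` on a Unix host changes the meaning of file names that legitimately contain a backslash.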
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943577028

##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+    private static final String INPUT_FIELD_PARAM = "inputField";
+    private static final String OUTPUT_FIELD_PARAM = "outputField";
+    private static final String MODEL_NAME = "model";
+
+    String inputField;
+    String outputField;
+    String modelName;
+    SolrParams params;
+
+
+    @Override
+    public void init(final NamedList args) {
+        if (args != null) {

Review Comment:
   My bad, I took inspiration from an old factory. I'll remove this useless check in the next commit!
Re: [PR] SOLR-17654: DistribFileStore._getRealPath() has issues on Windows [solr]
dsmiley commented on PR #3160:
URL: https://github.com/apache/solr/pull/3160#issuecomment-2638256078

   _(thinking out loud)_ I'd love to get a JVM running on Windows, maybe in a container (not possible with a macOS host?), or I suppose in VirtualBox if I have to. Actually, I could probably use the AWS free tier with an AMI that includes Java to do some temporary tinkering. Or maybe someone can recommend another option.
Re: [PR] Deprecations [solr]
epugh commented on PR #3159:
URL: https://github.com/apache/solr/pull/3159#issuecomment-2636956809

   I ❤️ deprecations. How can we have a strategy to make sure these removals happen? I.e., how can we help folks see these areas of work and then move them forward? Are you thinking that once this is backported to 9, we can just go ahead in `main` and start removing the deprecated code? In my head, any deprecated code existing in 10 is a shame. We should rip those band-aids off, but maybe I need a more nuanced view of what deprecation means. Every time I see a deprecated method in a test (especially when it's not clear exactly how to fix it), it makes me sad ;-).
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943628396

##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+    private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+    private final String inputField;
+    private final String outputField;
+    private final String model;
+    private SolrTextToVectorModel textToVector;
+    private ManagedTextToVectorModelStore modelStore = null;
+
+    public TextToVectorUpdateProcessor(
+            String inputField,
+            String outputField,
+            String model,
+            SolrQueryRequest req,
+            UpdateRequestProcessor next) {
+        super(next);
+        this.inputField = inputField;
+        this.outputField = outputField;
+        this.model = model;
+        this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+    }
+
+    /**
+     * @param cmd the update command in input containing the Document to process
+     * @throws IOException If there is a low-level I/O error
+     */
+    @Override
+    public void processAdd(AddUpdateCommand cmd) throws IOException {
+        this.textToVector = modelStore.getModel(model);
+        if (textToVector == null) {
+            throw new SolrException(
+                    SolrException.ErrorCode.BAD_REQUEST,
+                    "The model requested '"
+                            + model
+                            + "' can't be found in the store: "
+                            + ManagedTextToVectorModelStore.REST_END_POINT);
+        }
+
+        SolrInputDocument doc = cmd.getSolrInputDocument();
+        SolrInputField inputFieldContent = doc.get(inputField);
+        if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+            String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+            float[] vector = textToVector.vectorise(textToVectorise);
+            List vectorAsList = new ArrayList(vector.length);
+            for (float f : vector) {
+                vectorAsList.add(f);
+            }
+            doc.addField(outputField, vectorAsList);
+        }
+        super.processAdd(cmd);
+    }
+
+    protected boolean isNullOrEmpty(SolrInputField inputFieldContent, SolrInputDocument doc, String fieldName) {

Review Comment:
   Mmm, I see your point. Would it be better if we just log a warning saying "vectorisation failed", with the reason "null or empty source field"? I suspect a silent failure would be equally problematic, because it would be hard to understand why there are no vectors. (I also just discovered that the 'vectorise' method could throw a runtime exception.)
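The "log a warning instead of failing" idea from this review comment could look roughly like the sketch below. It is deliberately independent of Solr: `VectoriseGuardSketch`, `vectoriseOrWarn`, the use of `java.util.logging`, and modelling the model as a `Function<String, float[]>` are all assumptions for illustration, not the PR's actual classes or its final behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

public class VectoriseGuardSketch {
  private static final Logger log = Logger.getLogger(VectoriseGuardSketch.class.getName());

  /** Boxes a primitive float[] into a List<Float>, as the processor does before adding the field. */
  static List<Float> toList(float[] vector) {
    List<Float> list = new ArrayList<>(vector.length);
    for (float f : vector) {
      list.add(f);
    }
    return list;
  }

  /**
   * Hypothetical guard: returns the vector when the source text is usable and the model
   * succeeds; otherwise logs a warning with the reason and returns an empty Optional,
   * so the document can still be indexed without a vector.
   */
  static Optional<List<Float>> vectoriseOrWarn(
      String docId, Object fieldValue, Function<String, float[]> model) {
    if (fieldValue == null || fieldValue.toString().isEmpty()) {
      log.warning("Vectorisation failed for document " + docId + ": null or empty source field");
      return Optional.empty();
    }
    try {
      return Optional.of(toList(model.apply(fieldValue.toString())));
    } catch (RuntimeException e) {
      // Assumption: a runtime failure in the model is also reported as a warning, not rethrown.
      log.warning("Vectorisation failed for document " + docId + ": " + e.getMessage());
      return Optional.empty();
    }
  }
}
```

Returning an `Optional` keeps the "no vector" case explicit for the caller, while the warning records why a given document ended up without one. Whether a model failure should be swallowed or should fail the update is a design decision this sketch does not settle.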
Re: [PR] Deprecations [solr]
dsmiley commented on PR #3159:
URL: https://github.com/apache/solr/pull/3159#issuecomment-2637969249

   > we can just go ahead in main and start removing the deprecated code?

   Sure. Some have JIRAs, even. There's plenty of other deprecations outside this PR.

   > any deprecated code existing in 10 is a shame.

   I've learned to be realistic in a large code base. The positive spin I'm at peace with is that spending just a little bit of deprecation time in advance gives us permission later to have the fun of deleting stuff when it suits us.
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943633074

##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.MultiMapSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.request.SolrQueryRequestBase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+
+public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase {
+  private TextToVectorUpdateProcessorFactory factoryToTest =
+      new TextToVectorUpdateProcessorFactory();
+  private NamedList args = new NamedList<>();
+
+  @BeforeClass
+  public static void initArgs() throws Exception {
+    setupTest("solrconfig-llm.xml", "schema.xml", false, false);
+  }
+
+  @AfterClass
+  public static void after() throws Exception {
+    afterTest();
+  }
+
+  @Test
+  public void init_fullArgs_shouldInitFullClassificationParams() {
+    args.add("inputField", "_text_");
+    args.add("outputField", "vector");
+    args.add("model", "model1");
+    factoryToTest.init(args);
+
+    assertEquals("_text_", factoryToTest.getInputField());
+    assertEquals("vector", factoryToTest.getOutputField());
+    assertEquals("model1", factoryToTest.getModelName());
+  }
+
+  @Test
+  public void init_nullInputField_shouldThrowExceptionWithDetailedMessage() {
+    args.add("outputField", "vector");
+    args.add("model", "model1");
+
+    SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+    assertEquals("Text to Vector UpdateProcessor 'inputField' can not be null", e.getMessage());
+  }
+
+  @Test
+  public void init_notExistentInputField_shouldThrowExceptionWithDetailedMessage() throws Exception {
+    args.add("inputField", "notExistentInput");
+    args.add("outputField", "vector");
+    args.add("model", "model1");
+
+    Map params = new HashMap<>();
+    MultiMapSolrParams mmparams = new MultiMapSolrParams(params);
+    SolrQueryRequestBase req = new SolrQueryRequestBase(solrClientTestRule.getCoreContainer().getCore("collection1"), (SolrParams) mmparams) {};

Review Comment:
   I admit I don't know; I'm not that Java-savvy. I suspect it has to do with instantiating a subclass of an abstract class?
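The suspicion in this comment matches how Java works in general: an abstract class cannot be instantiated directly, but `new Base(args) {}` declares and instantiates an anonymous subclass with an empty body. The sketch below shows the mechanism with a stand-in class; `RequestBase` and `newRequest` are hypothetical names, and it assumes (as the comment does) that `SolrQueryRequestBase` is abstract.

```java
public class AnonymousSubclassSketch {

  /** Stand-in for an abstract base class with a constructor, in the role of SolrQueryRequestBase. */
  abstract static class RequestBase {
    private final String coreName;

    protected RequestBase(String coreName) {
      this.coreName = coreName;
    }

    String coreName() {
      return coreName;
    }
  }

  static RequestBase newRequest(String coreName) {
    // "new RequestBase(coreName)" alone would not compile because RequestBase is abstract.
    // The trailing "{}" creates an anonymous subclass that adds nothing of its own.
    return new RequestBase(coreName) {};
  }

  public static void main(String[] args) {
    RequestBase req = newRequest("collection1");
    System.out.println(req.coreName());
    System.out.println(req.getClass().isAnonymousClass());
  }
}
```

The empty body works here because the stand-in class has no abstract methods left to implement; a base class with unimplemented abstract methods would need them defined inside the braces.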
[jira] [Commented] (SOLR-13681) make Lucene's index sorting directly configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-13681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924252#comment-17924252 ]

David Smiley commented on SOLR-13681:
-------------------------------------

Yes please do that. Major releases are indeed a time to simplify something, removing old things, even if it breaks someone. If we're not sure if existing users of some advanced feature like this will be compatible, I still think a major release is okay. Just call out the risk/unknown in the upgrade notes asciidoc file.

> make Lucene's index sorting directly configurable in Solr
> ---------------------------------------------------------
>
>                 Key: SOLR-13681
>                 URL: https://issues.apache.org/jira/browse/SOLR-13681
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Christine Poerschke
>            Priority: Minor
>         Attachments: SOLR-13681-refguide-skel.patch, SOLR-13681.patch
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> History/Background:
> * SOLR-5730 made Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr 6.0 or later.
> * LUCENE-6766 make index sorting a first-class citizen in Lucene 6.2 or later.
> Current status:
> * In Solr 8.2 use of index sorting is only available via configuration of a (top-level) merge policy that is a SortingMergePolicy and that policy's sort is then passed to the index writer config via the
> {code}
> if (mergePolicy instanceof SortingMergePolicy) {
>   Sort indexSort = ((SortingMergePolicy) mergePolicy).getSort();
>   iwc.setIndexSort(indexSort);
> }
> {code}
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.2.0/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L241-L244
> code path.
> Proposed change:
> * in-scope for this ticket: To add direct support for index sorting configuration in Solr.
> * out-of-scope for this ticket: deprecation and removal of SortingMergePolicy support
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943636345

##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorTest.java:
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+
+public class TextToVectorUpdateProcessorTest extends TestLlmBase {
+
+    @BeforeClass
+    public static void init() throws Exception {
+        setupTest("solrconfig-llm-indexing.xml", "schema.xml", false, false);
+
+    }
+
+    @Test
+    public void processAdd_inputField_shouldVectoriseInputField()
+            throws Exception {
+        loadModel("dummy-model.json");
+        assertU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+        assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+        assertU(commit());
+
+        final String solrQuery = "*:*";
+        final SolrQuery query = new SolrQuery();
+        query.setQuery(solrQuery);
+        query.add("fl", "id,vector");
+
+        assertJQ(
+                "/query" + query.toQueryString(),
+                "/response/numFound==2]",
+                "/response/docs/[0]/id=='99'",
+                "/response/docs/[0]/vector==[1.0, 2.0, 3.0, 4.0]",
+                "/response/docs/[1]/id=='98'",
+                "/response/docs/[1]/vector==[1.0, 2.0, 3.0, 4.0]");
+
+        restTestHarness.delete(ManagedTextToVectorModelStore.REST_END_POINT + "/dummy-1");

Review Comment:
   It was cleanup, but not all tests need it, so I added it explicitly. I've also added a line comment to make it clearer.
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943637205

##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorTest.java:
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+
+public class TextToVectorUpdateProcessorTest extends TestLlmBase {
+
+    @BeforeClass
+    public static void init() throws Exception {
+        setupTest("solrconfig-llm-indexing.xml", "schema.xml", false, false);
+
+    }
+
+    @Test
+    public void processAdd_inputField_shouldVectoriseInputField()
+            throws Exception {
+        loadModel("dummy-model.json");
+        assertU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+        assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+        assertU(commit());
+
+        final String solrQuery = "*:*";
+        final SolrQuery query = new SolrQuery();
+        query.setQuery(solrQuery);
+        query.add("fl", "id,vector");
+
+        assertJQ(
+                "/query" + query.toQueryString(),
+                "/response/numFound==2]",
+                "/response/docs/[0]/id=='99'",
+                "/response/docs/[0]/vector==[1.0, 2.0, 3.0, 4.0]",
+                "/response/docs/[1]/id=='98'",
+                "/response/docs/[1]/vector==[1.0, 2.0, 3.0, 4.0]");
+
+        restTestHarness.delete(ManagedTextToVectorModelStore.REST_END_POINT + "/dummy-1");
+    }
+
+    /*
+    This test looks for the 'dummy-1' model, but such model is not loaded, the model store is empty, so the update fails
+     */
+    @Test
+    public void processAdd_modelNotFound_shouldRaiseException() {
+        assertFailedU("This update should fail but actually succeeded", adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+
+        checkUpdateU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."),
+                "/response/lst[@name='error']/str[@name='msg']=\"The model requested 'dummy-1' can't be found in the store: /schema/text-to-vector-model-store\"",
+                "/response/lst[@name='error']/int[@name='code']='400'");
+    }
+
+    @Test
+    public void processAdd_emptyInputField_shouldLogAndIndexWithNoVector() throws Exception {
+        loadModel("dummy-model.json");
+        assertU(adoc("id", "99", "_text_", ""));
+        assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+        assertU(commit());
+
+        final String solrQuery = "*:*";
+        final SolrQuery query = new SolrQuery();
+        query.setQuery(solrQuery);
+        query.add("fl", "id,vector");
+
+        assertJQ(
+                "/query" + query.toQueryString(),
+                "/response/numFound==2]",
+                "/response/docs/[0]/id=='99'",
+                "!/response/docs/[0]/vector==", //no vector field for the document 99

Review Comment:
   It took almost an afternoon to find that, so it deserved a comment :)
[jira] [Created] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
David Smiley created SOLR-17655:
-----------------------------------

             Summary: Deprecate ExternalFileField in lieu of in-place docValue updates
                 Key: SOLR-17655
                 URL: https://issues.apache.org/jira/browse/SOLR-17655
             Project: Solr
          Issue Type: Task
            Reporter: David Smiley

ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are some issues with it, and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
[jira] [Updated] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
[ https://issues.apache.org/jira/browse/SOLR-17655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-17655:
--------------------------------
    Description: ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.  (was: ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are some issues with it, and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.)

> Deprecate ExternalFileField in lieu of in-place docValue updates
> -----------------------------------------------------------------
>
>                 Key: SOLR-17655
>                 URL: https://issues.apache.org/jira/browse/SOLR-17655
>             Project: Solr
>          Issue Type: Task
>            Reporter: David Smiley
>            Priority: Major
>
> ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
[jira] [Commented] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
[ https://issues.apache.org/jira/browse/SOLR-17655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924257#comment-17924257 ]

David Smiley commented on SOLR-17655:
-------------------------------------

Some users [have issues|https://lists.apache.org/thread/xwr8j0ydmxjrq7q74020728jg1bqv2k3] with ExternalFileField.

> Deprecate ExternalFileField in lieu of in-place docValue updates
> -----------------------------------------------------------------
>
>                 Key: SOLR-17655
>                 URL: https://issues.apache.org/jira/browse/SOLR-17655
>             Project: Solr
>          Issue Type: Task
>            Reporter: David Smiley
>            Priority: Major
>
> ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
[jira] [Updated] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
[ https://issues.apache.org/jira/browse/SOLR-17655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-17655:
--------------------------------
    Priority: Blocker  (was: Major)

> Deprecate ExternalFileField in lieu of in-place docValue updates
> -----------------------------------------------------------------
>
>                 Key: SOLR-17655
>                 URL: https://issues.apache.org/jira/browse/SOLR-17655
>             Project: Solr
>          Issue Type: Task
>            Reporter: David Smiley
>            Priority: Blocker
>             Fix For: 9.9
>
>
> ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
[jira] [Commented] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
[ https://issues.apache.org/jira/browse/SOLR-17655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924259#comment-17924259 ]

David Smiley commented on SOLR-17655:
-------------------------------------

BTW there are a variety of pieces of code here that can be removed. I'm thinking VersionedFile and FileFloatSource, in addition to obviously ExternalFileField.

> Deprecate ExternalFileField in lieu of in-place docValue updates
> -----------------------------------------------------------------
>
>                 Key: SOLR-17655
>                 URL: https://issues.apache.org/jira/browse/SOLR-17655
>             Project: Solr
>          Issue Type: Task
>            Reporter: David Smiley
>            Priority: Major
>
> ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
[jira] [Updated] (SOLR-17655) Deprecate ExternalFileField in lieu of in-place docValue updates
[ https://issues.apache.org/jira/browse/SOLR-17655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-17655:
    Fix Version/s: 9.9

> Deprecate ExternalFileField in lieu of in-place docValue updates
>
> Key: SOLR-17655
> URL: https://issues.apache.org/jira/browse/SOLR-17655
> Project: Solr
> Issue Type: Task
> Reporter: David Smiley
> Priority: Major
> Fix For: 9.9
>
> ExternalFileField is an old capability of Solr that pre-dated in-place partial updates of numeric DocValue fields. There are [some issues with it|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20text%20~%20%22FileFloatSource%22%20AND%20resolution%20is%20null%20ORDER%20BY%20created%20DESC], and it has code to be maintained. It's time to deprecate it in 9.9 so it can be removed in 10.
Re: [PR] SOLR-16903: Migrate off java.io.File to java.nio.file.Path from core files [solr]
mlbiscoc commented on code in PR #2924:
URL: https://github.com/apache/solr/pull/2924#discussion_r1943652914

## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##
@@ -80,19 +80,21 @@ public DistribFileStore(CoreContainer coreContainer) {
   }

   @Override
-  public Path getRealpath(String path) {
+  public Path getRealPath(String path) {
     return _getRealPath(path, solrHome);
   }

-  private static Path _getRealPath(String path, Path solrHome) {
-    if (File.separatorChar == '\\') {
-      path = path.replace('/', File.separatorChar);
-    }
-    SolrPaths.assertNotUnc(Path.of(path));
-    while (path.startsWith(File.separator)) { // Trim all leading slashes
-      path = path.substring(1);
+  private static Path _getRealPath(String dir, Path solrHome) {
+    Path path = Path.of(dir);
+    SolrPaths.assertNotUnc(path);
+
+    if (path.isAbsolute()) {
+      // Strip the path of from being absolute to become relative to resolve with SolrHome
+      path = path.subpath(0, path.getNameCount());
     }

Review Comment:
Thanks for catching that, Houston. Reverting is probably the safest option, so I can open a PR for it, but I don't have a Windows machine on hand to test this easily.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151: URL: https://github.com/apache/solr/pull/3151#discussion_r1943595213 ## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.llm.texttovector.update.processor; + +import org.apache.solr.common.SolrException; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.schema.DenseVectorField; +import org.apache.solr.schema.FieldType; +import org.apache.solr.schema.SchemaField; +import org.apache.solr.update.processor.UpdateRequestProcessor; +import org.apache.solr.update.processor.UpdateRequestProcessorFactory; + +/** + * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor. 
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+    private static final String INPUT_FIELD_PARAM = "inputField";
+    private static final String OUTPUT_FIELD_PARAM = "outputField";
+    private static final String MODEL_NAME = "model";
+
+    String inputField;
+    String outputField;
+    String modelName;
+    SolrParams params;
+
+
+    @Override
+    public void init(final NamedList args) {
+        if (args != null) {
+            params = args.toSolrParams();
+            inputField = params.get(INPUT_FIELD_PARAM);
+            checkNotNull(INPUT_FIELD_PARAM, inputField);
+
+            outputField = params.get(OUTPUT_FIELD_PARAM);
+            checkNotNull(OUTPUT_FIELD_PARAM, outputField);
+
+            modelName = params.get(MODEL_NAME);
+            checkNotNull(MODEL_NAME, modelName);

Review Comment:
Much cleaner, thanks David!
Re: [PR] SOLR-16903: Migrate off java.io.File to java.nio.file.Path from core files [solr]
HoustonPutman commented on code in PR #2924:
URL: https://github.com/apache/solr/pull/2924#discussion_r1943584596

## solr/core/src/java/org/apache/solr/filestore/DistribFileStore.java: ##
@@ -80,19 +80,21 @@ public DistribFileStore(CoreContainer coreContainer) {
   }

   @Override
-  public Path getRealpath(String path) {
+  public Path getRealPath(String path) {
     return _getRealPath(path, solrHome);
   }

-  private static Path _getRealPath(String path, Path solrHome) {
-    if (File.separatorChar == '\\') {
-      path = path.replace('/', File.separatorChar);
-    }
-    SolrPaths.assertNotUnc(Path.of(path));
-    while (path.startsWith(File.separator)) { // Trim all leading slashes
-      path = path.substring(1);
+  private static Path _getRealPath(String dir, Path solrHome) {
+    Path path = Path.of(dir);
+    SolrPaths.assertNotUnc(path);
+
+    if (path.isAbsolute()) {
+      // Strip the path of from being absolute to become relative to resolve with SolrHome
+      path = path.subpath(0, path.getNameCount());
     }

Review Comment:
This is broken on Windows. A number of tests fail because it's not stripping leading "\" characters. Not sure if we should revert to the previous logic, or do it another way.

I made https://issues.apache.org/jira/browse/SOLR-17654, but it seems this is a new thing, so we can probably close that.

@mlbiscoc
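One plausible explanation for the Windows failures reported in this thread: on Windows, a path such as `\foo` has a root but no drive letter, so `Path.isAbsolute()` returns false and the stripping branch in the new `_getRealPath` never runs. The sketch below shows one platform-independent alternative, which trims leading separators from the string before it becomes a `Path`. The class and method names are hypothetical, the sketch omits the `SolrPaths.assertNotUnc` check (which would have to run before any trimming), and it is not necessarily the fix the project adopted.

```java
import java.nio.file.Path;

// Hypothetical helper, not part of Solr: resolves a file-store path under a base directory.
final class RealPathSketch {

  static Path resolveUnderHome(String dir, Path solrHome) {
    // Skip every leading '/' or '\' so the remainder is always relative,
    // regardless of whether the platform considers the input absolute.
    int start = 0;
    while (start < dir.length() && (dir.charAt(start) == '/' || dir.charAt(start) == '\\')) {
      start++;
    }
    String relative = dir.substring(start);
    // Nothing left to resolve: return the base directory itself.
    return relative.isEmpty() ? solrHome : solrHome.resolve(relative);
  }
}
```

Working on the string keeps the behaviour identical on every platform, at the cost of treating a leading backslash as a separator even on systems where it is a legal file-name character.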
[jira] [Resolved] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-17497. Fix Version/s: main (10.0) 9.8 Assignee: Sanjay Dutt Resolution: Fixed > Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task >Reporter: Sanjay Dutt >Assignee: Sanjay Dutt >Priority: Major > Fix For: main (10.0), 9.8 > > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases. > FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > FAILED: > org.apache.solr.cloud.SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull > FAILED: org.apache.solr.cloud.TestPullReplica.testAddDocs > > > {code:java} > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=10271, > name=fsyncService-6341-thread-1, state=RUNNABLE, > group=TGRP-SplitShardWithNodeRoleTest] > at > __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4:E5DB3E97188A8EB9]:0) > Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is > closed > at __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4]:0) > at > app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50) > at > app//org.apache.lucene.store.ByteBuffersDirectory.sync(ByteBuffersDirectory.java:237) > at > app//org.apache.lucene.tests.store.MockDirectoryWrapper.sync(MockDirectoryWrapper.java:214) > at > app//org.apache.solr.handler.IndexFetcher$DirectoryFile.sync(IndexFetcher.java:2034) > at > app//org.apache.solr.handler.IndexFetcher$FileFetcher.lambda$fetch$0(IndexFetcher.java:1803) > at > app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) > at > 
java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base@11.0.24/java.lang.Thread.run(Thread.java:829)
> {code}
>
> An interesting thing about these test cases is that they all share the same kind of setup: each has one shard and two replicas, one NRT and one PULL.
>
> Going through the execution steps of one of the test cases.
> FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica
>
> Test flow
> 1. Create a collection with 1 NRT and 1 PULL replica
> 2. waitForState
> 3. waitForNumDocsInAllActiveReplicas(0); // *Name says it all*
> 4. Index another document.
> 5. waitForNumDocsInAllActiveReplicas(1);
> 6. Stop Pull replica
> 7. Index another document
> 8. waitForNumDocsInAllActiveReplicas(2);
> 9. Start Pull Replica
> 10. waitForState
> 11. waitForNumDocsInAllActiveReplicas(2);
>
> As per the logs, the whole sequence executed successfully. Here is the link to the logs:
> [https://ge.apache.org/s/yxydiox3gvlf2/tests/task/:solr:core:test/details/org.apache.solr.cloud.TestPullReplica/testKillPullReplica/1/output]
> (link may stop working in the future)
>
> The last step, which verifies that every active replica has two documents, logged an INFO message, which is further proof that it completed successfully.
>
> {code:java}
> 616575 INFO (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: r: x: t:] o.a.s.c.TestPullReplica Replica core_node3 (https://127.0.0.1:35647/solr/pull_replica_test_kill_pull_replica_shard1_replica_n1/) has all 2 docs
> 616606 INFO (qtp1091538342-13057-null-11348) [n:127.0.0.1:38207_solr c:pull_replica_test_kill_pull_replica s:shard1 r:core_node4 x:pull_replica_test_kill_pull_replica_shard1_replica_p2 t:null-11348] o.a.s.c.S.Request webapp=/solr path=/select params={q=*:*&wt=javabin&version=2} rid=null-11348 hits=2 status=0 QTime=0
> 616607 INFO (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: r: x: t:] o.a.s.c.TestPullReplica Replica core_node4 (https://127.0.0.1:38207/solr/pull_replica_test_kill_pull_replica_shard1_replica_p2/) has all 2 docs
> {code}
>
> *Where is the issue then?*
> The logs show that after the PULL replica was restarted, the recovery process started and, after fetching all the file info from the NRT replica, the replication was aborted and logged "User aborted Replication".
>
> {code:java}
> o.a.s.h.IndexFetcher User aborted Replication => org.apache.solr.handler.IndexFetcher$Replica
Re: [PR] SOLR-17641: Disable the Security Manager for Java 24+ [solr]
HoustonPutman commented on PR #3153:
URL: https://github.com/apache/solr/pull/3153#issuecomment-2637917612

> Agree with @epugh - isn't Security Manager all no-ops even with 21 (which IIRC is the minimum Java for Solr 10)

I've never heard that. Do you have a link for that?
Re: [PR] SOLR-17641: Disable the Security Manager for Java 24+ [solr]
uschindler commented on PR #3153:
URL: https://github.com/apache/solr/pull/3153#issuecomment-2637926350

> > Agree with @epugh - isn't Security Manager all no-ops even with 21 (which IIRC is the minimum Java for Solr 10)
>
> I've never heard that. Do you have a link for that?

No, it isn't. It's fully working in 21. It becomes a no-op in 24.
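The version boundary stated in this thread (functional in 21, a no-op from 24) can be expressed as a simple runtime check. This is only a sketch under that stated assumption: the class, constant, and method names are hypothetical and are not taken from the PR, which may well handle the switch in startup scripts instead. The only JDK call used is `Runtime.version().feature()`.

```java
// Hypothetical helper, not taken from the Solr code base or from PR #3153.
final class SecurityManagerSupport {

  /** First Java feature release in which, per the thread, the Security Manager is a no-op. */
  static final int FIRST_RELEASE_WITHOUT_SECURITY_MANAGER = 24;

  /** Returns true when the given Java feature version still has a working Security Manager. */
  static boolean isSecurityManagerFunctional(int javaFeatureVersion) {
    return javaFeatureVersion < FIRST_RELEASE_WITHOUT_SECURITY_MANAGER;
  }

  /** Same check, applied to the JVM this code is running on. */
  static boolean isSecurityManagerFunctional() {
    return isSecurityManagerFunctional(Runtime.version().feature());
  }
}
```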
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151: URL: https://github.com/apache/solr/pull/3151#discussion_r1943608528 ## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.llm.texttovector.update.processor; + +import org.apache.solr.common.SolrException; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.schema.DenseVectorField; +import org.apache.solr.schema.FieldType; +import org.apache.solr.schema.SchemaField; +import org.apache.solr.update.processor.UpdateRequestProcessor; +import org.apache.solr.update.processor.UpdateRequestProcessorFactory; + +/** + * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor. 
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+    private static final String INPUT_FIELD_PARAM = "inputField";
+    private static final String OUTPUT_FIELD_PARAM = "outputField";
+    private static final String MODEL_NAME = "model";
+
+    String inputField;
+    String outputField;
+    String modelName;
+    SolrParams params;
+
+
+    @Override
+    public void init(final NamedList args) {
+        if (args != null) {
+            params = args.toSolrParams();
+            inputField = params.get(INPUT_FIELD_PARAM);
+            checkNotNull(INPUT_FIELD_PARAM, inputField);
+
+            outputField = params.get(OUTPUT_FIELD_PARAM);
+            checkNotNull(OUTPUT_FIELD_PARAM, outputField);
+
+            modelName = params.get(MODEL_NAME);
+            checkNotNull(MODEL_NAME, modelName);
+        }
+    }
+
+    private void checkNotNull(String paramName, Object param) {
+        if (param == null) {
+            throw new SolrException(
+                    SolrException.ErrorCode.SERVER_ERROR,
+                    "Text to Vector UpdateProcessor '" + paramName + "' can not be null");
+        }
+    }
+
+    @Override
+    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+        req.getCore().getLatestSchema().getField(inputField);

Review Comment:
It checks that 'inputField' is defined in the schema. With the latest commit I changed it to make this more explicit, but I am open to suggestions.
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943608528

## solr/modules/llm/src/test-files/solr/collection1/conf/solrconfig-llm-indexing-notDenseVectorField.xml: ##

Review Comment:
I could, but the reason I added it is that I struggled to find a testing method, along the lines of org.apache.solr.util.RestTestBase#assertU(java.lang.String), that takes the chain as a parameter. So I added the chain as the default so that I could test it. I would not want it to be the default when indexing docs for the query-time tests. If you have any suggestions, I'm open to changes.
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943616272

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;

Review Comment:
I agree, 'texttovector' is horribly unreadable. Maybe 'textvectorisation'? I'll add it in the coming commit.
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
dsmiley commented on code in PR #3151: URL: https://github.com/apache/solr/pull/3151#discussion_r1943529534 ## solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactoryTest.java: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.solr.llm.texttovector.update.processor; + +import org.apache.solr.common.SolrException; +import org.apache.solr.common.params.MultiMapSolrParams; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.llm.TestLlmBase; +import org.apache.solr.request.SolrQueryRequestBase; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.HashMap; +import java.util.Map; + + +public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase { + private TextToVectorUpdateProcessorFactory factoryToTest = + new TextToVectorUpdateProcessorFactory(); + private NamedList args = new NamedList<>(); + + @BeforeClass + public static void initArgs() throws Exception { +setupTest("solrconfig-llm.xml", "schema.xml", false, false); + } + + @AfterClass + public static void after() throws Exception { +afterTest(); + } + + @Test + public void init_fullArgs_shouldInitFullClassificationParams() { +args.add("inputField", "_text_"); +args.add("outputField", "vector"); +args.add("model", "model1"); +factoryToTest.init(args); + +assertEquals("_text_", factoryToTest.getInputField()); +assertEquals("vector", factoryToTest.getOutputField()); +assertEquals("model1", factoryToTest.getModelName()); + } + + @Test + public void init_nullInputField_shouldThrowExceptionWithDetailedMessage() { +args.add("outputField", "vector"); +args.add("model", "model1"); + +SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args)); +assertEquals("Text to Vector UpdateProcessor 'inputField' can not be null", e.getMessage()); + } + + @Test + public void init_notExistentInputField_shouldThrowExceptionWithDetailedMessage() throws Exception { +args.add("inputField", "notExistentInput"); +args.add("outputField", "vector"); +args.add("model", "model1"); + +Map params = new HashMap<>(); +MultiMapSolrParams mmparams = new MultiMapSolrParams(params); 
+    SolrQueryRequestBase req = new SolrQueryRequestBase(solrClientTestRule.getCoreContainer().getCore("collection1"), (SolrParams) mmparams) {};

Review Comment:
It's an anonymous inner class. What's probably throwing you off is that there are no method overrides, which 99% of the time is the point of writing an anonymous inner class. Here it's because SolrQueryRequestBase (SQRB) is abstract, so he's forced to subclass it in order to use it. I've been thinking about this case recently, and I think we should simply make that impl not abstract.
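The idiom discussed in this comment, an anonymous subclass with an empty body whose only purpose is to instantiate an abstract class, can be shown in isolation. The sketch below uses a hypothetical `RequestBase` class as a stand-in; it is not Solr's `SolrQueryRequestBase` and shares nothing with it beyond being abstract with no abstract methods.

```java
// Hypothetical stand-in for an abstract base class that declares no abstract methods.
abstract class RequestBase {
  private final String coreName;

  protected RequestBase(String coreName) {
    this.coreName = coreName;
  }

  String getCoreName() {
    return coreName;
  }
}

final class AnonymousSubclassDemo {

  static RequestBase newRequest(String coreName) {
    // "new RequestBase(coreName)" alone would not compile because the class is abstract.
    // The trailing "{}" declares an anonymous subclass with no overrides, which is concrete.
    return new RequestBase(coreName) {};
  }
}
```

The object returned by `newRequest` is an instance of a compiler-generated subclass, so `getClass()` differs from `RequestBase.class` even though no behaviour changes. Making the base class non-abstract, as suggested in the comment, would remove the need for the empty braces.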
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
dsmiley commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943539435

## solr/test-framework/src/java/org/apache/solr/util/RestTestBase.java: ##
@@ -88,13 +88,33 @@ private static void checkUpdateU(String message, String update, boolean shouldSu
         if (response != null) fail(m + "update was not successful: " + response);
       } else {
         String response = restTestHarness.validateErrorUpdate(update);
-        if (response != null) fail(m + "update succeeded, but should have failed: " + response);
+        if (response == null) fail(m + "update succeeded, but should have failed: " + response);
       }
     } catch (SAXException e) {
       throw new RuntimeException("Invalid XML", e);
     }
   }

+  public static void checkUpdateU(String update, String... tests) {

Review Comment:
At the moment, RestTestBase is common to basically any test using a "REST-based model store". The LLM stuff recently added a new variant of that, hence RestTestBase is used here. RestTestBase is used a lot. Preferably we wouldn't depend too much on our class hierarchy to accomplish reusable things, but there's no realistic action to take right now.
Re: [PR] SOLR-16391: Convert create-core, core-status, /luke to JAX-RS [solr]
gerlowskija commented on code in PR #3054:
URL: https://github.com/apache/solr/pull/3054#discussion_r1943369180

## solr/core/src/java/org/apache/solr/handler/admin/CoreAdminOperation.java: ##
@@ -92,32 +82,12 @@ public enum CoreAdminOperation implements CoreAdminOp {
   CREATE_OP(
       CREATE,
       it -> {
-        assert TestInjection.injectRandomDelayInCoreCreation();

Review Comment:
Eric found this later on his own, but for anyone else: this line was moved, not removed. See [here](https://github.com/apache/solr/pull/3054#discussion_r1925326281)
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151: URL: https://github.com/apache/solr/pull/3151#discussion_r1943488899 ## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.llm.texttovector.update.processor; + +import org.apache.solr.common.SolrException; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.schema.DenseVectorField; +import org.apache.solr.schema.FieldType; +import org.apache.solr.schema.SchemaField; +import org.apache.solr.update.processor.UpdateRequestProcessor; +import org.apache.solr.update.processor.UpdateRequestProcessorFactory; + +/** + * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor. 
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+    private static final String INPUT_FIELD_PARAM = "inputField";
+    private static final String OUTPUT_FIELD_PARAM = "outputField";
+    private static final String MODEL_NAME = "model";
+
+    String inputField;
+    String outputField;
+    String modelName;
+    SolrParams params;

Review Comment:
Done!
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943553277

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;
+
+  public TextToVectorUpdateProcessor(
+      String inputField,
+      String outputField,
+      String model,
+      SolrQueryRequest req,
+      UpdateRequestProcessor next) {
+    super(next);
+    this.inputField = inputField;
+    this.outputField = outputField;
+    this.model = model;
+    this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+  }
+
+  /**
+   * @param cmd the update command in input containing the Document to process
+   * @throws IOException If there is a low-level I/O error
+   */
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    this.textToVector = modelStore.getModel(model);

Review Comment:
I was debugging the flow to better understand the lifecycle of an update request processor. From what I see in the test, the factory instantiates a new update request processor every time a new update request is received.
I think it's OK to keep it as a class member, but let me see if I can move the instantiation to the factory. Ideally I wanted that to happen when the factory is initialized, but it seems that the update request processor factory is not compatible with resource loading (as far as I debugged and checked).
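To illustrate the lifecycle described in the comment above, here is a minimal, self-contained sketch of a long-lived factory that resolves the model once per update request and hands it to a freshly created processor. None of these types are Solr classes: SketchModel, SketchProcessor, SketchProcessorFactory, and the Function used as a model store are hypothetical stand-ins, and the real factory API may impose constraints this sketch ignores.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a text-to-vector model (not the Solr class). */
interface SketchModel {
  float[] vectorise(String text);
}

/** Hypothetical processor: created once per update request, model resolved up front. */
final class SketchProcessor {
  private final SketchModel model; // final: set once by the factory

  SketchProcessor(SketchModel model) {
    this.model = model;
  }

  float[] processAdd(String text) {
    return model.vectorise(text);
  }
}

/** Hypothetical factory: long-lived, builds a new processor for every request. */
final class SketchProcessorFactory {
  private final String modelName;
  private final Function<String, SketchModel> store; // stands in for the model store

  SketchProcessorFactory(String modelName, Function<String, SketchModel> store) {
    this.modelName = modelName;
    this.store = store;
  }

  SketchProcessor getInstance() {
    SketchModel model = store.apply(modelName); // one lookup per request, not per document
    if (model == null) {
      throw new IllegalArgumentException(
          "The model requested '" + modelName + "' can't be found in the store");
    }
    return new SketchProcessor(model);
  }
}

final class LifecycleSketch {
  /** Simulates one update request carrying {@code docs} documents; returns the lookup count. */
  static int lookupsForOneRequest(int docs) {
    AtomicInteger lookups = new AtomicInteger();
    Function<String, SketchModel> store =
        name -> {
          lookups.incrementAndGet();
          return text -> new float[] {text.length()}; // dummy vector
        };
    SketchProcessor processor = new SketchProcessorFactory("dummy-model", store).getInstance();
    for (int i = 0; i < docs; i++) {
      processor.processAdd("document " + i);
    }
    return lookups.get();
  }

  public static void main(String[] args) {
    System.out.println("lookups for 3 documents: " + lookupsForOneRequest(3));
  }
}
```

Resolving the model in the factory lets the processor field be final and reports a missing model before any document is processed. Whether this fits the actual factory lifecycle and resource loading is exactly the open question in the comment.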
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943565203

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;
+
+  public TextToVectorUpdateProcessor(
+      String inputField,
+      String outputField,
+      String model,
+      SolrQueryRequest req,
+      UpdateRequestProcessor next) {
+    super(next);
+    this.inputField = inputField;
+    this.outputField = outputField;
+    this.model = model;
+    this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+  }
+
+  /**
+   * @param cmd the update command in input containing the Document to process
+   * @throws IOException If there is a low-level I/O error
+   */
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    this.textToVector = modelStore.getModel(model);
+    if (textToVector == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.BAD_REQUEST,
+          "The model requested '"
+              + model
+              + "' can't be found in the store: "
+              + ManagedTextToVectorModelStore.REST_END_POINT);
+    }
+
+    SolrInputDocument doc = cmd.getSolrInputDocument();
+    SolrInputField inputFieldContent = doc.get(inputField);
+    if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+      String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+      float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
1) @cpoerschke: I double-checked, and the langchain4j library's 'embed' method (used in our 'vectorise' method) doesn't throw any exception, but I agree we should investigate what happens if that request fails (my best guess is that we get an empty vector or null; I'll add that to the tests).
2) @epugh: given that 'update.chain' is a parameter, if you configure a chain with no vector enrichment and a chain with vector enrichment, what prevents you from first indexing with the 'no vectors' chain and then slowly updating the index with atomic updates that add vectors (using the vector chain)?
We should double-check this and add it to the documentation once we consolidate the code. What do you think?
[jira] [Commented] (SOLR-17379) ParsingFieldUpdateProcessorsTest failures using CLDR locale provider
[ https://issues.apache.org/jira/browse/SOLR-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924234#comment-17924234 ]

ASF subversion and git services commented on SOLR-17379:

Commit eb07d72af281ea426b8d44006636fcf81094b745 in solr's branch refs/heads/branch_9x from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=eb07d72af28 ]

SOLR-17379: Fix date parsing in Java 23, remove Lucene TestSecurityManager (#3154)

* Fix system exit in test - by removing that part of the test

(cherry picked from commit b779ed0590e36f69f7d1ce17e99dc936ab46752f)

Co-authored-by: Chris Hostetter

> ParsingFieldUpdateProcessorsTest failures using CLDR locale provider
>
> Key: SOLR-17379
> URL: https://issues.apache.org/jira/browse/SOLR-17379
> Project: Solr
> Issue Type: Test
> Reporter: Chris M. Hostetter
> Priority: Major
> Labels: pull-request-available
> Attachments: SOLR-17379.test-1.patch, SOLR-17379.test.patch
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Background: https://lists.apache.org/thread/o7xwz8df6j0bx7w2m3w8ptrp4r7q957n
> Test failures from {{ParsingFieldUpdateProcessorsTest.testAKSTZone}} and
> {{ParsingFieldUpdateProcessorsTest.testParseFrenchDate}} are seemingly
> guaranteed on JDK23, due to the removal of the {{COMPAT}} locale provider
> option.
> On (some) earlier JDKs, these failures can be reproduced using...
> {noformat}
> ./gradlew test --tests ParsingFieldUpdateProcessorsTest
> -Ptests.jvmargs="-Djava.locale.providers=CLDR -XX:TieredStopAtLevel=1
> -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m"
> {noformat}
> ...to force the use of {{CLDR}} and exclude the use of {{COMPAT}}
[jira] [Resolved] (SOLR-17379) ParsingFieldUpdateProcessorsTest failures using CLDR locale provider
[ https://issues.apache.org/jira/browse/SOLR-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Houston Putman resolved SOLR-17379.

Fix Version/s: 9.9
Assignee: Houston Putman
Resolution: Fixed

> ParsingFieldUpdateProcessorsTest failures using CLDR locale provider
>
> Key: SOLR-17379
> URL: https://issues.apache.org/jira/browse/SOLR-17379
> Project: Solr
> Issue Type: Test
> Reporter: Chris M. Hostetter
> Assignee: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.9
>
> Attachments: SOLR-17379.test-1.patch, SOLR-17379.test.patch
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Background: https://lists.apache.org/thread/o7xwz8df6j0bx7w2m3w8ptrp4r7q957n
> Test failures from {{ParsingFieldUpdateProcessorsTest.testAKSTZone}} and
> {{ParsingFieldUpdateProcessorsTest.testParseFrenchDate}} are seemingly
> guaranteed on JDK23, due to the removal of the {{COMPAT}} locale provider
> option.
> On (some) earlier JDKs, these failures can be reproduced using...
> {noformat}
> ./gradlew test --tests ParsingFieldUpdateProcessorsTest
> -Ptests.jvmargs="-Djava.locale.providers=CLDR -XX:TieredStopAtLevel=1
> -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m"
> {noformat}
> ...to force the use of {{CLDR}} and exclude the use of {{COMPAT}}
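As a small illustration of why locale-sensitive date tests can break when the locale provider changes, the sketch below reads the java.locale.providers system property and checks a French date by round-tripping it through the same formatter instead of comparing against a hard-coded month string. This is not the Solr test and not the fix applied in the commit; the class name LocaleProviderSketch and the chosen pattern are assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LocaleProviderSketch {
  /** Returns the configured provider list, or a marker when the JDK default applies. */
  static String activeProviders() {
    return System.getProperty("java.locale.providers", "(JDK default)");
  }

  /**
   * Formats and re-parses a date with the same locale-aware formatter. The check does not
   * depend on how a given provider spells month names, only on formatter consistency.
   */
  static boolean roundTrips(LocalDate date, Locale locale) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("d MMMM uuuu", locale);
    String text = formatter.format(date);
    return LocalDate.parse(text, formatter).equals(date);
  }

  public static void main(String[] args) {
    LocalDate date = LocalDate.of(2010, 8, 19);
    DateTimeFormatter french = DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.FRENCH);
    System.out.println("providers: " + activeProviders());
    System.out.println("formatted: " + french.format(date)); // spelling may vary by provider
    System.out.println("round trip ok: " + roundTrips(date, Locale.FRENCH));
  }
}
```

Running it with and without -Djava.locale.providers=CLDR on a JDK that still supports COMPAT is one way to compare the formatted strings the two providers produce.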
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2637867724

> I wanted to follow-up on my feedback to the LLM module concerning use of the word "embedding". I first tried to say that I was not familiar with the word, and your response was to remove it (completely?) from the module. If "embedding" is an appropriate word then use it. The documentation should reference it in the ref guide, even if just an "AKA".

"Embedding" is widely used in the field, but it's a bit ambiguous and, to be honest, I'm with you in not using any term that can cause confusion.
Do you mean I added "embedding" in here somewhere? If that's the case, it's a mistake; point it out to me and I'll remove it!
Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]
alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943565203

## solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java: ##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;
+
+  public TextToVectorUpdateProcessor(
+      String inputField,
+      String outputField,
+      String model,
+      SolrQueryRequest req,
+      UpdateRequestProcessor next) {
+    super(next);
+    this.inputField = inputField;
+    this.outputField = outputField;
+    this.model = model;
+    this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+  }
+
+  /**
+   * @param cmd the update command in input containing the Document to process
+   * @throws IOException If there is a low-level I/O error
+   */
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    this.textToVector = modelStore.getModel(model);
+    if (textToVector == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.BAD_REQUEST,
+          "The model requested '"
+              + model
+              + "' can't be found in the store: "
+              + ManagedTextToVectorModelStore.REST_END_POINT);
+    }
+
+    SolrInputDocument doc = cmd.getSolrInputDocument();
+    SolrInputField inputFieldContent = doc.get(inputField);
+    if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+      String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+      float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
1) @cpoerschke: I double-checked, and the langchain4j library's 'embed' method (used in our 'vectorise' method) throws a RuntimeException. That's bad, as it could not be detected without investigating the internals of the code (I hate these practices). I'll give it some thought; any suggestion is welcome!
2) @epugh: given that 'update.chain' is a parameter, if you configure a chain with no vector enrichment and a chain with vector enrichment, what prevents you from first indexing with the 'no vectors' chain and then slowly updating the index with atomic updates that add vectors (using the vector chain)?
We should double-check this and add it to the documentation once we consolidate the code. What do you think?
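One possible way to deal with the unchecked exception mentioned in point 1 is sketched below, purely as an assumption about what a fix could look like: the vectorisation call is wrapped so that a RuntimeException, a null result, or an empty vector all end up as an empty Optional. SketchVectoriser and SafeVectoriser are hypothetical names, not Solr or langchain4j types, and the thread does not settle whether skipping the field or failing the whole update is the right policy.

```java
import java.util.Optional;

/** Hypothetical stand-in for the model's vectorise call; it may throw unchecked exceptions. */
interface SketchVectoriser {
  float[] vectorise(String text);
}

final class SafeVectoriser {
  /**
   * Calls the vectoriser and converts every failure mode into Optional.empty():
   * an unchecked exception, a null result, or an empty vector.
   */
  static Optional<float[]> tryVectorise(SketchVectoriser vectoriser, String text) {
    try {
      float[] vector = vectoriser.vectorise(text);
      if (vector == null || vector.length == 0) {
        return Optional.empty();
      }
      return Optional.of(vector);
    } catch (RuntimeException e) {
      // A real processor would log this or rethrow it as an update error; here we just report it.
      System.err.println("vectorisation failed: " + e.getMessage());
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    SketchVectoriser failing =
        text -> {
          throw new IllegalStateException("remote model unavailable");
        };
    SketchVectoriser working = text -> new float[] {0.1f, 0.2f};

    System.out.println("failing present: " + tryVectorise(failing, "hello").isPresent());
    System.out.println("working present: " + tryVectorise(working, "hello").isPresent());
  }
}
```

Returning an Optional keeps the decision with the caller: the processor could index the document without the vector field, or turn the empty result into an explicit error for the client.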