Issues with rename command of the Collections API
Hi,

I tried to use the RENAME command of the Collections API to rename a SolrCloud collection, but I couldn't get it to work properly. I'm using Solr 8.11.2. When I try to rename a collection called "test" to "test-new" with the following command:

http://localhost:8983/solr/admin/collections?action=RENAME&name=test&target=test-new

it creates an alias called "test" pointing to a collection "test-new" (which doesn't exist). Then, if I run for example the following queries:

http://localhost:8983/solr/test/select?q=*%3A*
http://localhost:8983/solr/test-new/select?q=*%3A*

I receive "HTTP ERROR 404 Not Found" in both cases. And then, if I delete the alias "test" created by the previous rename command, I can query the collection under its original name again with:

http://localhost:8983/solr/test/select?q=*%3A*

Has anyone used this command?

Thanks in advance.
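For reference, the alias state can be inspected with the LISTALIASES action (assuming the default host and port):

http://localhost:8983/solr/admin/collections?action=LISTALIASES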
Fastest way to index data to solr
Hi,

We have nearly 70-80 million records that need to be indexed in Solr 8.6.1. We want to choose between the Java binary (JavaBin) format and direct JSON format. Our source data is a DBMS, i.e. structured data.

Regards
Ravi
Re: Fastest way to index data to solr
Hi,

If you want to index fast you should:

* Make sure you have enough hardware on the Solr side to handle the bulk load
* Index with multiple threads on the client; experiment to find a good number based on the number of CPUs on the receiving side
* If using Java on the client, use CloudSolrClient, which is smart enough to send docs to the correct shard (see the sketch below this message)
* Do NOT commit during the bulk load; wait until the end
* Experiment with batch size, e.g. try sending 500 docs in each update request, then 1000, etc., until you find the best compromise
* Use JavaBin if you can; it should be slightly faster than JSON, but probably not by much
* Remember that your RDBMS may be the bottleneck at the end of the day: how many rows can it deliver? You may need to partition the data set with SELECT ... WHERE clauses for each client to read in parallel.

Jan

> On 29 Sep 2022, at 10:06, Shankar R wrote:
>
> Hi,
> We have nearly 70-80 million records that need to be indexed in
> Solr 8.6.1.
> We want to choose between the Java binary (JavaBin) format and
> direct JSON format.
> Our source data is a DBMS, i.e. structured data.
>
> Regards
> Ravi
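A minimal sketch of the CloudSolrClient-plus-batching approach (the ZooKeeper address, collection name, fields, and batch size are made up; the loop stands in for whatever streams rows from your RDBMS):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
        new CloudSolrClient.Builder(Arrays.asList("localhost:9983"), Optional.empty())
            .build()) {
      client.setDefaultCollection("mycollection");
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 1_000_000; i++) { // stand-in for rows read from the RDBMS
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("title_s", "row " + i);
        batch.add(doc);
        if (batch.size() == 500) { // tune this: try 500, then 1000, ...
          client.add(batch);       // routed to the correct shard; no commit here
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.add(batch);
      }
      client.commit(); // single commit at the very end of the bulk load
    }
  }
}

One such loop per client thread or process; the follow-ups below discuss how many to run at once.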
NullPointer Exception when using Cross Collection Join
Hi, Solr team.

I'm using Solr 9.0.0, and when I query with Cross Collection Join on our own data there is a NullPointerException.

Error line:
solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803

DocIdSetIterator it = new BitSetIterator(bits, 0);

It seems that 'bits' is null and the constructor of BitSetIterator throws a NullPointerException.

With deeper debugging, I found that the code assumes that the lengths of 'sets' (bits) and 'leaves' are equal, as shown by the code below. However, in my test the last few elements of 'sets' could be null, which caused a NullPointerException.

sets = (FixedBitSet[]) req.getContext().get("export");

List leaves = req.getSearcher().getTopReaderContext().leaves();

SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
for (int i = 0; i < segmentIterators.length; i++) {
  SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
  segmentIterators[i] =
      new SegmentIterator(bits[i], leaves.get(i), sortQueue, sortDoc.copy());
}

Then I tried skipping the creation of the BitSetIterator object when bits == null, and it worked as expected: the query results were returned without any data missing.

But I still don't know whether the exception is expected or whether it is a bug. Hope to get your response, thanks a lot!
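Something along these lines is what I mean by skipping (a sketch only, not a vetted fix; DocIdSetIterator.empty() stands in for "no hits in this segment"):

// At the error line in ExportWriter, guard against a null bit set
// instead of constructing the iterator unconditionally.
DocIdSetIterator it = (bits == null)
    ? DocIdSetIterator.empty()
    : new BitSetIterator(bits, 0);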
Conditional Joins in Solr
Is it possible to have a Solr join query only apply under certain conditions? We have a Solr document store that performs access control following various rules related to the data stored in Solr. Consider the following scenario:

{
  Id:"doc1"
  linkedIDs:"doc2"
  Desc:"desc 1"
  Group:"1"
}
{
  Id:"doc2"
  Desc:"desc 2"
  Group:"2"
}
{
  Id:"doc3"
  Desc:"desc 3"
  Group:"3"
}

Suppose internally, for a given user, we have a rule that says the user cannot see anything with Group = "2". Therefore our system augments the user-specified query q=Id:* and translates it to q=Id:* AND !Group:2.

This results in the search response containing:

{
  Id:"doc1"
  linkedIDs:"doc2"
  Desc:"desc 1"
  Group:"1"
}
{
  Id:"doc3"
  Desc:"desc 3"
  Group:"3"
}

However, I'd like to somehow leverage join so that the user will also not get back results that link/reference products they can't see, such that they'd only get back:

{
  Id:"doc3"
  Desc:"desc 3"
  Group:"3"
}

Is it possible to formulate a query like this?
NPE in collapse
Hello all,

NPE in collapse with hint=top_fc:
when there are no segments, using hint=top_fc in collapse results in an NPE.

* query
http://localhost:8983/solr/bukken/select?fq={!collapse field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=

* response
"error":{
  "msg":"Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null",
  "trace":"java.lang.NullPointerException: Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null\n\tat org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
...

Has this issue already been discussed?

Minami Takuya
Hadoop vulnerability in Solr 8.11.2 from scan
Hi,

Our vulnerability scanning tool found a vulnerability from Hadoop in Solr 8.11.2. More specifically, it is introduced through org.apache.solr:solr-core@8.11.2 › org.apache.hadoop:hadoop-common@3.2.2. The published vulnerability is listed as CVE-2022-25168: https://lists.apache.org/thread/mxqnb39jfrwgs3j6phwvlrfq4mlox130

This vulnerability is not listed on Solr Security News, but it is also not listed under the false positives on the SolrSecurity Confluence page. We were wondering whether this is a real vulnerability for Solr, and in particular whether Solr 8.11.2 is affected by it?

Thanks in advance.

Kind regards,
Richard
Re: Fastest way to index data to solr
> On Sep 29, 2022, at 4:17 AM, Jan Høydahl wrote:
>
> * Index with multiple threads on the client; experiment to find a good
> number based on the number of CPUs on the receiving side

That may also mean having multiple clients. We went from taking about 8 hours to index our entire 42M rows to about 1.5 hours because we ran 10 indexer clients at once. Each indexer takes roughly 1/10th of the data and churns away. We don't have any of the clients do a commit. After the indexers are done, we run one more time through the queue with a commit at the end.

As Jan says, make sure it's not your database that is the bottleneck, and experiment with how many clients you want to have going at once.

Andy
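One simple way to carve the table into N roughly equal slices is a modulus on a numeric key (a sketch; the table, columns, and JDBC URL are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SliceReader {
  // Worker `workerId` out of `numWorkers` reads only its slice of a
  // hypothetical `docs` table with a numeric primary key `id`.
  static void readSlice(String jdbcUrl, int numWorkers, int workerId) throws Exception {
    String sql = "SELECT id, title, body FROM docs WHERE MOD(id, ?) = ?";
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setInt(1, numWorkers);
      ps.setInt(2, workerId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          // Build a SolrInputDocument from the row and add it to the
          // current batch, as in the earlier CloudSolrClient sketch.
        }
      }
    }
  }
}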
Re: Fastest way to index data to solr
Another way to handle this is to have your indexing code fork out to as many cores as the Solr indexing server has. It's far less work to have the code run itself that many times in parallel, and as long as your SQL queries and the tables behind them are properly indexed, the database shouldn't be a bottleneck. You just need to make sure the indexing server has the resources it needs, since you should never index against a query server: a query server is just a copy, tuned for fast reads rather than writes, unlike the indexer.

> On Sep 29, 2022, at 2:21 PM, Andy Lester wrote:
>
>> On Sep 29, 2022, at 4:17 AM, Jan Høydahl wrote:
>>
>> * Index with multiple threads on the client; experiment to find a good
>> number based on the number of CPUs on the receiving side
>
> That may also mean having multiple clients. We went from taking about 8 hours
> to index our entire 42M rows to about 1.5 hours because we ran 10 indexer
> clients at once. Each indexer takes roughly 1/10th of the data and churns
> away. We don't have any of the clients do a commit. After the indexers are
> done, we run one more time through the queue with a commit at the end.
>
> As Jan says, make sure it's not your database that is the bottleneck, and
> experiment with how many clients you want to have going at once.
>
> Andy
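A minimal sketch of that fan-out (the worker count here is taken from the client machine's CPU count, whereas the advice above keys it to the indexing server's cores; the task body is a placeholder):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelIndexing {
  public static void main(String[] args) throws Exception {
    int workers = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    for (int k = 0; k < workers; k++) {
      final int workerId = k;
      pool.submit(() -> {
        // Each task reads its own slice of the source table and indexes it,
        // e.g. by combining the SliceReader and BulkIndexer sketches above.
        System.out.println("worker " + workerId + " started");
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.DAYS); // wait for all indexers to finish
  }
}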
Re: Conditional Joins in Solr
Hi, Jason.
Could it be something like q=id:* -Group:2 -{!join from=id to=linkedIDs}Group:2 ?

On Thu, Sep 29, 2022 at 7:47 PM Kahler, Jason J (US) wrote:

> Is it possible to have a Solr join query only apply under certain
> conditions? We have a Solr document store that performs access control
> following various rules related to the data stored in Solr. Consider the
> following scenario:
>
> {
>   Id:"doc1"
>   linkedIDs:"doc2"
>   Desc:"desc 1"
>   Group:"1"
> }
> {
>   Id:"doc2"
>   Desc:"desc 2"
>   Group:"2"
> }
> {
>   Id:"doc3"
>   Desc:"desc 3"
>   Group:"3"
> }
>
> Suppose internally, for a given user, we have a rule that says the user
> cannot see anything with Group = "2". Therefore our system augments the
> user-specified query q=Id:* and translates it to q=Id:* AND !Group:2.
>
> This results in the search response containing:
>
> {
>   Id:"doc1"
>   linkedIDs:"doc2"
>   Desc:"desc 1"
>   Group:"1"
> }
> {
>   Id:"doc3"
>   Desc:"desc 3"
>   Group:"3"
> }
>
> However, I'd like to somehow leverage join so that the user will also not
> get back results that link/reference products they can't see, such that
> they'd only get back:
>
> {
>   Id:"doc3"
>   Desc:"desc 3"
>   Group:"3"
> }
>
> Is it possible to formulate a query like this?

--
Sincerely yours
Mikhail Khludnev
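Spelled out against the example docs above (a sketch; the collection name is made up, the field names follow the question's capitalization, and the query string would need URL-encoding in practice):

http://localhost:8983/solr/yourcollection/select?q=Id:* -Group:2 -{!join from=Id to=linkedIDs}Group:2

The join clause finds documents matching Group:2, takes their Id values, and matches any document whose linkedIDs field contains one of those values; the leading '-' then excludes those matches, so doc1 (which links to the hidden doc2) is dropped and only doc3 comes back.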
Re: NullPointer Exception when using Cross Collection Join
Hi, Sean.
It's not clear whether this can be reproduced with a bare Solr distro install, indexing a few docs and querying them, or whether it's something about hacking/customising Solr as a library?

On Thu, Sep 29, 2022 at 7:42 PM Sean Wu wrote:

> Hi, Solr team.
>
> I'm using Solr 9.0.0, and when I query with Cross Collection Join on
> our own data there is a NullPointerException.
>
> Error line:
> solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803
>
> DocIdSetIterator it = new BitSetIterator(bits, 0);
>
> It seems that 'bits' is null and the constructor of BitSetIterator
> throws a NullPointerException.
>
> With deeper debugging, I found that the code assumes that the lengths
> of 'sets' (bits) and 'leaves' are equal, as shown by the code below.
> However, in my test the last few elements of 'sets' could be null,
> which caused a NullPointerException.
>
> sets = (FixedBitSet[]) req.getContext().get("export");
>
> List leaves = req.getSearcher().getTopReaderContext().leaves();
>
> SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
> for (int i = 0; i < segmentIterators.length; i++) {
>   SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
>   segmentIterators[i] =
>       new SegmentIterator(bits[i], leaves.get(i), sortQueue, sortDoc.copy());
> }
>
> Then I tried skipping the creation of the BitSetIterator object when
> bits == null, and it worked as expected: the query results were returned
> without any data missing.
>
> But I still don't know whether the exception is expected or whether it
> is a bug. Hope to get your response, thanks a lot!

--
Sincerely yours
Mikhail Khludnev
Re: NPE in collapse
What version of Solr are you using?

Try removing the top_fc hint; does the error still occur? (For example, the request shown after the quote below.)

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 29, 2022 at 12:47 PM 南拓弥 wrote:

> Hello all,
>
> NPE in collapse with hint=top_fc:
> when there are no segments, using hint=top_fc in collapse results in an NPE.
>
> * query
> http://localhost:8983/solr/bukken/select?fq={!collapse field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=
>
> * response
> "error":{
>   "msg":"Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null",
>   "trace":"java.lang.NullPointerException: Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null\n\tat org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
> ...
>
> Has this issue already been discussed?
>
> Minami Takuya
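For example, the request from the report with the hint removed and everything else unchanged:

http://localhost:8983/solr/bukken/select?fq={!collapse field=str_field}&indent=true&q.op=OR&q=*:*&useParams=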
Re: NPE in collapse
Oh, there are no segments... If this error still occurs in the latest Solr version without the top_fc hint, then it's a bug.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 29, 2022 at 3:27 PM Joel Bernstein wrote:

> What version of Solr are you using?
>
> Try removing the top_fc hint; does the error still occur?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Sep 29, 2022 at 12:47 PM 南拓弥 wrote:
>
>> Hello all,
>>
>> NPE in collapse with hint=top_fc:
>> when there are no segments, using hint=top_fc in collapse results in an NPE.
>>
>> * query
>> http://localhost:8983/solr/bukken/select?fq={!collapse field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=
>>
>> * response
>> "error":{
>>   "msg":"Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null",
>>   "trace":"java.lang.NullPointerException: Cannot invoke \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because \"this.collapseValues\" is null\n\tat org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
>> ...
>>
>> Has this issue already been discussed?
>>
>> Minami Takuya
Re: NullPointer Exception when using Cross Collection Join
Can you share the stack trace? Also, in the Solr log there will be a call to the /export handler. Can you get that from the log? Then we can isolate the call to the export handler and see if we can reproduce it.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 29, 2022 at 3:01 PM Mikhail Khludnev wrote:

> Hi, Sean.
> It's not clear whether this can be reproduced with a bare Solr distro
> install, indexing a few docs and querying them, or whether it's something
> about hacking/customising Solr as a library?
>
> On Thu, Sep 29, 2022 at 7:42 PM Sean Wu wrote:
>
>> Hi, Solr team.
>>
>> I'm using Solr 9.0.0, and when I query with Cross Collection Join on
>> our own data there is a NullPointerException.
>>
>> Error line:
>> solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803
>>
>> DocIdSetIterator it = new BitSetIterator(bits, 0);
>>
>> It seems that 'bits' is null and the constructor of BitSetIterator
>> throws a NullPointerException.
>>
>> With deeper debugging, I found that the code assumes that the lengths
>> of 'sets' (bits) and 'leaves' are equal, as shown by the code below.
>> However, in my test the last few elements of 'sets' could be null,
>> which caused a NullPointerException.
>>
>> sets = (FixedBitSet[]) req.getContext().get("export");
>>
>> List leaves = req.getSearcher().getTopReaderContext().leaves();
>>
>> SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
>> for (int i = 0; i < segmentIterators.length; i++) {
>>   SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
>>   segmentIterators[i] =
>>       new SegmentIterator(bits[i], leaves.get(i), sortQueue, sortDoc.copy());
>> }
>>
>> Then I tried skipping the creation of the BitSetIterator object when
>> bits == null, and it worked as expected: the query results were returned
>> without any data missing.
>>
>> But I still don't know whether the exception is expected or whether it
>> is a bug. Hope to get your response, thanks a lot!
>
> --
> Sincerely yours
> Mikhail Khludnev
Re: Fastest way to index data to solr
> * Do NOT commit during the bulk load, wait until the end

Unless something changed, this is slightly risky. It can lead to very large transaction logs and very long playback of the tx log on startup. If Solr goes down during indexing, due to something like an OOM, it could take a very long time for it to restart, likely leading to people restarting it because they think it's stuck... and then it has to start at the beginning again...

(Ref: https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/)

... Infrequent commits might be better than none (but you definitely do not want frequent commits, certainly not after every batch, or, even worse, after every doc).

-Gus
Re: Fastest way to index data to solr
On 9/29/22 22:28, Gus Heck wrote:
>> * Do NOT commit during the bulk load, wait until the end
>
> Unless something changed this is slightly risky. It can lead to very
> large transaction logs and very long playback of the tx log on startup.

It is always good practice to have autoCommit configured with openSearcher set to false and a relatively low maxTime value. I believe the configs that Solr ships with set this to 15 seconds (an actual value of 15000 milliseconds), but I prefer making it 60 seconds just so there is less overall stress on the system. That setting will eliminate the problem of huge transaction logs. I believe this is discussed in the Lucidworks article that you linked.

A commit that opens a new searcher should be done at the end of the major indexing job. I would do this as a soft commit, but there's nothing wrong with making it a hard commit that has openSearcher set to true. On large indexing jobs there is likely to be little difference in performance between the two types of commit.

Thanks,
Shawn
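For reference, a solrconfig.xml block along those lines (60-second hard commits that do not open a searcher) would look like:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>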
Re: Fastest way to index data to solr
70 million can be a lot or a little. Doc count is not even half the story. How much storage space do these documents occupy in the database? Is the text tweet-sized, or multi-megabyte CLOBs, or links to files on a file store that need to be fetched and parsed (or OCR'd, or converted from audio/video to transcripts)? IoT-type docs with very minimal text can be indexed much faster than 50-page PDF documents. With very large clusters and indexing systems distributing work across a Spark cluster I've seen rates as high as 1.3M docs/sec... and 70M would be trivial for that system (they had hundreds of billions). But text documents are typically much, much slower than that, especially if the text must be extracted from dirty formats such as PDF or Word data, or complex custom analysis is involved, or additional fetching of files or data to merge into the doc is required.

As for the two formats: if you are indexing with Java code, choose Java binary. If you are using a non-Java language, you can use JSON. The rare case of JSON from Java would be if your data was already in JSON format... then it depends on whether Solr is limiting you (do the work on the indexers and use JavaBin so Solr has less parsing to do) or your indexing machines are limiting you (use JSON so your indexers don't have to do the conversion).

Like many things in search, "It depends" :)

On Thu, Sep 29, 2022 at 4:07 AM Shankar R wrote:

> Hi,
> We have nearly 70-80 million records that need to be indexed in
> Solr 8.6.1.
> We want to choose between the Java binary (JavaBin) format and
> direct JSON format.
> Our source data is a DBMS, i.e. structured data.
>
> Regards
> Ravi

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)