Re: Strategies for Real-Time Data Updates in Solr Without Compromising Latency
Hi, here are a few considerations.

You can try in-place updates on a docValues-only price field, updating the whole column at once (it's worth batching as many docs as possible per update).
https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#in-place-updates
Note: docValues-only fields are still searchable. This approach doesn't benefit from the caches (they can't be made useful here).

Another idea is to extract the updatable fields into a separate index/core, update it separately, and join it to the main docs on every request. This requires
https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
available since 9.3. The problem is that committing changes to the price core makes the join-query entries in the main core's caches obsolete, so they are evicted, which impacts query times. However, there's an idea and an implementation to warm the "to-side" join index caches: https://issues.apache.org/jira/browse/SOLR-16242

On Fri, Aug 25, 2023 at 9:16 PM Neeraj giri wrote:

> Greetings fellow forum members,
>
> Our team is currently working with Solr 8.11 in cloud mode to power our
> search system, built using Java Spring at the application layer. We're
> facing a challenge in maintaining up-to-date pricing information for our
> ecommerce platform, which experiences frequent data changes throughout the
> day. While attempting to achieve real-time data updates, we've encountered
> issues related to Solr's latency and overall system performance.
>
> As of now, we've implemented a process that halts data writes during the
> day. Instead, we retrieve updated pricing data from a separate microservice
> that maintains a cached and current version of the information. However, we
> believe this approach isn't ideal due to its potential impact on system
> efficiency.
>
> We're seeking guidance on designing an architecture that can seamlessly
> handle real-time updates to our Solr index without compromising the search
> latency that our users expect. Writing directly to Solr nodes appears to
> increase read latency, which is a concern for us. Our goal is to strike a
> balance between keeping our pricing information up-to-date and maintaining
> an acceptable level of system responsiveness.
>
> We would greatly appreciate any insights, strategies, or best practices
> from the community that can help us tackle this challenge. How can we
> optimize our approach to real-time data updates while ensuring Solr's
> latency remains within acceptable limits? Any advice or suggestions on
> architecture, techniques, or tools would be invaluable.
>
> Thank you in advance for your expertise and assistance.
>
> Regards,
>
> Neeraj giri

--
Sincerely yours
Mikhail Khludnev
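As a concrete illustration of the in-place update approach described above, here is a minimal SolrJ sketch. The collection name "products", the uniqueKey "id", the field name "price", and the base URL are assumptions; the price field would need to be declared with docValues="true", indexed="false", stored="false" for Solr to apply the change in place.

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class PriceUpdater {
        public static void main(String[] args) throws Exception {
            // Assumes a "products" collection whose "price" field is
            // docValues-only, so the "set" below follows the in-place path.
            try (SolrClient client =
                     new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "SKU-12345");
                // Atomic "set" on the update-only field
                doc.addField("price", Map.of("set", 19.99));
                client.add("products", doc);
                client.commit("products"); // or rely on autoSoftCommit instead
            }
        }
    }

A soft commit (or an autoSoftCommit setting) is still needed before the new price becomes visible to searches.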
Re: Re-index after upgrade
Are you sure that 9.x refuses to open an index first created in 7.x? I thought that strict policy was only needed in 8.0 due to a particular lossy data structure change, and that 9.x is more lenient?

Jan

> 24. aug. 2023 kl. 22:38 skrev Shawn Heisey :
>
> Version 9 behaves the same as 8. If the index was ever touched by a version
> before 8.0, version 9 will not read the index.
Re: solrj client memory leak via ThreadLocal in solrj.impl.Http2SolrClient?
Hi Tim, have you figured out the problem? Just curious to know what you have done in the end.

On Fri, Aug 25, 2023 at 4:48 PM Vincenzo D'Amore wrote:

> Just my 2 cents: I have always used Solr clients as singletons. You have
> to instantiate them only once and reuse them forever.
>
> On Fri, 25 Aug 2023 at 15:35, Tim Funk wrote:
>
>> Update - It looks like the ThreadLocal leak is different and unrelated to
>> creating / closing a new Http2SolrClient every request. Even using a shared
>> Http2SolrClient for my webapp, I noticed the same leaking ThreadLocals in a
>> QA environment. Falling back to HttpSolrClient optimistically is the fix
>> so far.
>>
>> Client is OpenJDK 11.0.17
>>
>> -Tim
>>
>> On Wed, Aug 23, 2023 at 9:46 AM Tim Funk wrote:
>>
>>> Cool - For now I'll either revert to HttpSolrClient or use a single
>>> client (depending on what I have to refactor).
>>>
>>> My only concern with a shared client is if one calls close()
>>> "accidentally", I don't see an easy way to query the client to see if it
>>> was closed so I can destroy it and create a new one (without resorting to
>>> a webapp restart).
>>>
>>> -Tim
>>>
>>> On Tue, Aug 22, 2023 at 6:42 PM Shawn Heisey wrote:
>>>
>>>> That kind of try-with-resources approach should take care of the
>>>> problem, because it would run the close() method on the SolrClient
>>>> object.
>>>>
>>>> The classes in the error are Jetty classes. This probably means that
>>>> the problem is in Jetty, but I couldn't guarantee that.
>>>>
>>>> You do not need multiple client objects just because you have multiple
>>>> cores. You only need one Http2SolrClient object per hostname:port
>>>> combination used to access Solr, and you should only need to create them
>>>> when the application starts and close them when the application ends.

--
Vincenzo D'Amore
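As a rough sketch of the shared-client pattern Shawn describes above (one Http2SolrClient per hostname:port, created once and closed at application shutdown), something like the following could work in a webapp. The helper class and its names are illustrative, not taken from the thread; closeAll() would be called from wherever the application handles shutdown (for example a ServletContextListener).

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;

    public final class SolrClients {
        // One shared client per Solr base URL, cached for the life of the webapp.
        private static final Map<String, Http2SolrClient> CLIENTS = new ConcurrentHashMap<>();

        private SolrClients() {}

        // Returns the shared client for the given base URL, creating it on first use.
        public static Http2SolrClient get(String baseUrl) {
            return CLIENTS.computeIfAbsent(baseUrl,
                    url -> new Http2SolrClient.Builder(url).build());
        }

        // Call exactly once when the application shuts down.
        public static void closeAll() {
            CLIENTS.values().forEach(c -> {
                try {
                    c.close();
                } catch (Exception e) {
                    // log and keep closing the remaining clients
                }
            });
            CLIENTS.clear();
        }
    }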
Re: solrj client memory leak via ThreadLocal in solrj.impl.Http2SolrClient?
I reverted to HttpSolrClient. That seems to have plugged the leak.

As for root cause, I haven't had time to dig further. Since this happens regardless of reusing the SolrClient vs instantiating a new one, I'm hoping that's a data point of interest. But as for constructing a "simple" test to reproduce, I'm not sure I'll find the time in the near future given other $work priorities.

As for future triage, I'd try any of the following:
- Change my endpoint and use Http2 (disable: builder.useHttp1_1(true))
- Revert to Http2SolrClient and add a timer / logger in existing app servers counting ThreadLocals, and look for patterns
- Write a standalone client, single thread. See if I can count the ThreadLocals over time.
- Write a standalone client - make all executions in new, different threads with occasional reuse of threads

-Tim

On Mon, Aug 28, 2023 at 7:17 AM Vincenzo D'Amore wrote:

> Hi Tim, have you figured out the problem? Just curious to know what you
> have done in the end.
>
> On Fri, Aug 25, 2023 at 4:48 PM Vincenzo D'Amore wrote:
>
> > Just my 2 cents: I have always used Solr clients as singletons. You have
> > to instantiate them only once and reuse them forever.
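For reference, the builder flag mentioned in the first triage item looks roughly like this. This is only a sketch of toggling the transport while testing; the base URL and factory method are placeholders.

    import org.apache.solr.client.solrj.impl.Http2SolrClient;

    public class ClientFactory {
        // useHttp1_1(true) keeps the Http2SolrClient API but speaks HTTP/1.1
        // on the wire; toggling it is one way to check whether the ThreadLocal
        // growth tracks the HTTP/2 transport.
        static Http2SolrClient buildClient(String baseUrl, boolean forceHttp11) {
            return new Http2SolrClient.Builder(baseUrl)
                    .useHttp1_1(forceHttp11)
                    .build();
        }
    }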
Re: solrj client memory leak via ThreadLocal in solrj.impl.Http2SolrClient?
Hi Tim, thanks for letting me know. I experienced the same problem: my application became unstable and crashed. My first implementation was very similar to yours and relied heavily on try-with-resources Java statements with CloudSolrClient. As I said in my previous email, I ended up using the Solr clients as singletons, reusing one instance per Solr instance/collection.

On Mon, Aug 28, 2023 at 1:48 PM Tim Funk wrote:

> I reverted to HttpSolrClient. That seems to have plugged the leak.
> As for root cause, I haven't had time to dig further. Since this happens
> regardless of reusing the SolrClient vs instantiating a new one, I'm hoping
> that's a data point of interest. But as for constructing a "simple" test
> to reproduce, I'm not sure I'll find the time in the near future given
> other $work priorities.
>
> As for future triage, I'd try any of the following:
> - Change my endpoint and use Http2 (disable: builder.useHttp1_1(true))
> - Revert to Http2SolrClient and add a timer / logger in existing app
>   servers counting ThreadLocals, and look for patterns
> - Write a standalone client, single thread. See if I can count the
>   ThreadLocals over time.
> - Write a standalone client - make all executions in new, different
>   threads with occasional reuse of threads
>
> -Tim
>
> On Mon, Aug 28, 2023 at 7:17 AM Vincenzo D'Amore wrote:
>
> > Hi Tim, have you figured out the problem? Just curious to know what you
> > have done in the end.
> >
> > On Fri, Aug 25, 2023 at 4:48 PM Vincenzo D'Amore wrote:
> >
> > > Just my 2 cents: I have always used Solr clients as singletons. You
> > > have to instantiate them only once and reuse them forever.

--
Vincenzo D'Amore
Re: Re-index after upgrade
On 8/28/23 05:03, Jan Høydahl wrote:
> Are you sure that 9.x refuse to open an index first created in 7.x? I
> thought that strict policy was only needed in 8.0 due to a particular lossy
> data structure change, and that 9.x is more lenient?

I haven't actually tried it, but I believe N-1 is enforced for all versions starting with 8.0, including 9.x and 10.0.0-SNAPSHOT. That would need to be verified by someone who is more familiar with Lucene than I am.

Thanks,
Shawn
Re: Weird issue -- pulling results with cursorMark gets fewer documents than numFound
: Schema meets the requirements for Atomic Update, so we are doing a migration
: by querying the old cluster and writing to the new cluster. We are doing it in
: batches by filtering on one of the fields, and using cursorMark to efficiently
: page through the results.
	...
: The query thread gets batches of 1 documents and dumps them on a
	...
: One of the batches always indexes 5 fewer documents than numFound. It's
: consistent -- always 5 documents. Updates are paused during the migration.
: On the last run, numFound for this batch was 3824942 and the indexed count was
: 3824937.

I assume you mean one of the batches always indexes 5 fewer documents than the 'rows=N' param (ie: the query batch size) ... correct? You're talking about the total numFound being higher than the indexed count?

: The other idea I have is that there could be a uniqueKey value that appears in
: more than one shard. This doesn't seem likely, as the compositeId router

Also possible is that some shards are out of sync with their leader -- ie: for some shardX, replica1 has a doc that replica2 doesn't, and replica1 is used for the initial phase of the request to get the "top N sorted doc uniqueKey at cursorMark=ZZZ" but replica2 is used in the second phase to fetch all of the field values.

(But if that were the case, you'd expect that at least some of the time you'd get "lucky" and the two phases would both hit replicas that agreed with each other -- even if they didn't agree with the leader -- and the problem wouldn't reliably reproduce every time.)

: should keep that from happening. Is there a way to detect this situation? I

I would log every cursorMark request URL and the number of docs in the response. If, at the end of the run, you see a cursorMark value that didn't return the same number of docs as your rows param (ignoring the last batch, which you expect to be smaller), then go manually re-run that query against every replica of every shard using `distrib=false` and diff the responses from each replica of the same shard.

-Hoss
http://www.lucidworks.com/
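A minimal SolrJ sketch of the kind of per-page logging suggested here might look like the following. The collection name, rows value, base URL, and the "id" sort field are assumptions, not taken from the thread.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorAudit {
        public static void main(String[] args) throws Exception {
            try (Http2SolrClient client =
                     new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(1000);
                q.setSort("id", SolrQuery.ORDER.asc); // cursorMark requires a sort on uniqueKey
                String cursor = CursorMarkParams.CURSOR_MARK_START;
                boolean done = false;
                while (!done) {
                    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                    QueryResponse rsp = client.query("source_collection", q);
                    String next = rsp.getNextCursorMark();
                    // Log every page: the cursorMark used plus how many docs came back.
                    System.out.printf("cursorMark=%s docs=%d%n", cursor, rsp.getResults().size());
                    if (cursor.equals(next)) {
                        done = true;
                    }
                    cursor = next;
                }
            }
        }
    }

Comparing the logged per-page counts against the rows param should point at the exact cursorMark value where documents go missing, which can then be re-run against each replica with distrib=false.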
Re: Re-index after upgrade
Yep, that check is present in Lucene 9.x as well. It will refuse to open an index created in 7.x.

https://github.com/apache/lucene/blob/releases/lucene/9.4.0/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L345
https://github.com/apache/lucene/blob/releases/lucene/9.4.0/lucene/core/src/java/org/apache/lucene/util/Version.java#L262

-Rahul

On Mon, Aug 28, 2023 at 2:31 PM Shawn Heisey wrote:

> On 8/28/23 05:03, Jan Høydahl wrote:
> > Are you sure that 9.x refuse to open an index first created in 7.x? I
> > thought that strict policy was only needed in 8.0 due to a particular
> > lossy data structure change, and that 9.x is more lenient?
>
> I haven't actually tried it, but I believe N-1 is enforced for all
> versions starting with 8.0, including 9.x and 10.0.0-SNAPSHOT. That
> would need to be verified by someone who is more familiar with Lucene
> than I am.
>
> Thanks,
> Shawn
Re: Weird issue -- pulling results with cursorMark gets fewer documents than numFound
On 8/28/23 11:42, Chris Hostetter wrote:
> I assume you mean one of the batches always indexes 5 fewer documents than
> the 'rows=N' param (ie: the query batch size) ... correct? You're talking
> about the total numFound being higher than the indexed count?

The query uses rows=1, which is configurable via a commandline option. The source collection's numFound is 5 higher than the number of documents indexed to the target. I was assured that all updates to the source collection were paused during the most recent migration test.

> Also possible is that some shards are out of sync with their leader -- ie:
> for some shardX, replica1 has a doc that replica2 doesn't, and replica1 is
> used for the initial phase of the request to get the "top N sorted doc
> uniqueKey at cursorMark=ZZZ" but replica2 is used in the second phase to
> fetch all of the field values.
>
> (But if that were the case, you'd expect that at least some of the time
> you'd get "lucky" and the two phases would both hit replicas that agreed
> with each other -- even if they didn't agree with the leader -- and the
> problem wouldn't reliably reproduce every time.)

We did make sure that numDocs was the same on all replicas for each shard. A comprehensive check of ID values across replicas has not been done. I should be able to write a program to do that.

> : should keep that from happening. Is there a way to detect this situation? I
>
> I would log every cursorMark request URL and the number of docs in the
> response.

It has been verified that each cursorMark batch is 1 docs except the last batch, by checking the size of the SolrDocumentList object retrieved from the response. I added some debug-level logging to show that along with the cursorMark value.

I have finished my SolrJ program using Http2SolrClient that will look for IDs that exist in more than one shard. I had hoped to have it get the list of core URLs from ZK, but couldn't figure that out, so now the commandline options accept multiple core-specific URLs, with the idea that one replica core from each shard will be presented.

I have tested it against my little Solr install, with the first URL pointing at the collection alias and the second pointing at the real core. It's a single-shard collection on a single node. As expected, it reported that every ID was duplicated. We'll try it for real in the wee hours of the morning.

I put the program on github if anyone is interested in taking a look.
https://github.com/elyograg/shard_duplicate_finder

Thanks,
Shawn
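For anyone curious what such a cross-shard ID check can look like, here is a rough sketch. It is an illustration only, not the code in the linked repository; the core URLs, the "id" field name, and the rows handling are assumptions.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.common.SolrDocument;

    public class DuplicateIdCheck {
        public static void main(String[] args) throws Exception {
            // args: one core-specific URL per shard,
            // e.g. http://host:8983/solr/collection_shard1_replica_n1
            Map<String, String> idToCore = new HashMap<>();
            for (String coreUrl : args) {
                try (Http2SolrClient client = new Http2SolrClient.Builder(coreUrl).build()) {
                    SolrQuery q = new SolrQuery("*:*");
                    q.set("distrib", "false"); // query only this core, no fan-out
                    q.setFields("id");         // uniqueKey field name is an assumption
                    q.setRows(1_000_000);      // a real run should page with cursorMark instead
                    for (SolrDocument doc : client.query(q).getResults()) {
                        String id = String.valueOf(doc.getFieldValue("id"));
                        String previous = idToCore.putIfAbsent(id, coreUrl);
                        if (previous != null && !previous.equals(coreUrl)) {
                            System.out.println("Duplicate id " + id + " in " + previous
                                    + " and " + coreUrl);
                        }
                    }
                }
            }
        }
    }

A real run against collections of this size should page with cursorMark rather than fetching everything in one request.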
Re: Re-index after upgrade
Ok, thanks for the clarification.

Jan

> 28. aug. 2023 kl. 20:43 skrev Rahul Goswami :
>
> Yep, that check is present in Lucene 9.x as well. It will refuse to open an
> index created in 7.x.
>
> https://github.com/apache/lucene/blob/releases/lucene/9.4.0/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L345
> https://github.com/apache/lucene/blob/releases/lucene/9.4.0/lucene/core/src/java/org/apache/lucene/util/Version.java#L262
>
> -Rahul
>
> On Mon, Aug 28, 2023 at 2:31 PM Shawn Heisey wrote:
>
>> On 8/28/23 05:03, Jan Høydahl wrote:
>>> Are you sure that 9.x refuse to open an index first created in 7.x? I
>>> thought that strict policy was only needed in 8.0 due to a particular
>>> lossy data structure change, and that 9.x is more lenient?
>>
>> I haven't actually tried it, but I believe N-1 is enforced for all
>> versions starting with 8.0, including 9.x and 10.0.0-SNAPSHOT. That
>> would need to be verified by someone who is more familiar with Lucene
>> than I am.
>>
>> Thanks,
>> Shawn
Registration open for Community Over Code North America
Hello!

Registration is still open for the upcoming Community Over Code NA event in Halifax, NS! We invite you to register for the event:
https://communityovercode.org/registration/

Apache Committers, note that you have a special discounted rate for the conference at US$250. To take advantage of this rate, use the special code sent to the committers@ list by Brian Proffitt earlier this month.

If you are in need of an invitation letter, please consult the information at
https://communityovercode.org/visa-letter/

Please see https://communityovercode.org/ for more information about the event, including how to make reservations for discounted hotel rooms in Halifax. Discounted rates will only be available until Sept. 5, so reserve soon!

--Rich, for the event planning team