Re: dataimport problem
Hi Scott,

It should log a WARN via org.apache.solr.handler.dataimport.SolrWriter. Check that this log category is enabled, then check the logs for it. See
https://github.com/SearchScale/dataimporthandler/blob/branch_9x/src/main/java/org/apache/solr/handler/dataimport/SolrWriter.java#L82C7-L82C10

On Sun, Sep 3, 2023 at 12:44 AM Scott Derrick wrote:
> This import has been running every day for years. Recently I noticed
> that one file was not being imported.
>
> When I run the command I see the following:
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":1},
>   "initArgs":[
>     "defaults",[
>       "config","tei-config.xml",
>       "df","_text_"]],
>   "command":"full-import",
>   "status":"idle",
>   "importResponse":"",
>   "statusMessages":{
>     "Total Requests made to DataSource":"0",
>     "Total Rows Fetched":"68432",
>     "Total Documents Processed":"4887",
>     "Total Documents Skipped":"0",
>     "Full Dump Started":"2023-09-02 16:47:52",
>     "":"Indexing completed. Added/Updated: 4887 documents. Deleted 0 documents.",
>     "Committed":"2023-09-02 16:51:41",
>     "Total Documents Failed":"1",
>     "Time taken":"0:3:49.353"}}
>
> The line "Total Documents Failed":"1" is the problem.
>
> I'm having a separate issue on a specific search return and I think it
> is related to this one document. I've looked in the Solr logs and don't
> see any reference to a failed document on import.
>
> How do I find out what document is failing on import? I tried passing
> &debug=true:
>
> curl 'http://localhost:8983/solr/mbepp/update/tei?command=full-import&clean=true&debug=true'
>
> but then it only processes 10 documents and stops.
>
> Very frustrating.
>
> Scott

--
Sincerely yours
Mikhail Khludnev
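If that category is being filtered out, enabling it in Solr's logging configuration might look like the following. This is only a sketch against the stock log4j2 setup (typically server/resources/log4j2.xml); the file location and surrounding elements may differ in your install:

```xml
<!-- Inside the <Loggers> element of log4j2.xml: make sure WARN messages
     from the DIH writer are not suppressed for this category. -->
<Logger name="org.apache.solr.handler.dataimport.SolrWriter" level="warn"/>
```

After restarting Solr, a failed document should then leave a WARN line in solr.log naming the offending document.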
SolrJ 9.3 timeout when using http2, works with http
Hello, what can cause this issue?

This times out:

solrServer = new Http2SolrClient.Builder(solrUrl)
        .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .build();

This works well:

solrServer = new Http2SolrClient.Builder(solrUrl)
        .useHttp1_1(true)
        .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
        .build();

I just upgraded from 8.11 to 9.3, but I always used the older HTTP client before.

Thanks
— Andrea Vettori
RE: Strategies for Real-Time Data Updates in Solr Without Compromising Latency
Hello, can you please explain the problem with a few numbers? We're using Solr as the backend for our e-commerce platform and update it several times a day (around 4 times per hour) without any issues. It may depend on system size, concurrent searches, etc., so if you have a few numbers it would be easier to give you some suggestions.

On 2023/08/25 18:16:02 Neeraj giri wrote:
> Greetings fellow forum members,
>
> Our team is currently working with Solr 8.11 in cloud mode to power our
> search system, built using Java Spring at the application layer. We're
> facing a challenge in maintaining up-to-date pricing information for our
> ecommerce platform, which experiences frequent data changes throughout the
> day. While attempting to achieve real-time data updates, we've encountered
> issues related to Solr's latency and overall system performance.
>
> As of now, we've implemented a process that halts data writes during the
> day. Instead, we retrieve updated pricing data from a separate microservice
> that maintains a cached and current version of the information. However, we
> believe this approach isn't ideal due to its potential impact on system
> efficiency.
>
> We're seeking guidance on designing an architecture that can seamlessly
> handle real-time updates to our Solr index without compromising the search
> latency that our users expect. Writing directly to Solr nodes appears to
> increase read latency, which is a concern for us. Our goal is to strike a
> balance between keeping our pricing information up-to-date and maintaining
> an acceptable level of system responsiveness.
>
> We would greatly appreciate any insights, strategies, or best practices
> from the community that can help us tackle this challenge. How can we
> optimize our approach to real-time data updates while ensuring Solr's
> latency remains within acceptable limits? Any advice or suggestions on
> architecture, techniques, or tools would be invaluable.
> Thank you in advance for your expertise and assistance.
>
> Regards,
>
> Neeraj giri

—
Ing. Andrea Vettori
Sistemi Informativi
B2BIres s.r.l.
Re: Strategies for Real-Time Data Updates in Solr Without Compromising Latency
On 8/25/23 12:16, Neeraj giri wrote:
> Our team is currently working with Solr 8.11 in cloud mode to power our
> search system, built using Java Spring at the application layer. We're
> facing a challenge in maintaining up-to-date pricing information for our
> ecommerce platform, which experiences frequent data changes throughout
> the day. While attempting to achieve real-time data updates, we've
> encountered issues related to Solr's latency and overall system
> performance.

What are the issues? Be as detailed as you can be, and include complete error messages.

Which exact Solr 8.11 version are you running? There are three released versions that start with 8.11.

> As of now, we've implemented a process that halts data writes during the
> day. Instead, we retrieve updated pricing data from a separate
> microservice that maintains a cached and current version of the
> information. However, we believe this approach isn't ideal due to its
> potential impact on system efficiency.

There's not much information here about what your microservice actually does or what problems you have had with it.

> We're seeking guidance on designing an architecture that can seamlessly
> handle real-time updates to our Solr index without compromising the
> search latency that our users expect. Writing directly to Solr nodes
> appears to increase read latency, which is a concern for us. Our goal is
> to strike a balance between keeping our pricing information up-to-date
> and maintaining an acceptable level of system responsiveness.

What exactly does "writing directly to Solr nodes" mean, and what is the alternative?

Your message includes very few details. The details are needed to provide a solution.

Thanks,
Shawn
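For the frequent-price-change part of the question, one widely used pattern is atomic updates: send only the changed field instead of reindexing whole documents, and let commitWithin batch the commits so searchers are not reopened on every write. A sketch of such an update payload follows; the id, field name, and values are invented examples, and note that atomic updates require the document's other fields to be stored or have docValues:

```json
[
  { "id": "SKU-12345",
    "price": { "set": 19.99 } }
]
```

POSTed to /solr/<collection>/update?commitWithin=5000, this rewrites just the one document and defers the commit by up to five seconds, which tends to smooth out the read-latency spikes that per-update hard commits cause.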
Re: SolrJ 9.3 timeout when using http2, works with http
On 9/3/23 04:28, Ing. Andrea Vettori wrote:
> what can cause this issue ?
> This times out
> solrServer = new Http2SolrClient.Builder(solrUrl)
>         .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>         .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>         .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>         .build();

What is the value of SOLR_TIMEOUT_MINUTES? This may not be super relevant, but I am curious.

The connection timeout should be something between 5 and 15 seconds. TCP connections establish pretty quickly, even if the destination is in a tiny island country with nothing but satellite Internet. If you haven't gotten a connection established within 10-15 seconds, it's probably never going to connect. If this connection is on a LAN, 5 seconds would be an eternity. I would suggest switching the timeouts to seconds and using more than one constant.

What is the precise error? A Java exception can be dozens of lines in length ... be sure to include all of it.

What is the Solr version on the server? Older Solr versions do not work well with http2. The workaround for those issues is to use http1.1. Here's a bug that MIGHT be applicable, fixed in Solr 9.2.0:

https://issues.apache.org/jira/browse/SOLR-16099

The underlying issue is one or more bugs in Jetty. One of the MANY fixes in 9.2.0 was an upgraded Jetty version, both server and client.

If the server side is also 9.3.0, then http2 should work well, and I do not know what might be wrong. The exception with stacktrace might provide a clue.

Thanks,
Shawn
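Shawn's suggestion of seconds-scale, per-purpose timeouts might look like this sketch. The constant names and values are illustrative choices, not SolrJ requirements:

```java
import java.time.Duration;

public class Timeouts {
    // TCP connect: should succeed within a few seconds or not at all.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(10);
    // Whole request, sized for the slowest query you expect to run.
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(60);
    // How long a pooled connection may sit quiet before being closed.
    static final Duration IDLE_TIMEOUT = Duration.ofSeconds(120);

    // With the SolrJ 9.x builder these would then be applied as, e.g.:
    //   new Http2SolrClient.Builder(solrUrl)
    //       .withConnectionTimeout(CONNECT_TIMEOUT.toSeconds(), TimeUnit.SECONDS)
    //       .withRequestTimeout(REQUEST_TIMEOUT.toSeconds(), TimeUnit.SECONDS)
    //       .withIdleTimeout(IDLE_TIMEOUT.toSeconds(), TimeUnit.SECONDS)
    //       .build();
}
```

Keeping the three values separate makes it obvious when a connect failure (network problem) is being masked by a request-scale timeout (slow query).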
Re: dataimport problem
On 9/2/23 13:38, Scott Derrick wrote:
> How do I find out what document is failing on import?

Presumably you have primary keys on both sides. Fetch them and compare the lists.

Dima
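Once both key lists are in hand (for example the source keys from a SQL query and the indexed ids from a q=*:*&fl=id export), the comparison itself is trivial. A stdlib-only sketch, with the fetching deliberately left out:

```java
import java.util.Set;
import java.util.TreeSet;

public class KeyDiff {
    /** Returns the keys present in the source system but absent from the index. */
    public static Set<String> missingFromIndex(Set<String> sourceKeys, Set<String> indexedIds) {
        Set<String> missing = new TreeSet<>(sourceKeys); // sorted copy for readable output
        missing.removeAll(indexedIds);
        return missing;
    }
}
```

With one failed document, the single leftover key this returns identifies the document that did not make it into the index.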
Re: SolrJ 9.3 timeout when using http2, works with http
> On 3 Sep 2023, at 17:30, Shawn Heisey wrote:
>
> On 9/3/23 04:28, Ing. Andrea Vettori wrote:
>> what can cause this issue ?
>> This times out
>> solrServer = new Http2SolrClient.Builder(solrUrl)
>>         .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>         .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>         .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>         .build();
>
> What is the value of SOLR_TIMEOUT_MINUTES? This may not be super relevant,
> but I am curious.

For this test code I tried a few values of minutes to see if it would work after some time (it never worked). In production code we usually use different timeouts, all in the few-seconds range.

> What is the Solr version on the server? Older Solr versions do not work well
> with http2. The workaround for those issues is to use http1.1.

It’s version 9.3.0, upgraded from 8.11.2.

> If the server side is also 9.3.0, then http2 should work well, and I do not
> know what might be wrong. The exception with stacktrace might provide a clue.

Could it be related to the fact that the URL is http and not https? Is the http2 client able to use h2c and switch to http2? Using curl I can connect to the server without any issue (see the log below).
Here’s the exception:

Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://:8983/solr/up/admin/ping?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:522)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:234)
        at org.apache.solr.client.solrj.SolrClient.ping(SolrClient.java:911)
        at ecf3test.QuickTest.doWork(QuickTest.java:345)
        at ecf3test.QuickTest.main(QuickTest.java:20)
Caused by: java.util.concurrent.TimeoutException
        at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:214)
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:512)
        ... 4 more

Here’s the curl trace:

curl -v --http2 http://:8983/solr/up/admin/ping
*   Trying :8983...
* Connected to () port 8983 (#0)
> GET /solr/up/admin/ping HTTP/1.1
> Host: :8983
> User-Agent: curl/8.1.2
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQAoAIA
>
< HTTP/1.1 101 Switching Protocols
< Upgrade: h2c
< Connection: Upgrade
* Received 101, Switching to HTTP/2
< HTTP/2 200
< content-security-policy: default-src 'none'; base-uri 'none'; connect-src 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 'self' data:; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self'; worker-src 'self';
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< x-xss-protection: 1; mode=block
< content-type: application/json;charset=utf-8
< vary: Accept-Encoding
< content-length: 255
<
{
  "responseHeader":{
    "zkConnected":null,
    "status":0,
    "QTime":0,
    "params":{
      "q":"1",
      "df":"key",
      "distrib":"false",
      "rows":"10",
      "echoParams":"all",
      "rid":"-138948"
    }
  },
  "status":"OK"
}
* Connection #0 to host left intact

Thanks
Performance and number of fields per document
Hello,

We’ve been using Solr for our e-commerce platform for many years, and it has always worked very well with over one million documents and a couple hundred fields per document; we also do complex faceted searches and it works great.

Now we’re trying to use Solr for another project that builds on the same data (so around one million documents) but adds many numeric fields that we want to retrieve and calculate stats on (sums, averages, ...). What we found is that it is manageable with 1000 added fields per document, but it becomes unusable with 5000 added fields per document. The fields are a mix of tfloat and tint (20 dynamic fields that become 5000 when considering wildcard expansion), stored but not indexed. Core size on disk is around 15 GB. We dedicated 6 GB of heap to Solr; the server is a dual-processor machine with several cores (I think 40 total) that are shared with another application, but CPU usage is low.

I’d like to know if there’s some configuration or best practice we should follow to improve performance in our case. Or is it simply not advisable to use this many fields?

Note: for a test search that retrieves only 10 documents, QTime is very low (2 msec) but the full request time to get the javabin or JSON data is very slow (several seconds).

Thank you
— Ing. Andrea Vettori
Sistemi informativi
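Since QTime is tiny but the response takes seconds, the cost here looks like assembling and transferring thousands of stored values per document rather than the search itself. If the sums and averages are what is ultimately needed, computing them server-side with Solr's JSON Facet API avoids shipping the raw fields entirely. A sketch of such a request (the field names are invented; note the aggregation functions need docValues, which stored-only fields lack, so this would mean a schema change and reindex):

```json
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "price_sum": "sum(price_00_f)",
    "price_avg": "avg(price_00_f)"
  }
}
```

Sent as the body of a POST to /solr/<core>/query. Short of that, restricting fl= to only the handful of fields a given request actually uses should cut the several-second transfer time.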
Re: SolrJ 9.3 timeout when using http2, works with http
On 9/3/23 13:36, Ing. Andrea Vettori wrote:
> For this test code I tried a few values of minutes to see if it would
> work after some time (it never worked). Usually on production code we
> use different timeouts all in the few seconds ranges.
>
>> What is the Solr version on the server? Older Solr versions do not
>> work well with http2. The workaround for those issues is to use
>> http1.1.
>
> It’s version 9.3.0 upgraded from 8.11.2

Are you getting this problem on every connection attempt with SolrJ, or does it sometimes work?

I wrote a little test program that creates a client (Http2SolrClient using http2 with timeouts specified) and uses it to do an all docs query:

https://github.com/elyograg/test_vettori

The README should cover everything you need, unless you're running on Windows. You can still run this on Windows, but the steps would be different.

I would be curious whether the program works against your Solr install. If it does, maybe there's some difference between what I did in that program and what you're doing.

Thanks,
Shawn