Re: dataimport problem

2023-09-03 Thread Mikhail Khludnev
Hi Scott

A failed document should be logged at WARN level by org.apache.solr.handler.dataimport.SolrWriter.
Check that this log category is enabled, then look for it in the logs.
See
https://github.com/SearchScale/dataimporthandler/blob/branch_9x/src/main/java/org/apache/solr/handler/dataimport/SolrWriter.java#L82C7-L82C10
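If that category is being filtered out, you can raise it at runtime through Solr's logging admin endpoint, without a restart. A sketch, assuming a default localhost install; the log file path is an assumption, adjust for your setup:

```shell
# Set the SolrWriter category to WARN at runtime
# (assumes Solr listens on localhost:8983)
curl 'http://localhost:8983/solr/admin/info/logging?set=org.apache.solr.handler.dataimport.SolrWriter:WARN'

# Then re-run the import and search the main log for the failure
# (log location varies by install; /var/solr/logs is just a common default)
grep -i "SolrWriter" /var/solr/logs/solr.log
```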

On Sun, Sep 3, 2023 at 12:44 AM Scott Derrick  wrote:

> This import has been running every day for years.  Recently I noticed
> that 1 file was not being imported.
>
> When I run the command I see the following
>
> {
>"responseHeader":{
>  "status":0,
>  "QTime":1},
>"initArgs":[
>  "defaults",[
>"config","tei-config.xml",
>"df","_text_"]],
>"command":"full-import",
>"status":"idle",
>"importResponse":"",
>"statusMessages":{
>  "Total Requests made to DataSource":"0",
>  "Total Rows Fetched":"68432",
>  "Total Documents Processed":"4887",
>  "Total Documents Skipped":"0",
>  "Full Dump Started":"2023-09-02 16:47:52",
>  "":"Indexing completed. Added/Updated: 4887 documents. Deleted 0
> documents.",
>  "Committed":"2023-09-02 16:51:41",
>  "Total Documents Failed":"1",
>  "Time taken":"0:3:49.353"}}
>
> The line "Total Documents Failed":"1" is the problem.
>
> I'm having a separate issue on a specific search return and I think it
> is related to this one document.  I've looked in the solr logs and don't
> see any reference to a failed document on import?
>
> How do I find out what document is failing on import?  I tried passing
> &debug=true
>
> curl
> '
> http://localhost:8983/solr/mbepp/update/tei?command=full-import&clean=true&debug=true
> '
>
> but then it only processes 10 documents and stops?
>
> very frustrating.
>
> Scott
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


SolrJ 9.3 timeout when using http2, works with http

2023-09-03 Thread Ing. Andrea Vettori
Hello,
What can cause this issue?

This times out

solrServer = new Http2SolrClient.Builder(solrUrl)
    .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .build();

This works well

solrServer = new Http2SolrClient.Builder(solrUrl)
    .useHttp1_1(true)
    .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
    .build();

I just upgraded from 8.11 to 9.3, but I had always used the older HTTP client before.
Thanks

— 
Andrea Vettori





RE: Strategies for Real-Time Data Updates in Solr Without Compromising Latency

2023-09-03 Thread Ing. Andrea Vettori
Hello, can you please explain the problem with a few numbers? We’re using Solr 
as the backend for our e-commerce platform and update it several times a day 
(around 4 times per hour) without any issue. 
It may depend on system size, concurrent searches, etc., so if you have a few 
numbers it would be easier to give you some suggestions.

On 2023/08/25 18:16:02 Neeraj giri wrote:
> Greetings fellow forum members,
> 
> Our team is currently working with Solr 8.11 in cloud mode to power our
> search system, built using Java Spring at the application layer. We're
> facing a challenge in maintaining up-to-date pricing information for our
> ecommerce platform, which experiences frequent data changes throughout the
> day. While attempting to achieve real-time data updates, we've encountered
> issues related to Solr's latency and overall system performance.
> 
> As of now, we've implemented a process that halts data writes during the
> day. Instead, we retrieve updated pricing data from a separate microservice
> that maintains a cached and current version of the information. However, we
> believe this approach isn't ideal due to its potential impact on system
> efficiency.
> 
> We're seeking guidance on designing an architecture that can seamlessly
> handle real-time updates to our Solr index without compromising the search
> latency that our users expect. Writing directly to Solr nodes appears to
> increase read latency, which is a concern for us. Our goal is to strike a
> balance between keeping our pricing information up-to-date and maintaining
> an acceptable level of system responsiveness.
> 
> We would greatly appreciate any insights, strategies, or best practices
> from the community that can help us tackle this challenge. How can we
> optimize our approach to real-time data updates while ensuring Solr's
> latency remains within acceptable limits? Any advice or suggestions on
> architecture, techniques, or tools would be invaluable.
> 
> Thank you in advance for your expertise and assistance.
> 
> Regards,
> 
> Neeraj giri
>  

— 
Ing. Andrea Vettori
Sistemi Informativi
B2BIres s.r.l.



Re: Strategies for Real-Time Data Updates in Solr Without Compromising Latency

2023-09-03 Thread Shawn Heisey

On 8/25/23 12:16, Neeraj giri wrote:

Our team is currently working with Solr 8.11 in cloud mode to power our
search system, built using Java Spring at the application layer. We're
facing a challenge in maintaining up-to-date pricing information for our
ecommerce platform, which experiences frequent data changes throughout the
day. While attempting to achieve real-time data updates, we've encountered
issues related to Solr's latency and overall system performance.


What are the issues?  Be as detailed as you can be, include complete 
error messages.  Which exact Solr 8.11 version are you running?  There 
are three released versions that start with 8.11.



As of now, we've implemented a process that halts data writes during the
day. Instead, we retrieve updated pricing data from a separate microservice
that maintains a cached and current version of the information. However, we
believe this approach isn't ideal due to its potential impact on system
efficiency.


There's not much information here about what your microservice actually 
does or what problems you have had with it.



We're seeking guidance on designing an architecture that can seamlessly
handle real-time updates to our Solr index without compromising the search
latency that our users expect. Writing directly to Solr nodes appears to
increase read latency, which is a concern for us. Our goal is to strike a
balance between keeping our pricing information up-to-date and maintaining
an acceptable level of system responsiveness.


What exactly does "writing directly to Solr nodes" mean, and what is the 
alternative?


Your message includes very few details.  The details are needed to 
provide a solution.


Thanks,
Shawn



Re: SolrJ 9.3 timeout when using http2, works with http

2023-09-03 Thread Shawn Heisey

On 9/3/23 04:28, Ing. Andrea Vettori wrote:

What can cause this issue?

This times out

 solrServer = new Http2SolrClient.Builder(solrUrl)
     .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
     .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
     .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
     .build();


What is the value of SOLR_TIMEOUT_MINUTES?  This may not be super 
relevant, but I am curious.


The connection timeout should be something between 5 and 15 seconds. 
TCP connections establish pretty quickly, even if the destination is in 
a tiny island country with nothing but satellite Internet.  If you 
haven't gotten a connection established within 10-15 seconds, it's 
probably never going to connect.  If this connection is on a LAN, 5 
seconds would be an eternity.


I would suggest switching the timeouts to seconds and using more than 
one constant.


What is the precise error?  A Java exception can be dozens of lines in 
length ... be sure to include all of it.


What is the Solr version on the server?  Older Solr versions do not work 
well with http2.  The workaround for those issues is to use http1.1.


Here's a bug that MIGHT be applicable, fixed in Solr 9.2.0:

https://issues.apache.org/jira/browse/SOLR-16099

The underlying issue is one or more bugs in Jetty.  One of the MANY 
fixes in 9.2.0 was an upgraded Jetty version, both server and client.


If the server side is also 9.3.0, then http2 should work well, and I do 
not know what might be wrong.  The exception with stacktrace might 
provide a clue.


Thanks,
Shawn



Re: dataimport problem

2023-09-03 Thread Dmitri Maziuk

On 9/2/23 13:38, Scott Derrick wrote:

How do I find out what document is failing on import?



Presumably you have primary keys on both sides. Fetch them and compare 
the lists.


Dima
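That comparison comes down to a set difference over the two key lists. A minimal sketch in plain Java; the ID values and method names here are made up for illustration, and in practice dbIds would come from a SQL query and solrIds from a `fl=id` query against Solr:

```java
import java.util.*;

public class MissingIds {
    // Return the IDs present in the source database but absent from Solr.
    static Set<String> missingFromSolr(Collection<String> dbIds,
                                       Collection<String> solrIds) {
        Set<String> missing = new TreeSet<>(dbIds); // sorted for readable output
        missing.removeAll(solrIds);                 // set difference: db - solr
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical key lists fetched from the database and from Solr.
        List<String> dbIds = List.of("doc1", "doc2", "doc3", "doc4");
        List<String> solrIds = List.of("doc1", "doc2", "doc4");
        // prints "Missing from Solr: [doc3]"
        System.out.println("Missing from Solr: " + missingFromSolr(dbIds, solrIds));
    }
}
```

With ~68k rows both lists fit comfortably in memory, so this brute-force diff is fine.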




Re: SolrJ 9.3 timeout when using http2, works with http

2023-09-03 Thread Ing. Andrea Vettori
> On 3 Sep 2023, at 17:30, Shawn Heisey  wrote:
> 
> On 9/3/23 04:28, Ing. Andrea Vettori wrote:
>> What can cause this issue?
>> This times out
>> solrServer = new Http2SolrClient.Builder(solrUrl)
>>     .withRequestTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>     .withConnectionTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>     .withIdleTimeout(SOLR_TIMEOUT_MINUTES, TimeUnit.MINUTES)
>>     .build();
> 
> What is the value of SOLR_TIMEOUT_MINUTES?  This may not be super relevant, 
> but I am curious.

For this test code I tried a few values of minutes to see if it would work 
after some time (it never worked). Usually in production code we use different 
timeouts, all in the few-seconds range.

> 
> What is the Solr version on the server?  Older Solr versions do not work well 
> with http2.  The workaround for those issues is to use http1.1.

It’s version 9.3.0 upgraded from 8.11.2

> 
> If the server side is also 9.3.0, then http2 should work well, and I do not 
> know what might be wrong.  The exception with stacktrace might provide a clue.

Could it be related to the fact that the URL is http and not https? Is the 
HTTP/2 client able to use h2c to upgrade to HTTP/2? Using curl I can connect 
to the server without any issue (see the log below).

Here’s the exception

Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://:8983/solr/up/admin/ping?wt=javabin&version=2
	at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:522)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:234)
	at org.apache.solr.client.solrj.SolrClient.ping(SolrClient.java:911)
	at ecf3test.QuickTest.doWork(QuickTest.java:345)
	at ecf3test.QuickTest.main(QuickTest.java:20)
Caused by: java.util.concurrent.TimeoutException
	at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:214)
	at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:512)
	... 4 more

Here’s the curl trace

curl -v --http2 http://:8983/solr/up/admin/ping

*   Trying :8983...
* Connected to  () port 8983 (#0)
> GET /solr/up/admin/ping HTTP/1.1
> Host: :8983
> User-Agent: curl/8.1.2
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQAoAIA
> 
< HTTP/1.1 101 Switching Protocols
< Upgrade: h2c
< Connection: Upgrade
* Received 101, Switching to HTTP/2
< HTTP/2 200 
< content-security-policy: default-src 'none'; base-uri 'none'; connect-src 
'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 
'self' data:; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 
'self'; worker-src 'self';
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< x-xss-protection: 1; mode=block
< content-type: application/json;charset=utf-8
< vary: Accept-Encoding
< content-length: 255
< 
{
  "responseHeader":{
"zkConnected":null,
"status":0,
"QTime":0,
"params":{
  "q":"1",
  "df":"key",
  "distrib":"false",
  "rows":"10",
  "echoParams":"all",
  "rid":"-138948"
}
  },
  "status":"OK"}
* Connection #0 to host  left intact

Thanks

Performance and number of fields per document

2023-09-03 Thread Ing. Andrea Vettori
Hello,
We’ve been using Solr for our e-commerce platform for many years, and it has 
always worked very well for over one million documents with a couple hundred 
fields per document; we also do complex faceted searches and it works great. 

Now we’re trying to use Solr for another project that builds on the same data 
(so around one million documents) but adds many numeric fields that we want to 
retrieve and calculate stats on (sums, averages, …).  
What we found is that it is manageable with 1000 added fields per document, but 
it becomes unusable with 5000 added fields per document. 

Fields are a mix of tfloat and tint (20 dynamic fields that become 5000 when 
considering wildcard expansion), stored but not indexed. 

Core size on disk is around 15GB. We dedicated 6GB of heap to Solr; the server 
is a dual-processor machine with several cores (I think 40 total) that are 
shared with another application, but CPU usage is low. 

I’d like to know if there’s some configuration or best practice we should 
follow to improve performance in our case. 
Maybe it’s simply not advisable to use this many fields?

Note: for a test search that retrieves only 10 documents, qtime is very low (2 
msec) but the full request time to get javabin or json data is very slow 
(several seconds). 
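Two things that are often worth trying in this situation (a sketch, not a definitive diagnosis): since qtime excludes the time spent loading and serializing stored fields, restricting `fl` to the fields you actually need, or computing the sums/averages server-side with the StatsComponent instead of fetching raw values, can shrink the response dramatically. The collection name `mycol` and field `price_f` below are placeholders:

```shell
# Return only the id field instead of all stored fields
curl 'http://localhost:8983/solr/mycol/select?q=*:*&rows=10&fl=id'

# Compute count/sum/mean server-side with the StatsComponent,
# so no per-document field values cross the wire
curl 'http://localhost:8983/solr/mycol/select?q=*:*&rows=0&stats=true&stats.field=price_f'
```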

Thank you 

—
Ing. Andrea Vettori
Sistemi informativi


Re: SolrJ 9.3 timeout when using http2, works with http

2023-09-03 Thread Shawn Heisey

On 9/3/23 13:36, Ing. Andrea Vettori wrote:

For this test code I tried a few values of minutes to see if it would work 
after some time (it never worked). Usually on production code we use different 
timeouts all in the few seconds ranges.



What is the Solr version on the server?  Older Solr versions do not work well 
with http2.  The workaround for those issues is to use http1.1.


It’s version 9.3.0 upgraded from 8.11.2


Are you getting this problem on every connection attempt with SolrJ, or 
does it sometimes work?


I wrote a little test program that creates a client (Http2SolrClient 
using http2 with timeouts specified) and uses it to do an all-docs query:


https://github.com/elyograg/test_vettori

The README should cover everything you need, unless you're running on 
Windows.  You can still run this on Windows, but the steps would be 
different.


I would be curious whether the program works against your Solr install. 
If it does, maybe there's some difference between what I did in that 
program and what you're doing.


Thanks,
Shawn