RE: [EXTERNAL] Re: Is this list alive? I need help

Beale, Jim (US-KOP) Wed, 28 Feb 2024 13:09:24 -0800

Here is the performance for this query on these nodes. You saw the code in a 
previous email.

http://samisolrcld.aws01.hibu.int:8983/solr/calls/select?indent=true&q.op=OR&fl=business_id,call_id,call_date,call_callerno,caller_name,dialog_merged&q=business_id%3A7016655681%20AND%20call_day:[20230101%20TO%2020240101}&group=true&group.field=call_callerno&sort=call_date%20desc&rows=10000&group.main=true

The two times were taken right after a restart and then the next day, or 
sometimes just a few hours later. The only difference is how long Solr has been 
running. I can’t understand what makes it run so slowly after a short while.

Business_id   Time 1   Time 2
7016274253    11.572   23.397
7010707194    21.941   21.414
7000001491    9.516    39.051
7029931968    10.755   59.196
7014676602    14.508   14.083
7004551760    12.873   36.856
7016274253    1.792    17.415
7010707194    5.671    25.442
7000001491    6.84     36.244
7029931968    6.291    38.483
7014676602    7.643    12.584
7004551760    5.669    21.977
7029931968    8.293    36.688
7008606979    16.976   30.569
7002264530    13.862   35.113
7017281920    10.1     31.914
7000001491    8.665    35.141
7058630709    11.236   38.104
7011363889    10.977   19.72
7016319075    15.763   26.023
7053262466    10.917   48.3
7000313815    9.786    24.617
7015187150    8.312    29.485
7016381845    11.51    34.545
7016379523    10.543   29.27
7026102159    6.047    30.381
7010707194    8.298    27.069
7016508018    7.98     34.48
7016280579    5.443    26.617
7016302809    3.491    12.578
7016259866    7.723    33.462
7016390730    11.358   32.997
7013498165    8.214    26.004
7016392929    6.612    19.711
7007737612    2.198    4.19
7012687678    8.627    35.342
7016606704    5.951    21.732
7007870203    2.524    16.534
7016268227    6.296    25.651
7016405011    3.288    18.541
7016424246    9.756    31.243
7000336592    5.465    31.486
7004696397    4.713    29.528
7016279283    2.473    24.243
7016623672    6.958    35.96
7016582537    5.112    33.475
7015713947    5.162    25.972
7003530665    8.223    26.549
7012825693    7.4      16.849
7010707194    6.781    23.835
7079272278    7.793    24.686
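
To put a number on the slowdown, the ratio of Time 2 to Time 1 can be computed per row. The following is only a rough sketch: it uses four rows copied from the timings above, and the helper names (slowdownRatios, median) are made up for illustration.

```javascript
// A few [businessId, time1, time2] rows copied from the timings above.
const rows = [
  ["7016274253", 1.792, 17.415],
  ["7010707194", 5.671, 25.442],
  ["7014676602", 14.508, 14.083],
  ["7007737612", 2.198, 4.19],
];

// Ratio of the later timing to the post-restart timing for each row.
function slowdownRatios(samples) {
  return samples.map(([id, t1, t2]) => ({ id, ratio: t2 / t1 }));
}

// Median of a list of numbers (mean of the two middle values when even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const ratios = slowdownRatios(rows);
for (const { id, ratio } of ratios) {
  console.log(`${id}: ${ratio.toFixed(2)}x`);
}
console.log(`median slowdown: ${median(ratios.map((r) => r.ratio)).toFixed(2)}x`);
```

Running the same calculation over every row would show whether the slowdown is roughly uniform or concentrated in a few business IDs.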

Jim Beale
Lead Software Engineer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

From: Beale, Jim (US-KOP) <jim.be...@hibu.com.INVALID>
Sent: Wednesday, February 28, 2024 3:29 PM
To: users@solr.apache.org
Subject: RE: [EXTERNAL] Re: Is this list alive? I need help

Caution!
Attachments and links (urls) can contain deceptive and/or malicious content.

I didn't see these responses because they were buried in my clutter folder.

We have 12,541,505 docs for calls, 9,144,862 form fills, 53,838 SMS and 12,752 
social leads. These all live in a single Solr 9.1 cluster of three nodes, with 
PROD and UAT all on a single server. As follows:

[cid:image003.png@01DA6A60.1D5538E0]

The three nodes are r5.xlarge and we’re not sure if those are large enough. The 
documents are not huge, from 1K to 25K each.
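
For a rough sense of scale, the document counts and size range above can be turned into a bound on the raw data volume. This is a back-of-the-envelope sketch only: it assumes "1K" and "25K" mean 1 KiB and 25 KiB, and raw document bytes are not the same thing as on-disk index size, which depends on the schema and is not given in this thread.

```javascript
// Document counts quoted in the message above.
const docCounts = {
  calls: 12541505,
  formFills: 9144862,
  sms: 53838,
  socialLeads: 12752,
};

// Assumption: "1K" and "25K" mean 1 KiB and 25 KiB of raw document data.
const MIN_DOC_BYTES = 1 * 1024;
const MAX_DOC_BYTES = 25 * 1024;
const GIB = 1024 ** 3;

// Sum the per-collection counts, then scale by the smallest and largest
// document size to get a lower and upper bound on raw data volume.
const totalDocs = Object.values(docCounts).reduce((sum, n) => sum + n, 0);
const minGiB = (totalDocs * MIN_DOC_BYTES) / GIB;
const maxGiB = (totalDocs * MAX_DOC_BYTES) / GIB;

console.log(`total documents: ${totalDocs}`);
console.log(`raw data: ${minGiB.toFixed(1)} to ${maxGiB.toFixed(1)} GiB`);
```

Comparing that range with the 32 GB of memory per r5.xlarge mentioned elsewhere in this thread gives only a first hint about whether the working set can stay cached; the real index size on disk is the number to check.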

samisolrcld.aws01.hibu.int is a load balancer.

The request is:

const axios = require('axios');

// Fetch all grouped call documents for one business, paging `limit` rows at a time.
async function getCalls(businessId, limit) {
    const config = {
        method: 'GET',
        url: 'http://samisolrcld.aws01.hibu.int:8983/solr/calls/select',
        params: {
            q: `business_id:${businessId} AND call_day:[20230101 TO 20240101}`,
            fl: "business_id, call_id, call_day, call_date, dialog_merged, " +
                "call_callerno, call_duration, call_status, caller_name, caller_address, " +
                "caller_state, caller_city, caller_zip",
            rows: limit,
            start: 0,
            group: true,
            "group.main": true,
            "group.field": "call_callerno",
            sort: "call_day desc"
        }
    };

    let rval = [];
    while (true) {
        try {
            const rsp = await axios(config);
            // Stop on a malformed response instead of looping on the same page.
            if (!rsp.data || !rsp.data.response) break;
            const docs = rsp.data.response.docs;
            // An empty page means everything has been read.
            if (docs.length === 0) break;
            config.params.start += limit;
            rval = rval.concat(docs);
        } catch (err) {
            console.log("Error: " + err.message);
            // Give up on failure; without this the loop retries forever.
            break;
        }
    }
    return rval;
}

You wrote:

Note that EFS is an encrypted file system, and stunnel is encrypted transport, 
so for each disk read you are likely causing:

   - read raw encrypted data from disk to memory (at AWS)

   - decrypt the disk data in memory (at AWS)

   - encrypt the memory data for stunnel transport (at AWS)

   - send the data over the wire

   - decrypt the data for use by solr. (Hardware you specify)

That's guaranteed to be slow, and worse yet, you have no control at all over 
the size or loading of the hardware performing anything but the last step. You 
are completely at the mercy of AWS's cost/speed tradeoffs which are unlikely to 
be targeting the level of performance usually desired for search disk IO.

This is interesting. I can copy the data to local and try it from there.

-----Original Message-----
From: Gus Heck <gus.h...@gmail.com>
Sent: Sunday, February 25, 2024 9:15 AM
To: users@solr.apache.org
Subject: [EXTERNAL] Re: Is this list alive? I need help

Hi Jim,

Welcome to the Solr user list. Not sure why you are asking about list 
liveliness? I don't see prior messages from you:

https://lists.apache.org/list?users@solr.apache.org:lte=1M:jim

Probably the most important thing you haven't told us is the current size of 
your indexes. You said 20k/day input, but at the start do you have 0 days, 1 
day, 10 days, 100 days, 1000 days, or 10000 days (27y) on disk already?

If you are starting from zero, then there is likely a 20x or more growth in the 
size of the index between the first and second measurement. Indexes do get 
slower with size, though you would need fantastically large documents or some 
sort of disk problem to explain it that way.

However, maybe you do have huge documents or disk issues since your query time 
at time1 is already abysmal? Either you are creating a fantastically expensive 
query, or your system is badly overloaded. New systems, properly sized with 
moderate sized documents ought to be serving simple queries in tens of 
milliseconds.

As others have said it is *critical you show us the entire query request*.

If you are doing something like attempting to return the entire index with 
rows=999999, that would almost certainly explain your issues...

How large are your average documents (in terms of bytes)?

Also what version of Solr?

r5.xlarge only has 4 cpu and 32 GB of memory. That's not very large (despite 
the name). However since it's unclear what your total index size looks like, it 
might be OK.

What are your IOPS constraints with EFS? Are you running out of a quota there? 
(bursting mode?)

Note that EFS is an encrypted file system, and stunnel is encrypted transport, 
so for each disk read you are likely causing:

   - read raw encrypted data from disk to memory (at AWS)

   - decrypt the disk data in memory (at AWS)

   - encrypt the memory data for stunnel transport (at AWS)

   - send the data over the wire

   - decrypt the data for use by solr. (Hardware you specify)

That's guaranteed to be slow, and worse yet, you have no control at all over 
the size or loading of the hardware performing anything but the last step. You 
are completely at the mercy of AWS's cost/speed tradeoffs which are unlikely to 
be targeting the level of performance usually desired for search disk IO.

I'll also echo others and say that it's a bad idea to allow solr instances to 
compete for disk IO in any way. I've seen people succeed with setups that use 
invisibly provisioned disks, but one typically has to run more hardware to 
compensate. Having a shared disk creates competition, and it also creates a 
single point of failure partially invalidating the notion of running 3 servers 
in cloud mode for high availability. If you can't have more than one disk, then 
you might as well run a single node, especially at small data sizes like 
20k/day. A single node on well-chosen hardware can usually serve tens of 
millions of normal-sized documents, which would be several years of data for 
you (assuming low query rates; handling high rates of course starts to require 
more hardware).

Finally, you will want to get away from using single queries as a measurement 
of latency. If you care about response time I HIGHLY suggest you watch this 
YouTube video on how NOT to measure latency:

https://www.youtube.com/watch?v=lJ8ydIuPFeU
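
One way to act on that advice is to time the same request many times and look at percentiles instead of a single number. The sketch below is illustrative only: percentile, collectLatencies and summarize are hypothetical helper names, the nearest-rank method is just one common percentile definition, and the operation being timed is whatever async function issues the query.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Run an async operation `count` times, one after another, and record how
// long each call took in milliseconds.
async function collectLatencies(operation, count) {
  const durations = [];
  for (let i = 0; i < count; i++) {
    const started = performance.now();
    await operation();
    durations.push(performance.now() - started);
  }
  return durations;
}

// Reduce a list of durations to the figures that are usually worth reporting.
function summarize(durations) {
  return {
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
    max: Math.max(...durations),
  };
}
```

A call such as summarize(await collectLatencies(() => getCalls(id, 10000), 50)) would then give a distribution for one business ID. Because the loop waits for each response before sending the next request, a stalled server also slows the sampling, so the tail figures are best read as a lower bound.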

On Fri, Feb 23, 2024 at 6:44 PM Jan Høydahl <jan....@cominvent.com> wrote:

> I think EFS is a terribly slow file system to use for Solr, who
> recommended it? :) Better use one EBS per node.
> Not sure if the gradually slower performance is due to EFS though. We
> need to know more about your setup to get a clue. What role does
> stunnel play here? How are you indexing the content etc.
>
> Jan
>
> > On 23 Feb 2024, at 19:58, Walter Underwood <wun...@wunderwood.org> wrote:
> >
> > First, a shared disk is not a good idea. Each node should have its own
> > local disk. Solr makes heavy use of the disk.
> >
> > If the indexes are shared, I’m surprised it works at all. Solr is not
> > designed to share indexes.
> >
> > Please share the full query string.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Feb 23, 2024, at 10:01 AM, Beale, Jim (US-KOP) <jim.be...@hibu.com.INVALID> wrote:
> >>
> >> I have a Solrcloud installation of three servers on three r5.xlarge EC2
> >> with a shared disk drive using EFS and stunnel.
> >>
> >> I have documents coming in about 20000 per day and I am trying to
> >> perform indexing along with some regular queries and some special
> >> queries for some new functionality.
> >>
> >> When I just restart Solr, these queries run very fast but over time
> >> become slower and slower.
> >>
> >> This is typical for the numbers. At time1, the request only took 2.16
> >> sec but over night the response took 18.137 sec. That is just typical.
> >>
> >> businessId, all count, reduced count, time1, time2
> >> 7016274253,8433,4769,2.162,18.137
> >>
> >> The same query is so far different. Overnight the Solr servers slow
> >> down and give terrible response. I don’t even know if this list is alive.


--

http://www.needhamsoftware.com (work)

https://a.co/d/b2sZLD9 (my fantasy fiction book)
