Re: Add a new Shard to the collection
Thanks for the input Ilan.

On Thu, Aug 3, 2023 at 5:25 PM Ilan Ginzburg wrote:
> I don't think adding shards (even from 1 to 2) is the solution.
> You need enough replicas so all your nodes share the load, but with such small shards you likely don't need more than 1.
> If your nodes are saturated by traffic, you need more nodes (and more replicas so that the added nodes have a replica as well).
>
> Ilan
>
> On Thu, Aug 3, 2023, 8:23 AM HariBabu kuruva wrote:
> > Hi Ilan,
> >
> > Thank you for your reply.
> >
> > Application requests are facing connection failures a couple of times. So our DEV team requested to add more shards, as they are expecting more read-heavy queries in the future.
> >
> > Initially they requested two shards and now they are asking for one more shard (3 shards). We have a total of 6 Solr nodes available.
> >
> > The disk sizes consumed by the currently created two shards are around 2.5 GB each.
> >
> > Please let me know if any other information is required.
> >
> > On Wed, Aug 2, 2023 at 11:29 PM Ilan Ginzburg wrote:
> > > Well, if the size of the two shards you now have is equivalent, you will not be able to get to 3 balanced (in size) shards.
> > >
> > > If one of the two seems to get more data (is larger), split that one. This might be the case if you use fancy routing for deciding which doc goes where.
> > >
> > > Otherwise, to get to 3 similarly sized shards you need to explicitly specify the ranges during the split.
> > > Either create one subshard with twice the range of the other so you can split the larger one into two and end up with 3 similarly sized shards, or split the initial shard into 3 subshards in one go (I've never tried splitting into more than 2 shards though, so I end up with a power-of-2 number of balanced shards, assuming uniform distribution of docs into the hash range).
> > >
> > > But I assume your real goal is not having a specific number of shards.
> > > What issues are you running into in your current setup that you're trying to address?
> > > You mentioned "better performance", but performance of what? Query? Indexing? Are you running out of memory? CPU? Are you adding nodes (servers) and/or replicas as you're increasing the number of shards?
> > >
> > > What has improved as you moved from one to two shards? Why decide that you then want to have 3 shards and not stay at 2 or move to 4?
> > >
> > > Ilan
> > >
> > > On Wed, Aug 2, 2023, 5:48 PM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > Hi All,
> > > >
> > > > I did sharding, split shard1 into shard-1_0 and shard-1_1.
> > > > I want to have one more shard (3 shards). In this case, which shard should I split? Please advise.
> > > >
> > > > On Tue, Aug 1, 2023 at 11:17 AM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > ++ FYI, I can see the old shard automatically removed.
> > > > >
> > > > > On Mon, Jul 31, 2023 at 11:39 AM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > I am a little bit worried about PROD. Can I go ahead and do the same steps in PROD? Do I need to take any backups or any other steps before doing this?
> > > > > >
> > > > > > On Sat, Jul 29, 2023 at 8:51 AM Mikhail Khludnev wrote:
> > > > > > > Hello Hari.
> > > > > > > If new shards are handling queries and updates well, it's OK to have the old shard inactive.
> > > > > > > You can request DELETESHARD to reclaim the disk space.
> > > > > > >
> > > > > > > On Mon, Jul 24, 2023 at 6:19 PM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I would like to add a new shard to the existing collection to have better performance. Currently we have one shard.
> > > > > > > >
> > > > > > > > Solr - 8.11.1
> > > > > > > > Nodes (servers) - 10 (Non prod - 4 nodes)
> > > > > > > > Zookeepers - 5
> > > > > > > >
> > > > > > > > I have tried the SPLITSHARD command in one of the non-prod environments:
> > > > > > > > https://solrserver.corp.company.com:8981/solr/admin/collections?action=SPLITSHARD&collection=abcStore&shard=shard1
> > > > > > > >
> > > > > > > > Now I can see a total of 3 shards:
> > > > > > > > Shard1
> > > > > > > > Shard1_0
> > > > > > > > Shard1_1
> > > > > > > >
> > > > > > > > But Shard1 is shown as inactive. Please let me know if we need to remove this?
> > > > > > > >
> > > > > > > > Please help me if th
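As a concrete illustration of Mikhail's suggestion: once the two subshards are active and the inactive parent is no longer needed, DELETESHARD is a Collections API call in the same style as the SPLITSHARD request above (host, port and collection name are copied from that non-prod example and would differ per environment):

    https://solrserver.corp.company.com:8981/solr/admin/collections?action=DELETESHARD&collection=abcStore&shard=shard1

DELETESHARD will only remove a shard that is not in an active state (such as the inactive parent left behind by SPLITSHARD), so it refuses to delete a shard that is still serving traffic.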
Re: Add a new Shard to the collection
Thank you uyilmaz for the detailed explanation.

On Thu, Aug 3, 2023 at 6:21 PM ufuk yılmaz wrote:
> My two cents; it took me some time to understand when to add shards or replicas when I first started using Solr.
>
> Speed of a single isolated query when the system is idle -VS- total throughput of the system when many queries are executing.
>
> Sharding divides data into smaller pieces and puts each piece (shard) on a separate computer. Now for a single incoming query, all of those nodes need to be searched, because Solr has no idea which piece (shard) may contain the searched documents. If your data was very big before sharding, this improves the response time of a single isolated query. If your data wasn't big to start with, this *may* even hurt performance because of more network trips. (Big = index size is larger than the available RAM amount on the machine.)
>
> Adding replicas creates a copy of your data and puts that copy on a separate computer. Now for a single incoming query, still, a single node is selected and queried. A single isolated query doesn't get faster, but overall throughput is increased (the system can serve 2x more queries per second before coughing).
>
> There are many "but if"s and different cases with this line of thought, but I hope it explains the main idea to someone new to Solr?
>
> A confusing thing for someone new is the relation between collection, replica and shard. A collection has a one-to-many relationship with shard, and a shard has a one-to-many relationship with replica.
>
> Hope this is useful
>
> Sent from Mail for Windows
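If the advice in this thread is followed — spreading query load with replicas rather than more shards — the relevant Collections API call is ADDREPLICA rather than SPLITSHARD. A sketch only; the host, port and target node name below are placeholders, not values from this thread:

    https://solrserver.corp.company.com:8981/solr/admin/collections?action=ADDREPLICA&collection=abcStore&shard=shard1_0&node=solr-node5:8983_solr

Repeating this for each shard until every node hosts a replica lets all 6 nodes share the query traffic. (And if unevenly sized subshards are ever really wanted, SPLITSHARD does accept an explicit ranges parameter, which is what Ilan refers to when he mentions specifying the ranges during the split.)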
Changing Solr collection's DirectoryFactory
Hi everyone,

Using: Solr 8.11.2 on RHEL 9.

We are currently using "solr.NRTCachingDirectoryFactory" for a collection. The collection has grown big in size, but I don't want to add more RAM to the machine (AWS); I can increase IOPS and throughput for the data volume.

I was thinking of switching to "solr.NIOFSDirectoryFactory", but wanted to know how it will impact the existing collection. Maybe it is just a way to read index files, but to be sure: will it affect my existing indexed data?

Any light on this will be helpful.

Thanks.
Jayesh Shende
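For context, this setting lives in solrconfig.xml, and the change being weighed here amounts to swapping the class attribute of the directoryFactory element — shown only as an illustration (see the replies later in this digest before making the change):

    <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>   <!-- current -->
    <directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>        <!-- contemplated -->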
Re: knn parser not working as expected
Also, here is the debug output for that workaround with fq I mentioned. This debug output is not big. "debug":{ "rawquerystring":"*:*", "querystring":"*:*", "parsedquery":"+(+MatchAllDocsQuery(*:*)) ()", "parsedquery_toString":"+(+*:*) ()", "json":{"params":{ "q":"*:*", "fq":"{!knn f=dense_vector topK=1}[0.06525743007659912,0.015727980062365532,0.003069591475650668,-0.016254400834441185,0.003478930564597249,-0.02475954219698906,0.020238326862454414,0.010255611501634121,0.05522076040506363,0.020635411143302917,0.05825875699520111,-0.05110647529363632,-0.04696913808584213,0.05991407483816147,-0.0003015052934642881,0.03625837340950966,-0.044656239449977875,-0.06582673639059067,-0.06842341274023056,-0.022927379235625267,0.048230838030576706,-0.12659960985183716,-0.019311215728521347,-0.04432906210422516,0.03600681200623512,0.010301047936081886,0.08415472507476807,0.04727723449468613,-0.0584205724298954,-0.045265913009643555,0.012285877950489521,0.0034233061596751213,-0.00982636958360672,-0.013216182589530945,-0.038882751017808914,-0.05872005969285965,-0.029350444674491882,0.04930287227034569,0.0022274062503129244,0.01728842593729496,-0.08762819767,-0.045831114053726196,0.072530098259449,0.03804686293005943,0.0021682181395590305,-0.05424166098237038,-0.004494055639952421,0.05843663960695267,0.058729417622089386,0.016252348199486732,0.0019551776349544525,-0.012190568260848522,-0.08235936611890793,-0.003848800901323557,0.028969185426831245,0.047798849642276764,-0.04074695333838463,-0.10175333172082901,0.06699151545763016,-0.06788542866706848,-0.01607389748096466,0.07294511049985886,0.007754810154438019,0.039606861770153046,0.07451225817203522,-0.02967959391212,0.014015864580869675,0.08055979013442993,0.0010412412229925394,0.13284511864185333,-0.013288799673318863,-0.05446619912981987,-0.03510258346796036,-0.12459734082221985,-0.017629574984312057,-0.04287091642618179,-0.019087448716163635,0.027409998700022697,-0.040427371859550476,-0.1713477075099945,-0.0035959691740572453,0.01750982739031315,-0.06452985852956772,0.10622204840183258,-0.06865541636943817,0.06022517383098602,0.03378240391612053,0.02320132404565811,0.02072194404900074,0.03390982002019882,0.0051648980006575584,0.05843415856361389,-0.07012602686882019,0.046549294143915176,0.005304296966642141,0.09183698892593384,0.060101959854364395,-0.031673040241003036,0.03126641735434532,0.10213921219110489,0.07624002546072006,-0.09995660930871964,0.03316718339920044,-0.040208760648965836,-0.016963355243206024,-0.01603076048195362,-0.00566966412588954,0.0570228286087513,0.006566803902387619,0.028397461399435997,-0.03737075999379158,-0.03357473015785217,-0.05060608312487602,0.0882791057229042,0.14182551205158234,0.01651209406554699,0.047577112913131714,-0.028357332572340965,-0.12397051602602005,0.03264006972312927,0.030581200495362282,0.025287700816988945,-0.08509892970323563,0.032361947267,-0.06732083112001419,0.0193667970597744,0.07096285372972488,-5.732041797079612e-33,0.033934514969587326,0.029480531811714172,-0.024119360372424126,0.03248802572488785,0.060654137283563614,-0.04089922457933426,-0.06845896691083908,0.015865417197346687,-0.03816983848810196,0.12768638134002686,-0.047979939728975296,0.01888129487633705,0.01966758444905281,-0.021792754530906677,-0.00209379056468606,-0.060791824012994766,0.07595516741275787,-0.05137578397989273,-0.020345840603113174,0.02730456180870533,-0.08421282470226288,0.0052170781418681145,-0.0396740548312664,0.013655638322234154,0.043763574212789536,0.0368662029504776,-0.021710995584726334,0.03603581339120865,0.04
991370812058449,-0.007524373475462198,0.033250145614147186,0.0669487863779068,-0.012807670049369335,-0.08904062211513519,-0.04803512617945671,-0.0461772084236145,0.018098553642630577,0.01096352282911539,0.0617918036878109,0.014066621661186218,-0.03305654972791672,-0.08129353821277618,-0.025270603597164154,0.03537251427769661,0.06029881164431572,0.06169535592198372,0.0355769582092762,0.03534447401762009,-0.047377053648233414,0.053076375275850296,-0.019250469282269478,-0.03837420791387558,-0.00834209006279707,0.031550273299217224,0.004682184662669897,0.0590718574821949,0.0326957181096077,-0.041941817849874496,-0.04179370403289795,-0.010403091087937355,0.11914990842342377,-0.049126915633678436,0.015761952847242355,-0.012162514962255955,-0.05942496284842491,0.04794146493077278,-0.06834675371646881,-0.03294386342167854,0.02242257259786129,0.0774146020412445,-0.1095564718246,0.023828692734241486,0.054935190826654434,0.0202674251049757,-0.057155776768922806,-0.009578827768564224,-0.051850661635398865,0.09117215871810913,-0.07315851002931595,-0.0019339871359989047,-0.05835318937897682,-0.058747921139001846,-0.05519327148795128,-0.014699703082442284,-0.0020833320450037718,-0.05721793323755264,0.055632084608078,0.006448595318943262,0.0034963993821293116,-0.031087594106793404,-0.09541762620210648,0.03679275885224342,-0.012651922181248665,-0.038976479321718216,-0.013171667233109
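For readability, the two invocations being compared in this thread look roughly like the following (the field name dense_vector and the topK value are taken from the debug output above; the vector literal is abbreviated here and must contain exactly as many values as the field's vectorDimension):

    As the main query:
    q={!knn f=dense_vector topK=1}[0.0652..., 0.0157..., ...]

    As the workaround, with a match-all main query and the knn parser in a filter query:
    q=*:*&fq={!knn f=dense_vector topK=1}[0.0652..., 0.0157..., ...]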
Re: Fwd:
On 8/3/23 22:45, Ayana Joby wrote:
> Hello Team,
> We are using the following configuration for the Japanese language, but synonym search is not working with this configuration for Japanese.

Only one of your attachments made it through to the list. But I have seen them in Jira.

In Jira, I mentioned you were using the analysis tab incorrectly. You have entered the same value for both index and query. You will need to put the query string in the right-hand box and put the actual indexed data in the left-hand box. Don't copy it from the synonyms file; copy it from the actual document. Copy the entire field contents, not just the string you're hoping to match.

For reference, the Jira issue:
https://issues.apache.org/jira/browse/SOLR-16914

Thanks,
Shawn
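Since not all of the original configuration attachments made it to the list, the snippet below is only a generic sketch of how query-time synonyms are commonly wired up for a Japanese field — the field type name, filter order and the synonyms_ja.txt file name are illustrative, not the poster's actual configuration:

    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_ja.txt"
                ignoreCase="true" expand="true"
                tokenizerFactory="solr.JapaneseTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

When verifying a setup like this on the Analysis screen, the indexed field value (copied from the actual document) goes in the left-hand index box and the search term in the right-hand query box, exactly as Shawn describes above.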
Re: Changing Solr collection's DirectoryFactory
On 8/4/23 09:56, Jayesh Shende wrote:
> Using: Solr 8.11.2 on RHEL 9.
>
> We are currently using "solr.NRTCachingDirectoryFactory" for a collection. The collection has grown big in size, but I don't want to add more RAM to the machine (AWS); I can increase IOPS and throughput for the data volume.
>
> I was thinking of switching to "solr.NIOFSDirectoryFactory", but wanted to know how it will impact the existing collection. Maybe it is just a way to read index files, but to be sure: will it affect my existing indexed data?

It's generally not a good idea to explicitly configure the directory factory. That should only be done in very unusual circumstances. Your situation probably does not qualify.

Remove any config for that and let Solr/Lucene pick the class that's best for the environment. It will probably choose NRTCachingDirectoryFactory. If a better option becomes available in a newer Solr version, it will most likely be chosen automatically as long as the value isn't explicitly configured.

Looking at the source, I cannot tell for sure whether NIOFS uses mmap, but I suspect it does not. For nearly all use cases, you want a directory implementation that uses mmap, which the NRTCaching implementation does.

Changing the directory factory is very unlikely to cause any problems with the existing index. But I am curious why you want to change that... what have you encountered, and why do you think you should go with a non-default class?

If you have enough memory installed, the disk speed will have very little impact on performance. Disk performance only becomes important in situations where you do not have enough spare memory for effective disk caching. Memory is faster than disk, even if the disk is an extremely fast SSD.

A directory implementation that uses mmap will be the fastest option.

Thanks,
Shawn
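In practical terms, the recommendation above means deleting (or commenting out) the explicit element in solrconfig.xml so that Solr/Lucene choose the default — a minimal sketch:

    <!-- removed so the default directory implementation is chosen automatically
    <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>
    -->

The change only takes effect once the collection's configuration is reloaded or the node is restarted.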
Re: Changing Solr collection's DirectoryFactory
Hi Shawn,

Thanks for responding so quickly.

The server box is shared by multiple Solr nodes, each node having more than 100 GB of disk usage (~2-4 replicas of different collections on one Solr node).

The NRTCachingDirectoryFactory tries to cache as many segments as possible in memory, but the queries are for different collections and are varied (few repeated query terms), so I think these cached segments are not actually very useful here, and the RAM (apart from what is assigned to the JVMs) is not enough to cache even 10% of the index for each Solr node running.

Also, it is an existing Solr installation and I am trying to improve performance. As we know, NIO is better than IO in Java, and I can increase IOPS and throughput for the disk, so I was trying to find out how the change would affect things.

Before changing anything, I will try removing the explicit configuration for directoryFactory to see how it works and how Solr picks the best option for the underlying OS — as this should not affect the underlying indexed data for the collections.

Thanks.
Jayesh Shende

On Fri, 4 Aug 2023, 22:35 Shawn Heisey wrote:
RE: Changing Solr collection's DirectoryFactory
I was in a similar situation; our index was way too big compared to the RAM on the nodes. I was seeing constant 100% disk reads, query timeouts and dead nodes, because the default directory reader (NRTCaching) was trying to cache a different part of the index in memory for every other request, but queries were rarely against the same collection.

Our disks could do 1 GB per second of reads, but a single simple query would cause 40 seconds of constant reading to return just a few documents. From time to time Solr went completely unresponsive until it could finish doing disk reads for previous requests.

I switched to the NIOFS reader and the disk problem was solved. Just don't expect Solr to be as super fast as it was with a small index that could fit in RAM.

-ufuk

Sent from Mail for Windows
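The disk-saturation pattern described above can be observed directly, independent of Solr, before and after such a change; on Linux, for example, the following generic sysstat command (not specific to this setup) prints extended per-device statistics every second:

    iostat -xm 1

Watching the %util and rMB/s columns for the volume that holds the index makes it easy to see whether reads are pinned at 100% in the way ufuk describes.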
Re: Changing Solr collection's DirectoryFactory
On 8/4/23 11:43, Jayesh Shende wrote:
> The NRTCachingDirectoryFactory tries to cache as many segments as possible in memory, but the queries are for different collections and are varied (few repeated query terms), so I think these cached segments are not actually very useful here, and the RAM (apart from what is assigned to the JVMs) is not enough to cache even 10% of the index for each Solr node running.

Solr does NOT proactively cache data from the index files. It leaves that to the operating system. If a certain piece of data is never accessed, it will never end up in the cache. The same is true of the on-heap caches that Solr generates and maintains... data that is never accessed will NOT be cached.

> Also, it is an existing Solr installation and I am trying to improve performance. As we know, NIO is better than IO in Java, and I can increase IOPS and throughput for the disk, so I was trying to find out how the change would affect things.

When resources are properly sized, MMAP is the most efficient option for accessing file data available on ANY operating system.

The best way to improve Solr performance is to add memory. But if you're in a cloud-based setup, that is quite expensive. From the other response, I gather that for memory-starved setups, switching the directory implementation can improve performance. But I offer this prediction: if you continue to run without sufficient memory, you're eventually going to hit a performance wall that can only be fixed by adding memory.

Thanks,
Shawn
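Shawn's "enough spare memory for effective disk caching" can be checked on the node itself; for example, on Linux (generic commands, and the index path below is a placeholder that varies by installation):

    free -h
    du -sh /var/solr/data/*/data/index

The "available" column from free is roughly what the OS can devote to the page cache; comparing it with the total index size reported by du shows how far short of "fully cached" a node is running.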