Re: Add a new Shard to the collection
Thanks for the input Ilan.

On Thu, Aug 3, 2023 at 5:25 PM Ilan Ginzburg wrote:
> I don't think adding shards (even from 1 to 2) is the solution.
> You need enough replicas so all your nodes share the load, but with such small shards you likely don't need more than 1.
> If your nodes are saturated by traffic, you need more nodes (and more replicas so that the added nodes have a replica as well).
>
> Ilan
>
> On Thu, Aug 3, 2023, 8:23 AM HariBabu kuruva wrote:
> > Hi Ilan,
> >
> > Thank you for your reply.
> >
> > Application requests are facing connection failures a couple of times. So our DEV team requested to add more shards, as they are expecting more read-heavy queries in the future.
> >
> > Initially they requested two shards and now they are asking for one more shard (3 shards). We have a total of 6 Solr nodes available.
> >
> > The disk sizes consumed by the currently created two shards are around 2.5 GB each.
> >
> > Please let me know if any other information is required.
> >
> > On Wed, Aug 2, 2023 at 11:29 PM Ilan Ginzburg wrote:
> > > Well, if the size of the two shards you now have is equivalent, you will not be able to get to 3 balanced (in size) shards.
> > >
> > > If one of the two seems to get more data (is larger), split that one. This might be the case if you use fancy routing for deciding which doc goes where.
> > >
> > > Otherwise, to get to 3 similarly sized shards you need to explicitly specify the ranges during the split.
> > > Either create one subshard with twice the range of the other so you can split the larger one into two and end up with 3 similarly sized shards, or split the initial shard into 3 subshards in one go (I've never tried splitting into more than 2 shards though, so I end up with a power-of-2 number of balanced shards, assuming uniform distribution of docs into the hash range).
> > >
> > > But I assume your real goal is not having a specific number of shards.
> > > What issues are you running into in your current setup that you're trying to address?
> > > You mentioned "better performance", but performance of what? Query? Indexing? Are you running out of memory? CPU? Are you adding nodes (servers) and/or replicas as you're increasing the number of shards?
> > >
> > > What has improved as you moved from one to two shards? Why decide that you then want to have 3 shards and not stay at 2 or move to 4?
> > >
> > > Ilan
> > >
> > > On Wed, Aug 2, 2023, 5:48 PM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > Hi All,
> > > >
> > > > I did sharding, split shard1 into shard-1_0 and shard-1_1.
> > > > I want to have one more shard (3 shards). In this case, which shard should I split? Please advise.
> > > >
> > > > On Tue, Aug 1, 2023 at 11:17 AM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > ++ FYI, I can see the old shard automatically removed.
> > > > >
> > > > > On Mon, Jul 31, 2023 at 11:39 AM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > I am a little bit worried about PROD. Can I go ahead and do the same steps in PROD? Do I need to take any backups or any other steps before doing this?
> > > > > >
> > > > > > On Sat, Jul 29, 2023 at 8:51 AM Mikhail Khludnev wrote:
> > > > > > > Hello Hari.
> > > > > > > If new shards are handling queries and updates well, it's OK to have the old shard inactive.
> > > > > > > You can request DELETESHARD to reclaim the disk space.
> > > > > > >
> > > > > > > On Mon, Jul 24, 2023 at 6:19 PM HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I would like to add a new shard to the existing collection to have better performance. Currently we have one shard.
> > > > > > > >
> > > > > > > > Solr - 8.11.1
> > > > > > > > Nodes (servers) - 10 (Non prod - 4 nodes)
> > > > > > > > Zookeepers - 5
> > > > > > > >
> > > > > > > > I have tried the SPLITSHARD command in one of the non-prod environments:
> > > > > > > > https://solrserver.corp.company.com:8981/solr/admin/collections?action=SPLITSHARD&collection=abcStore&shard=shard1
> > > > > > > >
> > > > > > > > Now I can see a total of 3 shards:
> > > > > > > > Shard1
> > > > > > > > Shard1_0
> > > > > > > > Shard1_1
> > > > > > > >
> > > > > > > > But Shard1 is shown as inactive. Please let me know if we need to remove this?
> > > > > > > >
> > > > > > > > Please help me if th
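As a concrete illustration of Mikhail's suggestion: once the two subshards are active and the inactive parent is no longer needed, DELETESHARD is a Collections API call in the same style as the SPLITSHARD request above (host, port and collection name are copied from that non-prod example and would differ per environment):

    https://solrserver.corp.company.com:8981/solr/admin/collections?action=DELETESHARD&collection=abcStore&shard=shard1

DELETESHARD will only remove a shard that is not in an active state (such as the inactive parent left behind by SPLITSHARD), so it refuses to delete a shard that is still serving traffic.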
Re: Add a new Shard to the collection
Thank you uyilmaz for the detailed explanation.

On Thu, Aug 3, 2023 at 6:21 PM ufuk yılmaz wrote:
> My two cents; it took me some time to understand when to add shards or replicas when I first started using Solr.
>
> Speed of a single isolated query when the system is idle -VS- total throughput of the system when many queries are executing.
>
> Sharding divides data into smaller pieces and puts each piece (shard) on a separate computer. Now for a single incoming query, all of those nodes need to be searched, because Solr has no idea which piece (shard) may contain the searched documents. If your data was very big before sharding, this improves the response time of a single isolated query. If your data wasn't big to start with, this *may* even hurt performance because of more network trips. (Big = index size is larger than the available RAM amount on the machine.)
>
> Adding replicas creates a copy of your data and puts that copy on a separate computer. Now for a single incoming query, still, a single node is selected and queried. A single isolated query doesn't get faster, but overall throughput is increased (the system can serve 2x more queries per second before coughing).
>
> There are many "but if"s and different cases with this line of thought, but I hope it explains the main idea to someone new to Solr?
>
> A confusing thing for someone new is the relation between collection, replica and shard. A collection has a one-to-many relationship with shard, and a shard has a one-to-many relationship with replica.
>
> Hope this is useful
>
> Sent from Mail for Windows
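If the advice in this thread is followed — spreading query load with replicas rather than more shards — the relevant Collections API call is ADDREPLICA rather than SPLITSHARD. A sketch only; the host, port and target node name below are placeholders, not values from this thread:

    https://solrserver.corp.company.com:8981/solr/admin/collections?action=ADDREPLICA&collection=abcStore&shard=shard1_0&node=solr-node5:8983_solr

Repeating this for each shard until every node hosts a replica lets all 6 nodes share the query traffic. (And if unevenly sized subshards are ever really wanted, SPLITSHARD does accept an explicit ranges parameter, which is what Ilan refers to when he mentions specifying the ranges during the split.)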
Changing Solr collection's DirectoryFactory
Hi everyone,

Using: Solr 8.11.2 on RHEL 9.

We are currently using "solr.NRTCachingDirectoryFactory" for a collection. The collection has grown big in size, but I don't want to add more RAM to the machine (AWS); I can increase IOPS and throughput for the data volume.

I was thinking of switching to "solr.NIOFSDirectoryFactory", but wanted to know how it will impact the existing collection. Maybe it is just a way to read index files, but to be sure: will it affect my existing indexed data?

Any light on this will be helpful.

Thanks.
Jayesh Shende
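For context, this setting lives in solrconfig.xml, and the change being weighed here amounts to swapping the class attribute of the directoryFactory element — shown only as an illustration (see the replies later in this digest before making the change):

    <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>   <!-- current -->
    <directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>        <!-- contemplated -->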
Re: knn parser not working as expected
Also, here is the debug output for that workaround with fq I mentioned. This debug output is not big. "debug":{ "rawquerystring":"*:*", "querystring":"*:*", "parsedquery":"+(+MatchAllDocsQuery(*:*)) ()", "parsedquery_toString":"+(+*:*) ()", "json":{"params":{ "q":"*:*", "fq":"{!knn f=dense_vector topK=1}[0.06525743007659912,0.015727980062365532,0.003069591475650668,-0.016254400834441185,0.003478930564597249,-0.02475954219698906,0.020238326862454414,0.010255611501634121,0.05522076040506363,0.020635411143302917,0.05825875699520111,-0.05110647529363632,-0.04696913808584213,0.05991407483816147,-0.0003015052934642881,0.03625837340950966,-0.044656239449977875,-0.06582673639059067,-0.06842341274023056,-0.022927379235625267,0.048230838030576706,-0.12659960985183716,-0.019311215728521347,-0.04432906210422516,0.03600681200623512,0.010301047936081886,0.08415472507476807,0.04727723449468613,-0.0584205724298954,-0.045265913009643555,0.012285877950489521,0.0034233061596751213,-0.00982636958360672,-0.013216182589530945,-0.038882751017808914,-0.05872005969285965,-0.029350444674491882,0.04930287227034569,0.0022274062503129244,0.01728842593729496,-0.08762819767,-0.045831114053726196,0.072530098259449,0.03804686293005943,0.0021682181395590305,-0.05424166098237038,-0.004494055639952421,0.05843663960695267,0.058729417622089386,0.016252348199486732,0.0019551776349544525,-0.012190568260848522,-0.08235936611890793,-0.003848800901323557,0.028969185426831245,0.047798849642276764,-0.04074695333838463,-0.10175333172082901,0.06699151545763016,-0.06788542866706848,-0.01607389748096466,0.07294511049985886,0.007754810154438019,0.039606861770153046,0.07451225817203522,-0.02967959391212,0.014015864580869675,0.08055979013442993,0.0010412412229925394,0.13284511864185333,-0.013288799673318863,-0.05446619912981987,-0.03510258346796036,-0.12459734082221985,-0.017629574984312057,-0.04287091642618179,-0.019087448716163635,0.027409998700022697,-0.040427371859550476,-0.1713477075099945,-0.0035959691740572453,0.01750982739031315,-0.06452985852956772,0.10622204840183258,-0.06865541636943817,0.06022517383098602,0.03378240391612053,0.02320132404565811,0.02072194404900074,0.03390982002019882,0.0051648980006575584,0.05843415856361389,-0.07012602686882019,0.046549294143915176,0.005304296966642141,0.09183698892593384,0.060101959854364395,-0.031673040241003036,0.03126641735434532,0.10213921219110489,0.07624002546072006,-0.09995660930871964,0.03316718339920044,-0.040208760648965836,-0.016963355243206024,-0.01603076048195362,-0.00566966412588954,0.0570228286087513,0.006566803902387619,0.028397461399435997,-0.03737075999379158,-0.03357473015785217,-0.05060608312487602,0.0882791057229042,0.14182551205158234,0.01651209406554699,0.047577112913131714,-0.028357332572340965,-0.12397051602602005,0.03264006972312927,0.030581200495362282,0.025287700816988945,-0.08509892970323563,0.032361947267,-0.06732083112001419,0.0193667970597744,0.07096285372972488,-5.732041797079612e-33,0.033934514969587326,0.029480531811714172,-0.024119360372424126,0.03248802572488785,0.060654137283563614,-0.04089922457933426,-0.06845896691083908,0.015865417197346687,-0.03816983848810196,0.12768638134002686,-0.047979939728975296,0.01888129487633705,0.01966758444905281,-0.021792754530906677,-0.00209379056468606,-0.060791824012994766,0.07595516741275787,-0.05137578397989273,-0.020345840603113174,0.02730456180870533,-0.08421282470226288,0.0052170781418681145,-0.0396740548312664,0.013655638322234154,0.043763574212789536,0.0368662029504776,-0.021710995584726334,0.03603581339120865,0.04
991370812058449,-0.007524373475462198,0.033250145614147186,0.0669487863779068,-0.012807670049369335,-0.08904062211513519,-0.04803512617945671,-0.0461772084236145,0.018098553642630577,0.01096352282911539,0.0617918036878109,0.014066621661186218,-0.03305654972791672,-0.08129353821277618,-0.025270603597164154,0.03537251427769661,0.06029881164431572,0.06169535592198372,0.0355769582092762,0.03534447401762009,-0.047377053648233414,0.053076375275850296,-0.019250469282269478,-0.03837420791387558,-0.00834209006279707,0.031550273299217224,0.004682184662669897,0.0590718574821949,0.0326957181096077,-0.041941817849874496,-0.04179370403289795,-0.010403091087937355,0.11914990842342377,-0.049126915633678436,0.015761952847242355,-0.012162514962255955,-0.05942496284842491,0.04794146493077278,-0.06834675371646881,-0.03294386342167854,0.02242257259786129,0.0774146020412445,-0.1095564718246,0.023828692734241486,0.054935190826654434,0.0202674251049757,-0.057155776768922806,-0.009578827768564224,-0.051850661635398865,0.09117215871810913,-0.07315851002931595,-0.0019339871359989047,-0.05835318937897682,-0.058747921139001846,-0.05519327148795128,-0.014699703082442284,-0.0020833320450037718,-0.05721793323755264,0.055632084608078,0.006448595318943262,0.0034963993821293116,-0.031087594106793404,-0.09541762620210648,0.03679275885224342,-0.012651922181248665,-0.038976479321718216,-0.013171667233109
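For readability, the two invocations being compared in this thread look roughly like the following (the field name dense_vector and the topK value are taken from the debug output above; the vector literal is abbreviated here and must contain exactly as many values as the field's vectorDimension):

    As the main query:
    q={!knn f=dense_vector topK=1}[0.0652..., 0.0157..., ...]

    As the workaround, with a match-all main query and the knn parser in a filter query:
    q=*:*&fq={!knn f=dense_vector topK=1}[0.0652..., 0.0157..., ...]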
Re: Fwd:
On 8/3/23 22:45, Ayana Joby wrote:
> Hello Team,
> We are using the following configuration for the Japanese language, but synonym search is not working with this configuration for Japanese.

Only one of your attachments made it through to the list. But I have seen them in Jira.

In Jira, I mentioned you were using the analysis tab incorrectly. You have entered the same value for both index and query. You will need to put the query string in the right-hand box and put the actual indexed data in the left-hand box. Don't copy it from the synonyms file; copy it from the actual document. Copy the entire field contents, not just the string you're hoping to match.

For reference, the Jira issue:
https://issues.apache.org/jira/browse/SOLR-16914

Thanks,
Shawn
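Since not all of the original configuration attachments made it to the list, the snippet below is only a generic sketch of how query-time synonyms are commonly wired up for a Japanese field — the field type name, filter order and the synonyms_ja.txt file name are illustrative, not the poster's actual configuration:

    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_ja.txt"
                ignoreCase="true" expand="true"
                tokenizerFactory="solr.JapaneseTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

When verifying a setup like this on the Analysis screen, the indexed field value (copied from the actual document) goes in the left-hand index box and the search term in the right-hand query box, exactly as Shawn describes above.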
Re: Changing Solr collection's DirectoryFactory
On 8/4/23 09:56, Jayesh Shende wrote:
> Using: Solr 8.11.2 on RHEL 9.
>
> We are currently using "solr.NRTCachingDirectoryFactory" for a collection. The collection has grown big in size, but I don't want to add more RAM to the machine (AWS); I can increase IOPS and throughput for the data volume.
>
> I was thinking of switching to "solr.NIOFSDirectoryFactory", but wanted to know how it will impact the existing collection. Maybe it is just a way to read index files, but to be sure: will it affect my existing indexed data?

It's generally not a good idea to explicitly configure the directory factory. That should only be done in very unusual circumstances. Your situation probably does not qualify.

Remove any config for that and let Solr/Lucene pick the class that's best for the environment. It will probably choose NRTCachingDirectoryFactory. If a better option becomes available in a newer Solr version, it will most likely be chosen automatically as long as the value isn't explicitly configured.

Looking at the source, I cannot tell for sure whether NIOFS uses mmap, but I suspect it does not. For nearly all use cases, you want a directory implementation that uses mmap, which the NRTCaching implementation does.

Changing the directory factory is very unlikely to cause any problems with the existing index. But I am curious why you want to change that... what have you encountered, and why do you think you should go with a non-default class?

If you have enough memory installed, the disk speed will have very little impact on performance. Disk performance only becomes important in situations where you do not have enough spare memory for effective disk caching. Memory is faster than disk, even if the disk is an extremely fast SSD.

A directory implementation that uses mmap will be the fastest option.

Thanks,
Shawn
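In practical terms, the recommendation above means deleting (or commenting out) the explicit element in solrconfig.xml so that Solr/Lucene choose the default — a minimal sketch:

    <!-- removed so the default directory implementation is chosen automatically
    <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>
    -->

The change only takes effect once the collection's configuration is reloaded or the node is restarted.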
Re: Changing Solr collection's DirectoryFactory
Hi Shawn,

Thanks for responding so quickly.

The server box is shared by multiple Solr nodes, each node having more than 100 GB of disk usage (~2-4 replicas of different collections on one Solr node).

The NRTCachingDirectoryFactory tries to cache as many segments as possible in memory, but the queries are for different collections and are varied (few repeated query terms), so I think these cached segments are not actually very useful here, and the RAM (apart from what is assigned to the JVMs) is not enough to cache even 10% of the index for each Solr node running.

Also, it is an existing Solr installation and I am trying to improve performance. As we know, NIO is better than IO in Java, and I can increase IOPS and throughput for the disk, so I was trying to find out how the change would affect things.

Before changing anything, I will try removing the explicit configuration for directoryFactory to see how it works and how Solr picks the best option for the underlying OS — as this should not affect the underlying indexed data for the collections.

Thanks.
Jayesh Shende

On Fri, 4 Aug 2023, 22:35 Shawn Heisey wrote:
RE: Changing Solr collection's DirectoryFactory
I was in a similar situation; our index was way too big compared to the RAM on the nodes. I was seeing constant 100% disk reads, query timeouts and dead nodes, because the default directory reader (NRTCaching) was trying to cache a different part of the index in memory for every other request, but queries were rarely against the same collection.

Our disks could do 1 GB per second of reads, but a single simple query would cause 40 seconds of constant reading to return just a few documents. From time to time Solr went completely unresponsive until it could finish doing disk reads for previous requests.

I switched to the NIOFS reader and the disk problem was solved. Just don't expect Solr to be as super fast as it was with a small index that could fit in RAM.

-ufuk

Sent from Mail for Windows
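The disk-saturation pattern described above can be observed directly, independent of Solr, before and after such a change; on Linux, for example, the following generic sysstat command (not specific to this setup) prints extended per-device statistics every second:

    iostat -xm 1

Watching the %util and rMB/s columns for the volume that holds the index makes it easy to see whether reads are pinned at 100% in the way ufuk describes.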
Re: Changing Solr collection's DirectoryFactory
On 8/4/23 11:43, Jayesh Shende wrote:
> The NRTCachingDirectoryFactory tries to cache as many segments as possible in memory, but the queries are for different collections and are varied (few repeated query terms), so I think these cached segments are not actually very useful here, and the RAM (apart from what is assigned to the JVMs) is not enough to cache even 10% of the index for each Solr node running.

Solr does NOT proactively cache data from the index files. It leaves that to the operating system. If a certain piece of data is never accessed, it will never end up in the cache. The same is true of the on-heap caches that Solr generates and maintains... data that is never accessed will NOT be cached.

> Also, it is an existing Solr installation and I am trying to improve performance. As we know, NIO is better than IO in Java, and I can increase IOPS and throughput for the disk, so I was trying to find out how the change would affect things.

When resources are properly sized, MMAP is the most efficient option for accessing file data available on ANY operating system.

The best way to improve Solr performance is to add memory. But if you're in a cloud-based setup, that is quite expensive. From the other response, I gather that for memory-starved setups, switching the directory implementation can improve performance. But I offer this prediction: if you continue to run without sufficient memory, you're eventually going to hit a performance wall that can only be fixed by adding memory.

Thanks,
Shawn
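Shawn's "enough spare memory for effective disk caching" can be checked on the node itself; for example, on Linux (generic commands, and the index path below is a placeholder that varies by installation):

    free -h
    du -sh /var/solr/data/*/data/index

The "available" column from free is roughly what the OS can devote to the page cache; comparing it with the total index size reported by du shows how far short of "fully cached" a node is running.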