Re: Help needed testing new systemd script (SOLR-14410)
On 5/23/2023 3:12 AM, Jan Høydahl wrote:
> We have an excellent contribution in https://issues.apache.org/jira/browse/SOLR-14410 and https://github.com/apache/solr/pull/428 to switch to a systemd init script for Solr. Before we can merge it, we need help testing it on more Unix flavours. If you have access to RHEL, CentOS, or OpenSuse and want to help with testing, give us a ping in the JIRA issue or PR.

I think I wrote this on the issue: I have an infrastructure where I can easily create VMs to try this out on many different operating systems. Point me at an installer for an OS you'd like to try it on and I can give it a whirl. If I can figure out the installer!

A systemd service would make installation a LOT more bulletproof. It seems like at least once a month I learn about a new thing that systemd has built in that used to require a dedicated service.

Thanks,
Shawn
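For anyone who hasn't run Solr under systemd before, day-to-day operation would look roughly like the sketch below. This is illustrative only and assumes the unit ends up being named solr.service, which is a guess rather than something taken from the PR:

    # enable at boot and start immediately (assumes a unit named "solr")
    sudo systemctl enable --now solr

    # check whether the service is running
    systemctl status solr

    # follow the service's logs through journald
    sudo journalctl -u solr -f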
Deleting document on wrong shard?
We have a messed-up index with documents on shards where they shouldn't be. Content was indexed, shards were added, then everything was reindexed. So the new document with the same ID was put on a new shard, leaving the previous version on the old shard (where it doesn't match the hash range).

I'm trying to delete the old document by sending an update with delete-by-id and a shards parameter. It returns success, but the document isn't deleted.

Is the hash range being checked and overriding the shards param somehow? Any ideas on how to make this work?

And yes, we won't do that again.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
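For concreteness, the request being described would look something like the sketch below. The host, collection, shard name, and document ID are placeholders, not values from the thread:

    # delete-by-id aimed at a specific shard via the shards parameter
    # (returns success, but as described above the stale document is not removed)
    curl 'http://solr-host:8983/solr/mycollection/update?commit=true&shards=shard7' \
      -H 'Content-Type: text/xml' \
      -d '<delete><id>SOME_DOC_ID</id></delete>'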
Re: Deleting document on wrong shard?
I thought deletes were "broadcast", but probably for the compositeId router they are not, since we know for sure where the document resides.

You say "shards were added" - how did you do that?

Sounds like you should simply re-create your collection and re-index?

Jan
Re: Deleting document on wrong shard?
Clearly, they are not broadcast, or if they are, they are filtered by the hash range before executing. If they were broadcast, this problem would not have happened.

Yes, we'll delete-all and reindex at some point. This collection has 1.7 billion documents across 96 shards, so a full reindex is not an everyday occurrence. I'm trying to clean up the minor problem of 675k documents with dupes.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Deleting document on wrong shard?
Often it's a better idea to index into a fresh collection when making changes that imply a full re-index. If you use an alias, swapping out the old collection is atomic when you update the alias, requiring no front-end changes at all (and swapping back is easy if things aren't what you expected). A sketch of the alias swap is below. Of course, if you're not running cloud, the same applies to creating a new index, but aliases won't be available, so hopefully you've built some proxy layer or application API with a simple configuration switch for where it queries.

If you're going to spend the effort to reprocess all the docs, why not end up in a state with no deleted docs? (i.e. don't do delete-all, create a new collection)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
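A minimal sketch of the alias-based swap Gus describes, using the Collections API. The collection and alias names are placeholders, not anything from the thread:

    # index into a brand-new collection (products_v2), then point the alias at it;
    # queries against "products" switch atomically
    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2'

    # swapping back is just re-pointing the alias at the old collection
    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v1'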
Re: Deleting document on wrong shard?
Responses about how to avoid this are not on topic. I've had Solr in production since version 1.3 and I know the right way.

I think I know how we got into this mess. The cluster is configured and deployed into Kubernetes. I think it was rebuilt with more shards, and then the existing storage volumes were mounted for the matching shards. New shards got empty volumes. Then the content was reloaded without a delete-all.

Would it work to send the deletes directly to the leader for the shard? That might bypass the hash-based routing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Deleting document on wrong shard?
Understood, of course I've seen your name on the list for a long time. Partly my response is for the benefit of readers too, sorry if that bothered you. You of course may have good reasons, and a carefully refined design for your situation, that might not be best emulated everywhere. Living in Kube is tricky, partly because (as I understand it) it was designed with stateless web stuff and microservices in mind, and it's really easy for folks administering it to trip on googled advice that has that mindset. Sounds like possibly someone in ops was thinking in terms of pods being interchangeable, lightweight objects and not thinking about the persistent volumes needing to line up and match the design the same way every time.

On topic: not sure, but one might need to set distrib=false or something like that to avoid the routing.

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Deleting document on wrong shard?
Ooh, going directly to the leader node and using distrib=false, I like that idea. Now I need to figure out how to directly hit the danged Kubernetes pods.

The config/deploy design here is pretty solid and aware of persistent storage volumes. It works fine for increasing replicas. We just need to avoid changing the number of shards without a reindex. One of the other clusters has 320 shards.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
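One common way to reach a specific pod directly from outside the cluster is kubectl port-forward. This is only a sketch: the pod name, core name, and document ID are placeholders, and whether distrib=false is actually honored for deletes was not settled in the thread (the _route_ approach further down is what ended up working):

    # forward a local port to the pod hosting the target shard's leader
    kubectl port-forward pod/solr-7 18983:8983

    # then send the delete straight to that core
    curl 'http://localhost:18983/solr/mycollection_shard7_replica_n1/update?commit=true&distrib=false' \
      -H 'Content-Type: text/xml' \
      -d '<delete><id>SOME_DOC_ID</id></delete>'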
Re: Deleting document on wrong shard?
Would specifying a _route_ parameter in the request work? https://issues.apache.org/jira/browse/SOLR-6910

I know your case is not based on the implicit router, but I'm just wondering if it still works somehow?
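The shape of that suggestion, as a sketch only: the host, collection, document ID, and especially the _route_ value are placeholders; the thread doesn't show what route value maps to the shard holding the stale copy:

    # delete-by-id with an explicit _route_ parameter on the update request
    curl 'http://localhost:8983/solr/mycollection/update?commit=true&_route_=ROUTE_VALUE' \
      -H 'Content-Type: text/xml' \
      -d '<delete><id>SOME_DOC_ID</id></delete>'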
Re: Deleting document on wrong shard?
Nice catch. This issue looks exactly like what I'm seeing: it returns success but does not delete the document.

SOLR-5890: Delete silently fails if not sent to shard where document was added

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Deleting document on wrong shard?
Ah, now I remember this comment: https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14294129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294129

"Updated the patch, now with the Hash based router also honouring the _route_ param."
Re: Deleting document on wrong shard?
It works! Thanks so much.

I'm using the XML update format because the JSON format for sending multiple IDs for deletion is not documented anywhere I could find. It was easier to just generate XML instead of continuing to search for documentation. This does the trick:

    <delete><id>datalake_FPD_163298_3RGR-V090-006H-40F0-0-00</id></delete>

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
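For what it's worth, Solr's JSON update command format can also take a list of IDs in a single delete; the request below is only a sketch, with placeholder IDs and collection name:

    # JSON equivalent of a multi-ID delete
    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '{"delete": ["DOC_ID_1", "DOC_ID_2"]}'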
Re: Deleting document on wrong shard?
On 5/24/23 10:48, Walter Underwood wrote:
> I think I know how we got into this mess. The cluster is configured and deployed into Kubernetes. I think it was rebuilt with more shards, and then the existing storage volumes were mounted for the matching shards. New shards got empty volumes. Then the content was reloaded without a delete-all.

You're probably aware... that approach to re-sharding just plain will not work. Increasing or decreasing the shard count of a compositeId-routed collection requires re-indexing from scratch. The only way to add shards to an existing collection is to use SPLITSHARD, unless it's using the implicit router.

I've seen discussion of a rebalance API, but no implementation. It would not be easy to implement. I have thought of one approach that might make it doable ... but it might not be possible to send any updates to the collection until the entire rebalance is complete. Assuming it's even possible, the approach I thought of would require a LOT of extra disk space, a lot of extra bandwidth usage, and would take much longer to run than an optimize. It might even take longer than doing a full re-index from the source system.

Thanks,
Shawn
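For reference, the SPLITSHARD call Shawn mentions looks roughly like this; the collection and shard names are placeholders, and on a large shard you would likely want to run it asynchronously:

    # split shard1 of "mycollection" into two sub-shards, tracked as async request "split1"
    curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=split1'

    # poll the async request status
    curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split1'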
Re: Deleting document on wrong shard?
Yes, I know it doesn't work. It creates an index that violates some basic invariants, like one ID mapping to one document. It does weird things, like returning one document but listing two documents in the facet counts, with different values for the same single-valued field. I'm trying to patch it back into a consistent state while we wait for the next full reindex.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
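One way to enumerate the duplicated IDs, for anyone in a similar spot, is a terms facet on the uniqueKey field with mincount=2. This is only a sketch; it assumes the ID field is facetable, the field and collection names are placeholders, and on a collection with billions of documents it will be an expensive query:

    # list IDs that appear in more than one document
    curl 'http://localhost:8983/solr/mycollection/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'rows=0' \
      --data-urlencode 'json.facet={dupes:{type:terms,field:id,mincount:2,limit:1000}}'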