It works! Thanks so much. I’m using the XML update format because the JSON format for sending multiple IDs for deletion isn’t documented anywhere I could find. It was easier to just generate XML than to keep searching for documentation.
This does the trick:

<delete><id route="shard25_2">datalake_FPD_163298_3RGR-V090-006H-40F0-00000-00</id></delete>

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
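For anyone following along, a delete like that can be posted straight to the collection's update handler, for example with curl. This is just a sketch: the host and collection name below are placeholders for whatever your cluster uses, and commit=true simply makes the deletion visible right away.

    # placeholder host and collection; the XML body is the delete shown above
    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: text/xml' \
      --data-binary '<delete><id route="shard25_2">datalake_FPD_163298_3RGR-V090-006H-40F0-00000-00</id></delete>'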
> On May 24, 2023, at 1:13 PM, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> 
> Ah, now I remember this comment:
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14294129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294129
> 
> "Updated the patch, now with the Hash based router also honouring the _route_ param."
> 
> On Thu, 25 May 2023 at 01:38, Walter Underwood <wun...@wunderwood.org> wrote:
> 
>> Nice catch. This issue looks exactly like what I’m seeing: it returns success but does not delete the document.
>> 
>> SOLR-5890
>> Delete silently fails if not sent to shard where document was added
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 12:21 PM, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>>> 
>>> Would specifying a _route_ parameter in the request work?
>>> https://issues.apache.org/jira/browse/SOLR-6910
>>> I know your case is not implicit router based, but just wondering if it still works somehow?
>>> 
>>> On Wed, 24 May 2023 at 23:28, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>>> Ooh, going directly to the leader node and using distrib=false, I like that idea. Now I need to figure out how to directly hit the danged Kubernetes pods.
>>>> 
>>>> The config/deploy design here is pretty solid and aware of persistent storage volumes. It works fine for increasing replicas. We just need to avoid changing the number of shards without a reindex. One of the other clusters has 320 shards.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On May 24, 2023, at 10:12 AM, Gus Heck <gus.h...@gmail.com> wrote:
>>>>> 
>>>>> Understood, of course I've seen your name on the list for a long time. Partly my response is for the benefit of readers too; sorry if that bothered you. You of course may have good reasons, and carefully refined a design for your situation, that might not be best emulated everywhere. Living in Kube is tricky, partly because (as I understand it) it was designed with stateless web stuff and microservices in mind, I think, and it's really easy for folks administering it to trip on googled advice that has that mindset. Sounds like possibly someone in ops was thinking in terms of pods being interchangeable, lightweight objects and not thinking about the persistent volumes needing to line up and match the design the same way every time.
>>>>> 
>>>>> On topic: not sure, but one might need to set distrib=false or something like that to avoid the routing.
>>>>> 
>>>>> On Wed, May 24, 2023 at 12:49 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>>>> 
>>>>>> Responses about how to avoid this are not on topic. I’ve had Solr in production since version 1.3 and I know the right way.
>>>>>> 
>>>>>> I think I know how we got into this mess. The cluster is configured and deployed into Kubernetes. I think it was rebuilt with more shards, then the existing storage volumes were mounted for the matching shards. New shards got empty volumes. Then the content was reloaded without a delete-all.
>>>>>> 
>>>>>> Would it work to send the deletes directly to the leader for the shard? That might bypass the hash-based routing.
>>>>>> 
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>> 
>>>>>>> On May 24, 2023, at 8:35 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>> 
>>>>>>> Clearly, they are not broadcast, or if they are, they are filtered by the hash range before executing. If they were broadcast, this problem would not have happened.
>>>>>>> 
>>>>>>> Yes, we’ll delete-all and reindex at some point. This collection has 1.7 billion documents across 96 shards, so a full reindex is not an everyday occurrence. I’m trying to clean up the minor problem of 675k documents with dupes.
>>>>>>> 
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>> 
>>>>>>>> On May 24, 2023, at 8:06 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>>>>>>>> 
>>>>>>>> I thought deletes were "broadcast", but probably for the composite-id router it is not, since we know for sure where it resides.
>>>>>>>> You say "shards were added" - how did you do that?
>>>>>>>> Sounds like you should simply re-create your collection and re-index?
>>>>>>>> 
>>>>>>>> Jan
>>>>>>>> 
>>>>>>>>> On 24 May 2023, at 16:39, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>>>> 
>>>>>>>>> We have a messed-up index with documents on shards where they shouldn’t be. Content was indexed, shards were added, then everything was reindexed. So the new document with the same ID was put on a new shard, leaving the previous version on the old shard (where it doesn’t match the hash range).
>>>>>>>>> 
>>>>>>>>> I’m trying to delete the old document by sending an update with delete-by-id and a shards parameter. It returns success, but the document isn’t deleted.
>>>>>>>>> 
>>>>>>>>> Is the hash range being checked and overriding the shards param somehow? Any ideas on how to make this work?
>>>>>>>>> 
>>>>>>>>> And yes, we won’t do that again.
>>>>>>>>> 
>>>>>>>>> wunder
>>>>>>>>> Walter Underwood
>>>>>>>>> wun...@wunderwood.org
>>>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>> -- 
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
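For completeness, the direct-to-leader idea discussed above would look roughly like the sketch below: find which core on which pod still holds the stale copy (for example via the Collections API CLUSTERSTATUS action), then post the delete to that core with the distrib=false parameter Gus mentioned so the request is not re-routed by the hash range. This is an untested sketch; the pod host, collection, and core names are placeholders, and whether distrib=false is honoured for updates on your Solr version would need to be verified.

    # placeholder host/collection: ask any node which cores back each shard
    curl 'http://solr-pod-0:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection'

    # placeholder pod host and core name: post the delete to that core directly;
    # distrib=false is the parameter suggested in the thread to avoid re-routing
    curl 'http://solr-pod-25:8983/solr/mycollection_shard25_2_replica_n1/update?distrib=false&commit=true' \
      -H 'Content-Type: text/xml' \
      --data-binary '<delete><id>datalake_FPD_163298_3RGR-V090-006H-40F0-00000-00</id></delete>'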