It works! Thanks so much.

I’m using the XML update format because the JSON format for sending multiple IDs 
for deletion is not documented anywhere I could find. It was easier to just 
generate XML than to keep searching for documentation.

This does the trick:

<delete><id route="shard25_2">datalake_FPD_163298_3RGR-V090-006H-40F0-00000-00</id></delete>

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 1:13 PM, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> 
> wrote:
> 
> Ah, now I remember this comment:
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14294129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294129
> 
> "Updated the patch, now with the Hash based router also honouring the _
> *route*_ param."
> 
> On Thu, 25 May 2023 at 01:38, Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> Nice catch. This issue looks exactly like what I’m seeing: it returns
>> success but does not delete the document.
>> 
>> SOLR-5890
>> Delete silently fails if not sent to shard where document was added
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 12:21 PM, Ishan Chattopadhyaya
>>> <ichattopadhy...@gmail.com> wrote:
>>> 
>>> Would specifying a _route_ parameter in the request work?
>>> https://issues.apache.org/jira/browse/SOLR-6910
>>> I know your case is not implicit router based, but just wondering if it
>>> still works somehow?
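>>>
>>> Something like this, maybe (host, collection and shard names below are
>>> just placeholders; untested):
>>>
>>> curl 'http://host:8983/solr/COLLECTION/update?commit=true&_route_=TARGET_SHARD' \
>>>   -H 'Content-Type: application/json' \
>>>   --data-binary '{"delete":{"id":"DUPLICATE_DOC_ID"}}'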
>>> 
>>> 
>>> On Wed, 24 May 2023 at 23:28, Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>> 
>>>> Ooh, going directly to the leader node and using distrib=false, I like
>>>> that idea. Now I need to figure out how to directly hit the danged
>>>> Kubernetes pods.
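>>>>
>>>> Something like this once I can reach the pod, I think (pod address and
>>>> core name are made up; distrib=false should keep the core from forwarding
>>>> the delete by hash):
>>>>
>>>> # sent straight to one core, applied locally only
>>>> curl 'http://solr-pod-address:8983/solr/COLLECTION_shard25_2_replica_n1/update?distrib=false&commit=true' \
>>>>   -H 'Content-Type: text/xml' \
>>>>   --data-binary '<delete><id>datalake_FPD_163298_3RGR-V090-006H-40F0-00000-00</id></delete>'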
>>>> 
>>>> The config/deploy design here is pretty solid and aware of persistent
>>>> storage volumes. It works fine for increasing replicas. We just need to
>>>> avoid changing the number of shards without a reindex. One of the other
>>>> clusters has 320 shards.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On May 24, 2023, at 10:12 AM, Gus Heck <gus.h...@gmail.com> wrote:
>>>>> 
>>>>> Understood, of course I've seen your name on the list for a long time.
>>>>> Partly my response is for the benefit of readers too, sorry if that
>>>>> bothered you. You of course may have good reasons, and carefully refined
>>>>> a design for your situation, that might not be best emulated everywhere.
>>>>> Living in Kube is tricky partly because (as I understand it) it was
>>>>> designed with stateless web stuff and microservices in mind, and it's
>>>>> really easy for folks administering it to trip over googled advice
>>>>> written with that mindset. Sounds like someone in ops was possibly
>>>>> thinking of pods as interchangeable, lightweight objects and not
>>>>> thinking about the persistent volumes needing to line up and match the
>>>>> design the same way every time.
>>>>>
>>>>> On topic: not sure, but one might need to set distrib=false or something
>>>>> like that to avoid the routing.
>>>>> 
>>>>> On Wed, May 24, 2023 at 12:49 PM Walter Underwood
>>>>> <wun...@wunderwood.org> wrote:
>>>>> 
>>>>>> Responses about how to avoid this are not on topic. I’ve had Solr in
>>>>>> production since version 1.3 and I know the right way.
>>>>>>
>>>>>> I think I know how we got into this mess. The cluster is configured and
>>>>>> deployed into Kubernetes. I think it was rebuilt with more shards, then
>>>>>> the existing storage volumes were mounted for the matching shards. New
>>>>>> shards got empty volumes. Then the content was reloaded without a
>>>>>> delete-all.
>>>>>>
>>>>>> Would it work to send the deletes directly to the leader for the shard?
>>>>>> That might bypass the hash-based routing.
>>>>>> 
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>> 
>>>>>>> On May 24, 2023, at 8:35 AM, Walter Underwood
>>>>>>> <wun...@wunderwood.org> wrote:
>>>>>>> 
>>>>>>> Clearly, they are not broadcast, or if they are, they are filtered by
>>>>>>> the hash range before executing. If they were broadcast, this problem
>>>>>>> would not have happened.
>>>>>>>
>>>>>>> Yes, we’ll delete-all and reindex at some point. This collection has
>>>>>>> 1.7 billion documents across 96 shards, so a full reindex is not an
>>>>>>> everyday occurrence. I’m trying to clean up the minor problem of 675k
>>>>>>> documents with dupes.
>>>>>>> 
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>> 
>>>>>>>> On May 24, 2023, at 8:06 AM, Jan Høydahl <jan....@cominvent.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I thought deletes were "broadcast", but for the composite-id router
>>>>>>>> they probably are not, since we know for sure where the document
>>>>>>>> resides.
>>>>>>>> You say "shards were added" - how did you do that?
>>>>>>>> Sounds like you should simply re-create your collection and re-index?
>>>>>>>> 
>>>>>>>> Jan
>>>>>>>> 
>>>>>>>>> On May 24, 2023, at 16:39, Walter Underwood
>>>>>>>>> <wun...@wunderwood.org> wrote:
>>>>>>>>> 
>>>>>>>>> We have a messed-up index with documents on shards where they
>>>>>>>>> shouldn’t be. Content was indexed, shards were added, then everything
>>>>>>>>> was reindexed. So the new document with the same ID was put on a new
>>>>>>>>> shard, leaving the previous version on the old shard (where it
>>>>>>>>> doesn’t match the hash range).
>>>>>>>>>
>>>>>>>>> I’m trying to delete the old document by sending an update with
>>>>>>>>> delete-by-id and a shards parameter. It returns success, but the
>>>>>>>>> document isn’t deleted.
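>>>>>>>>>
>>>>>>>>> Roughly the shape of the request (endpoint and parameter values
>>>>>>>>> approximated here, not the literal command):
>>>>>>>>>
>>>>>>>>> curl 'http://host:8983/solr/COLLECTION/update?commit=true&shards=OLD_SHARD' \
>>>>>>>>>   -H 'Content-Type: text/xml' \
>>>>>>>>>   --data-binary '<delete><id>MISPLACED_DOC_ID</id></delete>'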
>>>>>>>>> 
>>>>>>>>> Is the hash range being checked and overriding the shards param
>>>>>>>>> somehow? Any ideas on how to make this work?
>>>>>>>>> 
>>>>>>>>> And yes, we won’t do that again.
>>>>>>>>> 
>>>>>>>>> wunder
>>>>>>>>> Walter Underwood
>>>>>>>>> wun...@wunderwood.org
>>>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>> 
>>>> 
>> 
>> 
