Re: Help needed testing new systemd script (SOLR-14410)

2023-05-24 Thread Shawn Heisey

On 5/23/2023 3:12 AM, Jan Høydahl wrote:

We have an excellent contribution in 
https://issues.apache.org/jira/browse/SOLR-14410 and  
https://github.com/apache/solr/pull/428 to switch to a systemd init script for 
Solr.

Before we can merge, we need help testing it on more Unix flavours. If you have 
access to RHEL, CentOS or openSUSE and want to help test, please give us a ping 
in the JIRA issue or the PR.


I think I wrote this on the issue:

I have an infrastructure where I can easily create VMs to try this out 
on many different operating systems.  Point me at an installer for an OS 
you'd like to try it on and I can give it a whirl.  If I can figure out 
the installer!


A systemd service would make installation a LOT more bulletproof.  Seems 
like at least once a month I learn about a new thing that systemd has 
built in that used to require a dedicated service.


Thanks,
Shawn


Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
We have a messed-up index with documents on shards where they shouldn’t be. 
Content was indexed, shards were added, then everything was reindexed. So the 
new document with the same ID was put on a new shard, leaving the previous 
version on the old shard (where it doesn’t match the hash range).

I’m trying to delete the old document by sending an update with delete-by-id 
and a shards parameter. It returns success, but the document isn’t deleted.
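
Concretely, it is something like the following sketch (Python with the requests 
library; host, collection, shard, and document ID are placeholders, not the real 
values):

    import requests

    SOLR = "http://localhost:8983/solr"
    COLLECTION = "mycollection"

    # Delete-by-id sent with an explicit shards parameter (placeholder values).
    resp = requests.post(
        f"{SOLR}/{COLLECTION}/update",
        params={"shards": "shard42", "commit": "true"},  # shard holding the stale copy
        json={"delete": {"id": "SOME_DOC_ID"}},          # JSON delete-by-id command
    )
    resp.raise_for_status()
    print(resp.json())  # reports success, yet the stale document is still there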

Is the hash range being checked and overriding the shards param somehow? Any 
ideas on how to make this work?

And yes, we won’t do that again.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Deleting document on wrong shard?

2023-05-24 Thread Jan Høydahl
I thought deletes were "broadcast", but for the composite-id router they 
probably are not, since we know for sure where the document resides.
You say "shards were added" - how did you do that?
Sounds like you should simply re-create your collection and re-index?

Jan

> 24. mai 2023 kl. 16:39 skrev Walter Underwood :
> 
> We have a messed-up index with documents on shards where they shouldn’t be. 
> Content was indexed, shards were added, then everything was reindexed. So the 
> new document with the same ID was put on a new shard, leaving the previous 
> version on the old shard (where it doesn’t match the hash range).
> 
> I’m trying to delete the old document by sending an update with delete-by-id 
> and a shards parameter. It returns success, but the document isn’t deleted.
> 
> Is the hash range being checked and overriding the shards param somehow? Any 
> ideas on how to make this work?
> 
> And yes, we won’t do that again.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 



Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Clearly, they are not broadcast, or if they are, they are filtered by the hash 
range before executing. If they were broadcast, this problem would not have 
happened.

Yes, we’ll delete-all and reindex at some point. This collection has 1.7 
billion documents across 96 shards, so a full reindex is not an everyday 
occurrence. I’m trying to clean up the minor problem of 675k documents with 
dupes.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 8:06 AM, Jan Høydahl  wrote:
> 
> I thought deletes were "broadcast" but probably for the composite-id router 
> it is not since we know for sure where it resides.
> You say "shards were added" - how did you do that?
> Sounds like you should simply re-create your collection and re-index?
> 
> Jan
> 
>> 24. mai 2023 kl. 16:39 skrev Walter Underwood :
>> 
>> We have a messed-up index with documents on shards where they shouldn’t be. 
>> Content was indexed, shards were added, then everything was reindexed. So 
>> the new document with the same ID was put on a new shard, leaving the 
>> previous version on the old shard (where it doesn’t match the hash range).
>> 
>> I’m trying to delete the old document by sending an update with delete-by-id 
>> and a shards parameter. It returns success, but the document isn’t deleted.
>> 
>> Is the hash range being checked and overriding the shards param somehow? Any 
>> ideas on how to make this work?
>> 
>> And yes, we won’t do that again.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
> 



Re: Deleting document on wrong shard?

2023-05-24 Thread Gus Heck
Often it's a better idea to index into a fresh collection when making
changes that imply a full re-index. If you use an alias, the swap out of
the old collection is atomic when you update the alias, requiring no front-end
changes at all (and swapping back is easy if things aren't what you
expected). Of course, if you're not running cloud, the same idea applies to
creating a new index, but aliases won't be available, so hopefully
you've built some proxy layer or application API that has a simple
configuration for updating where it queries...

If you're going to spend the effort to reprocess all the docs, why not end
up in a state with no deleted docs? (i.e. don't do delete-all, create a new
collection)
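
For example, something like this sketch (Python with the requests library; the 
collection, alias, and config names are made up):

    import requests

    SOLR = "http://localhost:8983/solr"

    def collections_api(action, **params):
        """Call the Collections API and raise on HTTP errors."""
        resp = requests.get(f"{SOLR}/admin/collections",
                            params={"action": action, **params})
        resp.raise_for_status()
        return resp.json()

    # 1. Create the new collection and reindex everything into it (indexing not shown).
    collections_api("CREATE", name="docs_v2", numShards=96, replicationFactor=2,
                    **{"collection.configName": "docs_conf"})

    # 2. Point the serving alias at the new collection; clients keep querying "docs".
    collections_api("CREATEALIAS", name="docs", collections="docs_v2")

    # 3. Swapping back is just repointing the alias at the old collection:
    # collections_api("CREATEALIAS", name="docs", collections="docs_v1")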



On Wed, May 24, 2023 at 11:08 AM Jan Høydahl  wrote:

> I thought deletes were "broadcast" but probably for the composite-id
> router it is not since we know for sure where it resides.
> You say "shards were added" - how did you do that?
> Sounds like you should simply re-create your collection and re-index?
>
> Jan
>
> > 24. mai 2023 kl. 16:39 skrev Walter Underwood :
> >
> > We have a messed-up index with documents on shards where they shouldn’t
> be. Content was indexed, shards were added, then everything was reindexed.
> So the new document with the same ID was put on a new shard, leaving the
> previous version on the old shard (where it doesn’t match the hash range).
> >
> > I’m trying to delete the old document by sending an update with
> delete-by-id and a shards parameter. It returns success, but the document
> isn’t deleted.
> >
> > Is the hash range being checked and overriding the shards param somehow?
> Any ideas on how to make this work?
> >
> > And yes, we won’t do that again.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Responses about how to avoid this are not on topic. I’ve had Solr in production 
since version 1.3 and I know the right way.

I think I know how we got into this mess. The cluster is configured and 
deployed into Kubernetes. I think it was rebuilt with more shards, then the 
existing storage volumes were mounted for the matching shards. New shards got 
empty volumes. Then the content was reloaded without a delete-all.

Would it work to send the deletes directly to the leader for the shard? That 
might bypass the hash-based routing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 8:35 AM, Walter Underwood  wrote:
> 
> Clearly, they are not broadcast, or if they are, they are filtered by the 
> hash range before executing. If they were broadcast, this problem would not 
> have happened.
> 
> Yes, we’ll delete-all and reindex at some point. This collection has 1.7 
> billion documents across 96 shards, so a full reindex is not an everyday 
> occurrence. I’m trying to clean up the minor problem of 675k documents with 
> dupes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 24, 2023, at 8:06 AM, Jan Høydahl  wrote:
>> 
>> I thought deletes were "broadcast" but probably for the composite-id router 
>> it is not since we know for sure where it resides.
>> You say "shards were added" - how did you do that?
>> Sounds like you should simply re-create your collection and re-index?
>> 
>> Jan
>> 
>>> 24. mai 2023 kl. 16:39 skrev Walter Underwood :
>>> 
>>> We have a messed-up index with documents on shards where they shouldn’t be. 
>>> Content was indexed, shards were added, then everything was reindexed. So 
>>> the new document with the same ID was put on a new shard, leaving the 
>>> previous version on the old shard (where it doesn’t match the hash range).
>>> 
>>> I’m trying to delete the old document by sending an update with 
>>> delete-by-id and a shards parameter. It returns success, but the document 
>>> isn’t deleted.
>>> 
>>> Is the hash range being checked and overriding the shards param somehow? 
>>> Any ideas on how to make this work?
>>> 
>>> And yes, we won’t do that again.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>> 
> 



Re: Deleting document on wrong shard?

2023-05-24 Thread Gus Heck
Understood, of course I've seen your name on the list for a long time.
Partly my response is for the benefit of readers too, sorry if that
bothered you. You may of course have good reasons, and a carefully refined
design for your situation, that might not be best emulated everywhere.
Living in Kube is tricky, partly because (as I understand it) it was
designed with stateless web stuff and microservices in mind, and
it's really easy for folks administering it to trip over googled advice that
has that mindset. Sounds like possibly someone in ops was thinking in terms of
pods being interchangeable, lightweight objects and not thinking about the
persistent volumes needing to line up and match the design the same way
every time.

On topic: not sure, but one might need to set distrib=false or something
like that to avoid the routing.
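
i.e. something like this untested sketch (Python with the requests library; the 
host, core name, and document ID are placeholders):

    import requests

    # Core-level URL for the shard that holds the stale copy (placeholder name).
    CORE_URL = "http://solr-7.example.com:8983/solr/mycollection_shard7_replica_n1"

    # distrib=false keeps the update on this core instead of re-routing it by hash.
    resp = requests.post(
        f"{CORE_URL}/update",
        params={"distrib": "false", "commit": "true"},
        json={"delete": {"id": "SOME_DOC_ID"}},
    )
    resp.raise_for_status()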

On Wed, May 24, 2023 at 12:49 PM Walter Underwood 
wrote:

> Responses about how to avoid this are not on topic. I’ve had Solr in
> production since version 1.3 and I know the right way.
>
> I think I know how we got into this mess. The cluster is configured and
> deployed into Kubernetes. I think it was rebuilt with more shards then the
> existing storage volumes were mounted for the matching shards. New shards
> got empty volumes. Then the content was reloaded without a delete-all.
>
> Would it work to send the deletes directly to the leader for the shard?
> That might bypass the hash-based routing.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 24, 2023, at 8:35 AM, Walter Underwood 
> wrote:
> >
> > Clearly, they are not broadcast, or if they are, they are filtered by
> the hash range before executing. If they were broadcast, this problem would
> not have happened.
> >
> > Yes, we’ll delete-all and reindex at some point. This collection has 1.7
> billion documents across 96 shards, so a full reindex is not an everyday
> occurrence. I’m trying to clean up the minor problem of 675k documents with
> dupes.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On May 24, 2023, at 8:06 AM, Jan Høydahl  wrote:
> >>
> >> I thought deletes were "broadcast" but probably for the composite-id
> router it is not since we know for sure where it resides.
> >> You say "shards were added" - how did you do that?
> >> Sounds like you should simply re-create your collection and re-index?
> >>
> >> Jan
> >>
> >>> 24. mai 2023 kl. 16:39 skrev Walter Underwood :
> >>>
> >>> We have a messed-up index with documents on shards where they
> shouldn’t be. Content was indexed, shards were added, then everything was
> reindexed. So the new document with the same ID was put on a new shard,
> leaving the previous version on the old shard (where it doesn’t match the
> hash range).
> >>>
> >>> I’m trying to delete the old document by sending an update with
> delete-by-id and a shards parameter. It returns success, but the document
> isn’t deleted.
> >>>
> >>> Is the hash range being checked and overriding the shards param
> somehow? Any ideas on how to make this work?
> >>>
> >>> And yes, we won’t do that again.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>
> >
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Ooh, going directly to the leader node and using distrib=false, I like that 
idea. Now I need to figure out how to directly hit the danged Kubernetes pods.

The config/deploy design here is pretty solid and aware of persistent storage 
volumes. It works fine for increasing replicas. We just need to avoid changing 
the number of shards without a reindex. One of the other clusters has 320 
shards.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 10:12 AM, Gus Heck  wrote:
> 
> Understood, of course I've seen your name on the list for a long time.
> Partly my response is for the benefit of readers too, sorry if that
> bothered you. You of course may have good reasons, and carefully refined a
> design for your situation, that might not be best emulated everywhere.
> Living in Kube is tricky partly because (as I understand it) it was
> designed with stateless web stuff and microservices in mind I think and
> it's really easy for folks administering to trip on googled advice that has
> that mindset. Sounds like possibly someone in ops was thinking in terms of
> pods being interchangeable, lightweight objects and not thinking about the
> persistent volumes needing to line up and match the design the same way
> every time.
> 
> On topic: not sure, but one might need to set distrib=false or something
> like that to avoid the routing.
> 
> On Wed, May 24, 2023 at 12:49 PM Walter Underwood 
> wrote:
> 
>> Responses about how to avoid this are not on topic. I’ve had Solr in
>> production since version 1.3 and I know the right way.
>> 
>> I think I know how we got into this mess. The cluster is configured and
>> deployed into Kubernetes. I think it was rebuilt with more shards then the
>> existing storage volumes were mounted for the matching shards. New shards
>> got empty volumes. Then the content was reloaded without a delete-all.
>> 
>> Would it work to send the deletes directly to the leader for the shard?
>> That might bypass the hash-based routing.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 8:35 AM, Walter Underwood 
>> wrote:
>>> 
>>> Clearly, they are not broadcast, or if they are, they are filtered by
>> the hash range before executing. If they were broadcast, this problem would
>> not have happened.
>>> 
>>> Yes, we’ll delete-all and reindex at some point. This collection has 1.7
>> billion documents across 96 shards, so a full reindex is not an everyday
>> occurrence. I’m trying to clean up the minor problem of 675k documents with
>> dupes.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On May 24, 2023, at 8:06 AM, Jan Høydahl  wrote:
 
 I thought deletes were "broadcast" but probably for the composite-id
>> router it is not since we know for sure where it resides.
 You say "shards were added" - how did you do that?
 Sounds like you should simply re-create your collection and re-index?
 
 Jan
 
> 24. mai 2023 kl. 16:39 skrev Walter Underwood :
> 
> We have a messed-up index with documents on shards where they
>> shouldn’t be. Content was indexed, shards were added, then everything was
>> reindexed. So the new document with the same ID was put on a new shard,
>> leaving the previous version on the old shard (where it doesn’t match the
>> hash range).
> 
> I’m trying to delete the old document by sending an update with
>> delete-by-id and a shards parameter. It returns success, but the document
>> isn’t deleted.
> 
> Is the hash range being checked and overriding the shards param
>> somehow? Any ideas on how to make this work?
> 
> And yes, we won’t do that again.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
 
>>> 
>> 
>> 
> 
> -- 
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)



Re: Deleting document on wrong shard?

2023-05-24 Thread Ishan Chattopadhyaya
Would specifying a _route_ parameter in the request work?
https://issues.apache.org/jira/browse/SOLR-6910
I know your case is not implicit router based, but just wondering if it
still works somehow?
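
Something like this sketch, with placeholder collection, route, and ID values 
(Python with the requests library):

    import requests

    # Delete-by-id with an explicit _route_ so the request targets a chosen shard
    # instead of being routed by the ID's hash (all values are placeholders).
    resp = requests.post(
        "http://localhost:8983/solr/mycollection/update",
        params={"_route_": "ROUTE_FOR_OLD_SHARD", "commit": "true"},
        json={"delete": {"id": "SOME_DOC_ID"}},
    )
    resp.raise_for_status()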


On Wed, 24 May 2023 at 23:28, Walter Underwood 
wrote:

> Ooh, going directly to the leader node and using distrib=false, I like
> that idea. Now I need to figure out how to directly hit the danged
> Kubernetes pods.
>
> The config/deploy design here is pretty solid and aware of persistent
> storage volumes. It works fine for increasing replicas. We just need to
> avoid changing the number of shards without a reindex. One of the other
> clusters has 320 shards.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 24, 2023, at 10:12 AM, Gus Heck  wrote:
> >
> > Understood, of course I've seen your name on the list for a long time.
> > Partly my response is for the benefit of readers too, sorry if that
> > bothered you. You of course may have good reasons, and carefully refined
> a
> > design for your situation, that might not be best emulated everywhere.
> > Living in Kube is tricky partly because (as I understand it) it was
> > designed with stateless web stuff and microservices in mind I think and
> > it's really easy for folks administering to trip on googled advice that
> has
> > that mindset. Sounds like possibly someone in ops was thinking in terms
> of
> > pods being interchangeable, lightweight objects and not thinking about
> the
> > persistent volumes needing to line up and match the design the same way
> > every time.
> >
> > On topic: not sure, but one might need to set distrib=false or something
> > like that to avoid the routing.
> >
> > On Wed, May 24, 2023 at 12:49 PM Walter Underwood  >
> > wrote:
> >
> >> Responses about how to avoid this are not on topic. I’ve had Solr in
> >> production since version 1.3 and I know the right way.
> >>
> >> I think I know how we got into this mess. The cluster is configured and
> >> deployed into Kubernetes. I think it was rebuilt with more shards then
> the
> >> existing storage volumes were mounted for the matching shards. New
> shards
> >> got empty volumes. Then the content was reloaded without a delete-all.
> >>
> >> Would it work to send the deletes directly to the leader for the shard?
> >> That might bypass the hash-based routing.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On May 24, 2023, at 8:35 AM, Walter Underwood 
> >> wrote:
> >>>
> >>> Clearly, they are not broadcast, or if they are, they are filtered by
> >> the hash range before executing. If they were broadcast, this problem
> would
> >> not have happened.
> >>>
> >>> Yes, we’ll delete-all and reindex at some point. This collection has
> 1.7
> >> billion documents across 96 shards, so a full reindex is not an everyday
> >> occurrence. I’m trying to clean up the minor problem of 675k documents
> with
> >> dupes.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
>  On May 24, 2023, at 8:06 AM, Jan Høydahl 
> wrote:
> 
>  I thought deletes were "broadcast" but probably for the composite-id
> >> router it is not since we know for sure where it resides.
>  You say "shards were added" - how did you do that?
>  Sounds like you should simply re-create your collection and re-index?
> 
>  Jan
> 
> > 24. mai 2023 kl. 16:39 skrev Walter Underwood  >:
> >
> > We have a messed-up index with documents on shards where they
> >> shouldn’t be. Content was indexed, shards were added, then everything
> was
> >> reindexed. So the new document with the same ID was put on a new shard,
> >> leaving the previous version on the old shard (where it doesn’t match
> the
> >> hash range).
> >
> > I’m trying to delete the old document by sending an update with
> >> delete-by-id and a shards parameter. It returns success, but the
> document
> >> isn’t deleted.
> >
> > Is the hash range being checked and overriding the shards param
> >> somehow? Any ideas on how to make this work?
> >
> > And yes, we won’t do that again.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> 
> >>>
> >>
> >>
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)
>
>


Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Nice catch. This issue looks exactly like what I’m seeing: it returns success 
but does not delete the document.

SOLR-5890
Delete silently fails if not sent to shard where document was added

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 12:21 PM, Ishan Chattopadhyaya 
>  wrote:
> 
> Would specifying a _route_ parameter in the request work?
> https://issues.apache.org/jira/browse/SOLR-6910
> I know your case is not implicit router based, but just wondering if it
> still works somehow?
> 
> 
> On Wed, 24 May 2023 at 23:28, Walter Underwood 
> wrote:
> 
>> Ooh, going directly to the leader node and using distrib=false, I like
>> that idea. Now I need to figure out how to directly hit the danged
>> Kubernetes pods.
>> 
>> The config/deploy design here is pretty solid and aware of persistent
>> storage volumes. It works fine for increasing replicas. We just need to
>> avoid changing the number of shards without a reindex. One of the other
>> clusters has 320 shards.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 10:12 AM, Gus Heck  wrote:
>>> 
>>> Understood, of course I've seen your name on the list for a long time.
>>> Partly my response is for the benefit of readers too, sorry if that
>>> bothered you. You of course may have good reasons, and carefully refined
>> a
>>> design for your situation, that might not be best emulated everywhere.
>>> Living in Kube is tricky partly because (as I understand it) it was
>>> designed with stateless web stuff and microservices in mind I think and
>>> it's really easy for folks administering to trip on googled advice that
>> has
>>> that mindset. Sounds like possibly someone in ops was thinking in terms
>> of
>>> pods being interchangeable, lightweight objects and not thinking about
>> the
>>> persistent volumes needing to line up and match the design the same way
>>> every time.
>>> 
>>> On topic: not sure, but one might need to set distrib=false or something
>>> like that to avoid the routing.
>>> 
>>> On Wed, May 24, 2023 at 12:49 PM Walter Underwood >> 
>>> wrote:
>>> 
 Responses about how to avoid this are not on topic. I’ve had Solr in
 production since version 1.3 and I know the right way.
 
 I think I know how we got into this mess. The cluster is configured and
 deployed into Kubernetes. I think it was rebuilt with more shards then
>> the
 existing storage volumes were mounted for the matching shards. New
>> shards
 got empty volumes. Then the content was reloaded without a delete-all.
 
 Would it work to send the deletes directly to the leader for the shard?
 That might bypass the hash-based routing.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
> On May 24, 2023, at 8:35 AM, Walter Underwood 
 wrote:
> 
> Clearly, they are not broadcast, or if they are, they are filtered by
 the hash range before executing. If they were broadcast, this problem
>> would
 not have happened.
> 
> Yes, we’ll delete-all and reindex at some point. This collection has
>> 1.7
 billion documents across 96 shards, so a full reindex is not an everyday
 occurrence. I’m trying to clean up the minor problem of 675k documents
>> with
 dupes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 24, 2023, at 8:06 AM, Jan Høydahl 
>> wrote:
>> 
>> I thought deletes were "broadcast" but probably for the composite-id
 router it is not since we know for sure where it resides.
>> You say "shards were added" - how did you do that?
>> Sounds like you should simply re-create your collection and re-index?
>> 
>> Jan
>> 
>>> 24. mai 2023 kl. 16:39 skrev Walter Underwood >> :
>>> 
>>> We have a messed-up index with documents on shards where they
 shouldn’t be. Content was indexed, shards were added, then everything
>> was
 reindexed. So the new document with the same ID was put on a new shard,
 leaving the previous version on the old shard (where it doesn’t match
>> the
 hash range).
>>> 
>>> I’m trying to delete the old document by sending an update with
 delete-by-id and a shards parameter. It returns success, but the
>> document
 isn’t deleted.
>>> 
>>> Is the hash range being checked and overriding the shards param
 somehow? Any ideas on how to make this work?
>>> 
>>> And yes, we won’t do that again.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>> 
> 
 
 
>>> 
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>> 
>> 



Re: Deleting document on wrong shard?

2023-05-24 Thread Ishan Chattopadhyaya
Ah, now I remember this comment:
https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14294129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294129

"Updated the patch, now with the Hash based router also honouring the _
*route*_ param."

On Thu, 25 May 2023 at 01:38, Walter Underwood 
wrote:

> Nice catch. This issue looks exactly like what I’m seeing, it returns
> success but does not delete the document.
>
> SOLR-5890
> Delete silently fails if not sent to shard where document was added
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On May 24, 2023, at 12:21 PM, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >
> > Would specifying a _route_ parameter in the request work?
> > https://issues.apache.org/jira/browse/SOLR-6910
> > I know your case is not implicit router based, but just wondering if it
> > still works somehow?
> >
> >
> > On Wed, 24 May 2023 at 23:28, Walter Underwood 
> > wrote:
> >
> >> Ooh, going directly to the leader node and using distrib=false, I like
> >> that idea. Now I need to figure out how to directly hit the danged
> >> Kubernetes pods.
> >>
> >> The config/deploy design here is pretty solid and aware of persistent
> >> storage volumes. It works fine for increasing replicas. We just need to
> >> avoid changing the number of shards without a reindex. One of the other
> >> clusters has 320 shards.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On May 24, 2023, at 10:12 AM, Gus Heck  wrote:
> >>>
> >>> Understood, of course I've seen your name on the list for a long time.
> >>> Partly my response is for the benefit of readers too, sorry if that
> >>> bothered you. You of course may have good reasons, and carefully
> refined
> >> a
> >>> design for your situation, that might not be best emulated everywhere.
> >>> Living in Kube is tricky partly because (as I understand it) it was
> >>> designed with stateless web stuff and microservices in mind I think and
> >>> it's really easy for folks administering to trip on googled advice that
> >> has
> >>> that mindset. Sounds like possibly someone in ops was thinking in terms
> >> of
> >>> pods being interchangeable, lightweight objects and not thinking about
> >> the
> >>> persistent volumes needing to line up and match the design the same way
> >>> every time.
> >>>
> >>> On topic: not sure, but one might need to set distrib=false or something
> >>> like that to avoid the routing.
> >>>
> >>> On Wed, May 24, 2023 at 12:49 PM Walter Underwood <
> wun...@wunderwood.org
> >>>
> >>> wrote:
> >>>
>  Responses about how to avoid this are not on topic. I’ve had Solr in
>  production since version 1.3 and I know the right way.
> 
>  I think I know how we got into this mess. The cluster is configured
> and
>  deployed into Kubernetes. I think it was rebuilt with more shards then
> >> the
>  existing storage volumes were mounted for the matching shards. New
> >> shards
>  got empty volumes. Then the content was reloaded without a delete-all.
> 
>  Would it work to send the deletes directly to the leader for the
> shard?
>  That might bypass the hash-based routing.
> 
>  wunder
>  Walter Underwood
>  wun...@wunderwood.org
>  http://observer.wunderwood.org/  (my blog)
> 
> > On May 24, 2023, at 8:35 AM, Walter Underwood  >
>  wrote:
> >
> > Clearly, they are not broadcast, or if they are, they are filtered by
>  the hash range before executing. If they were broadcast, this problem
> >> would
>  not have happened.
> >
> > Yes, we’ll delete-all and reindex at some point. This collection has
> >> 1.7
>  billion documents across 96 shards, so a full reindex is not an
> everyday
>  occurrence. I’m trying to clean up the minor problem of 675k documents
> >> with
>  dupes.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On May 24, 2023, at 8:06 AM, Jan Høydahl 
> >> wrote:
> >>
> >> I thought deletes were "broadcast" but probably for the composite-id
>  router it is not since we know for sure where it resides.
> >> You say "shards were added" - how did you do that?
> >> Sounds like you should simply re-create your collection and re-index?
> >>
> >> Jan
> >>
> >>> 24. mai 2023 kl. 16:39 skrev Walter Underwood <
> wun...@wunderwood.org
> >>> :
> >>>
> >>> We have a messed-up index with documents on shards where they
>  shouldn’t be. Content was indexed, shards were added, then everything
> >> was
>  reindexed. So the new document with the same ID was put on a new
> shard,
>  leaving the previous version on the old shard (where it doesn’t match
> >> the
>  hash range).
> >>>
> >>> I’m trying to dele

Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
It works! Thanks so much.

I’m using the XML update format because the JSON format for sending multiple IDs 
for deletion is not documented anywhere I could find. It was easier to just 
generate XML than to keep searching for documentation.

This does the trick:

datalake_FPD_163298_3RGR-V090-006H-40F0-0-00
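
Roughly, the request is the sketch below (Python with the requests library; the 
host, collection, and _route_ value are assumptions, only the document ID above 
is real):

    import requests

    SOLR = "http://localhost:8983/solr"   # assumption
    COLLECTION = "mycollection"           # assumption
    DOC_ID = "datalake_FPD_163298_3RGR-V090-006H-40F0-0-00"

    # Standard XML delete-by-id body, posted with a _route_ request parameter.
    xml_body = f"<delete><id>{DOC_ID}</id></delete>"

    resp = requests.post(
        f"{SOLR}/{COLLECTION}/update",
        params={"_route_": "ROUTE_FOR_OLD_SHARD", "commit": "true"},  # route value assumed
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    resp.raise_for_status()

    # The JSON update format appears to accept a list of IDs as well, e.g.
    # json={"delete": [DOC_ID, "ANOTHER_ID"]}, though I couldn't find it documented.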

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 1:13 PM, Ishan Chattopadhyaya  
> wrote:
> 
> Ah, now I remember this comment:
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14294129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14294129
> 
> "Updated the patch, now with the Hash based router also honouring the _
> *route*_ param."
> 
> On Thu, 25 May 2023 at 01:38, Walter Underwood 
> wrote:
> 
>> Nice catch. This issue looks exactly like what I’m seeing, it returns
>> success but does not delete the document.
>> 
>> SOLR-5890
>> Delete silently fails if not sent to shard where document was added
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 12:21 PM, Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>> 
>>> Would specifying a _route_ parameter in the request work?
>>> https://issues.apache.org/jira/browse/SOLR-6910
>>> I know your case is not implicit router based, but just wondering if it
>>> still works somehow?
>>> 
>>> 
>>> On Wed, 24 May 2023 at 23:28, Walter Underwood 
>>> wrote:
>>> 
 Ooh, going directly to the leader node and using distrib=false, I like
 that idea. Now I need to figure out how to directly hit the danged
 Kubernetes pods.
 
 The config/deploy design here is pretty solid and aware of persistent
 storage volumes. It works fine for increasing replicas. We just need to
 avoid changing the number of shards without a reindex. One of the other
 clusters has 320 shards.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
> On May 24, 2023, at 10:12 AM, Gus Heck  wrote:
> 
> Understood, of course I've seen your name on the list for a long time.
> Partly my response is for the benefit of readers too, sorry if that
> bothered you. You of course may have good reasons, and carefully
>> refined
 a
> design for your situation, that might not be best emulated everywhere.
> Living in Kube is tricky partly because (as I understand it) it was
> designed with stateless web stuff and microservices in mind I think and
> it's really easy for folks administering to trip on googled advice that
 has
> that mindset. Sounds like possibly someone in ops was thinking in terms
 of
> pods being interchangeable, lightweight objects and not thinking about
 the
> persistent volumes needing to line up and match the design the same way
> every time.
> 
> On topic: not sure, but one might need to set distrib=false or something
> like that to avoid the routing.
> 
> On Wed, May 24, 2023 at 12:49 PM Walter Underwood <
>> wun...@wunderwood.org
> 
> wrote:
> 
>> Responses about how to avoid this are not on topic. I’ve had Solr in
>> production since version 1.3 and I know the right way.
>> 
>> I think I know how we got into this mess. The cluster is configured
>> and
>> deployed into Kubernetes. I think it was rebuilt with more shards then
 the
>> existing storage volumes were mounted for the matching shards. New
 shards
>> got empty volumes. Then the content was reloaded without a delete-all.
>> 
>> Would it work to send the deletes directly to the leader for the
>> shard?
>> That might bypass the hash-based routing.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On May 24, 2023, at 8:35 AM, Walter Underwood >> 
>> wrote:
>>> 
>>> Clearly, they are not broadcast, or if they are, they are filtered by
>> the hash range before executing. If they were broadcast, this problem
 would
>> not have happened.
>>> 
>>> Yes, we’ll delete-all and reindex at some point. This collection has
 1.7
>> billion documents across 96 shards, so a full reindex is not an
>> everyday
>> occurrence. I’m trying to clean up the minor problem of 675k documents
 with
>> dupes.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On May 24, 2023, at 8:06 AM, Jan Høydahl 
 wrote:
 
 I thought deletes were "broadcast" but probably for the composite-id
>> router it is not since we know for sure where it resides.
 You say "shards were added" - how did you do that?
 Sounds like you should simply re-create your coll

Re: Deleting document on wrong shard?

2023-05-24 Thread Shawn Heisey

On 5/24/23 10:48, Walter Underwood wrote:

I think I know how we got into this mess. The cluster is configured and 
deployed into Kubernetes. I think it was rebuilt with more shards then the 
existing storage volumes were mounted for the matching shards. New shards got 
empty volumes. Then the content was reloaded without a delete-all.


You're probably aware... that approach to re-sharding just plain will 
not work.  Increasing or decreasing the shard count of a 
compositeid-routed collection requires re-indexing from scratch.  The 
only way to add shards to an existing collection is to use SPLITSHARD, 
unless it's using the implicit router.
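
For reference, SPLITSHARD is just a Collections API call, roughly like this 
sketch (Python with the requests library; collection and shard names are 
placeholders):

    import requests

    # SPLITSHARD via the Collections API (placeholder names). The async parameter
    # returns a request id that can be polled with REQUESTSTATUS.
    resp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={"action": "SPLITSHARD", "collection": "mycollection",
                "shard": "shard1", "async": "split-shard1"},
    )
    resp.raise_for_status()
    print(resp.json())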


I've seen discussion of a rebalance API, but no implementation.  It 
would not be easy to implement.  I have thought of one approach that 
might make it doable ... but it might not be possible to send any 
updates to the collection until the entire rebalance is complete. 
Assuming it's even possible, the approach I thought of would require a 
LOT of extra disk space, a lot of extra bandwidth usage, and would take 
much longer to run than an optimize.  It might even take longer than 
doing a full re-index from the source system.


Thanks,
Shawn


Re: Deleting document on wrong shard?

2023-05-24 Thread Walter Underwood
Yes, I know it doesn’t work. It creates an index that violates some basic 
invariants, like having one ID map to one document. It does weird things, like 
return one document but list two documents in the facet counts with different 
values for the same single-valued field.

I’m trying to patch it back into a consistent state while we wait for the next 
full reindex.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 1:44 PM, Shawn Heisey  wrote:
> 
> On 5/24/23 10:48, Walter Underwood wrote:
>> I think I know how we got into this mess. The cluster is configured and 
>> deployed into Kubernetes. I think it was rebuilt with more shards then the 
>> existing storage volumes were mounted for the matching shards. New shards 
>> got empty volumes. Then the content was reloaded without a delete-all.
> 
> You're probably aware... that approach to re-sharding just plain will not 
> work.  Increasing or decreasing the shard count of a compositeid-routed 
> collection requires re-indexing from scratch.  The only way to add shards to 
> an existing collection is to use SPLITSHARD, unless it's using the implicit 
> router.
> 
> I've seen discussion of a rebalance API, but no implementation.  It would not 
> be easy to implement.  I have thought of one approach that might make it 
> doable ... but it might not be possible to send any updates to the collection 
> until the entire rebalance is complete. Assuming it's even possible, the 
> approach I thought of would require a LOT of extra disk space, a lot of extra 
> bandwidth usage, and would take much longer to run than an optimize.  It 
> might even take longer than doing a full re-index from the source system.
> 
> Thanks,
> Shawn