Right, it seems like DELETEREPLICA could handle this case. I know
there has been some hardening done there lately, but I don't know
whether it covers this case. Or maybe I'm thinking of deleting collections...
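
To make the base_url idea from the thread below concrete, here is a minimal
sketch of the stricter registration check. All names here (ReplicaCheck,
record, mayRegister, the node URLs) are hypothetical illustrations, not
actual Solr classes or APIs:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the stricter check discussed below: with legacyCloud=false,
 * a core may register only if its coreNodeName exists in clusterstate
 * AND its base_url matches the one recorded there. Today only the
 * coreNodeName is verified, so a copied core dir on another node slips in.
 */
public class ReplicaCheck {

    // clusterstate: coreNodeName -> base_url recorded at replica creation
    private final Map<String, String> clusterstate = new HashMap<>();

    public void record(String coreNodeName, String baseUrl) {
        clusterstate.put(coreNodeName, baseUrl);
    }

    public boolean mayRegister(String coreNodeName, String baseUrl) {
        String expected = clusterstate.get(coreNodeName);
        if (expected == null) {
            return false; // "no_such_replica in clusterstate"
        }
        // The proposed extra check: reject copied core dirs on other nodes.
        return expected.equals(baseUrl);
    }

    public static void main(String[] args) {
        ReplicaCheck check = new ReplicaCheck();
        check.record("core_node1", "http://nodeA:8983/solr");

        // Healthy replica on node A registers fine.
        System.out.println(check.mayRegister("core_node1", "http://nodeA:8983/solr")); // true

        // Same coreNodeName, but a core dir copied to node B is rejected.
        System.out.println(check.mayRegister("core_node1", "http://nodeB:8983/solr")); // false
    }
}
```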

On Thu, Mar 12, 2015 at 10:26 AM, Varun Thacker
<[email protected]> wrote:
> bq. how is copying a core dir from one node to another a normal use case ?
>
> That was just for testing what happens.
>
> Okay here is a real world scenario -
>
> I create a collection.
> The collection creation fails because of a bad config. The empty folders
> for the replicas get left behind.
> Now I fix the config and issue a create again. The replicas get created, but
> on different nodes of my cluster.
> In the future, if I bounce the nodes that had the leftover folders, they
> end up interfering with the healthy replicas for that collection.
>
> So, apart from checking coreNodeName, we should also check against baseUrl and
> make sure they match when legacyCloud=false. I will create a JIRA for
> it.
>
> On Thu, Mar 12, 2015 at 9:52 PM, Noble Paul <[email protected]> wrote:
>>
>> bq.Or they're testing out restoring backups
>>
>> This is in the context of the ZK-as-truth functionality. I guess, in that
>> case, you'd expect those nodes to work exactly like the other replicas.
>>
>> On Thu, Mar 12, 2015 at 8:36 PM, Erick Erickson <[email protected]>
>> wrote:
>>>
>>> bq: how is copying a core dir from one node to another a normal use case
>>> ?
>>>
>>> A user is trying to move a replica from one place to another. While I
>>> agree they should use ADDREPLICA for the new one and then DELETEREPLICA on
>>> the old replica..
>>>
>>> Or they're testing out restoring backups.
>>>
>>> I've had clients do both of these things.
>>>
>>> On Thu, Mar 12, 2015 at 7:00 AM, Noble Paul <[email protected]> wrote:
>>> > how is copying a core dir from one node to another a normal use case ?
>>> >
>>> > On Mar 12, 2015 7:22 PM, "Varun Thacker" <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi Noble,
>>> >>
>>> >> Well, I was just playing around to see if there were scenarios where
>>> >> different coreNodeNames could register themselves even if they weren't
>>> >> created using the Collections API.
>>> >>
>>> >> So I was doing it intentionally here to see what happens. But I can
>>> >> totally imagine users running into the second scenario where an old
>>> >> node
>>> >> comes back up and ends up messing up that replica in the collection
>>> >> accidentally.
>>> >>
>>> >> On Thu, Mar 12, 2015 at 7:01 PM, Noble Paul <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> It is totally possible.
>>> >>> The point is, it was not a security feature and it is extremely easy
>>> >>> to spoof.
>>> >>> The question is, was it a normal scenario or was it an effort to
>>> >>> prove that the system was not foolproof?
>>> >>>
>>> >>> --Noble
>>> >>>
>>> >>> On Thu, Mar 12, 2015 at 6:23 PM, Varun Thacker
>>> >>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Two scenarios I observed where we can bring up a replica even when I
>>> >>>> think we shouldn't. legacyCloud is set to false.
>>> >>>>
>>> >>>> I have two nodes, A and B.
>>> >>>> CREATE collection 'test' with 1 shard, 1 replica. It gets created on
>>> >>>> node A.
>>> >>>> Manually copy the test_shard1_replica1 folder to node B's Solr home.
>>> >>>> Bring down node A.
>>> >>>> Restart node B. The shard comes up, registering itself on node B, and
>>> >>>> becomes 'active'.
>>> >>>>
>>> >>>> I have two nodes, A and B (B is currently down).
>>> >>>> CREATE collection 'test' with 1 shard, 1 replica. It gets created on
>>> >>>> node A.
>>> >>>> Manually copy the test_shard1_replica1 folder to node B's Solr home.
>>> >>>> Start node B. The shard comes up registering itself on node B and
>>> >>>> stays 'down'. The reason is that the leader is still node A, but the
>>> >>>> clusterstate has the base_url of node B. This is the error in the
>>> >>>> logs: "Error getting leader from zk for shard shard1"
>>> >>>>
>>> >>>> With legacyCloud=false you get a 'no_such_replica in clusterstate'
>>> >>>> error if the 'coreNodeName' is not present in the clusterstate.
>>> >>>>
>>> >>>> But in my two observations the 'coreNodeName' values were the same,
>>> >>>> hence I ran into this scenario.
>>> >>>>
>>> >>>> Should we make the check more stringent to not allow this to happen?
>>> >>>> Check against base_url also?
>>> >>>>
>>> >>>> Also, should we make legacyCloud=false the default in 5.x?
>>> >>>> --
>>> >>>>
>>> >>>>
>>> >>>> Regards,
>>> >>>> Varun Thacker
>>> >>>> http://www.vthacker.in/
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> -----------------------------------------------------
>>> >>> Noble Paul
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >>
>>> >> Regards,
>>> >> Varun Thacker
>>> >> http://www.vthacker.in/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul
>
>
>
>
> --
>
>
> Regards,
> Varun Thacker
> http://www.vthacker.in/
