On 5 Dec 2016, at 12:56, John Dickinson wrote:

> On 5 Dec 2016, at 12:39, Mark Kirkwood wrote:
>
>> Thanks John - increasing the partition coverage is a great idea (I hadn't 
>> considered doing that).
>>
>>
>> Now, with respect to the lack-of-durability implication - I should have said
>> we are using a two-region topology with (region) affinity set. So although
>> the initial object placement will generally be durable, in the case where
>> someone was (perhaps overly) concerned about the object living in both
>> regions, we were looking at how to 'encourage' that to happen (ahead of the
>> steady progress the replicator is making).
>>
>
> In a multi-region setup, I really like read-affinity, and I'm not a huge fan 
> of write-affinity. Read-affinity gives you a lot of what you're looking for: 
> you get responses from the "closest" servers that have the data, lowering 
> time-to-first-byte on reads. Write-affinity intentionally forces Swift to 
> place data, durably, in handoff locations, thus creating replication work 
> that happens later. Obviously, this means that if your inbound traffic to one 
> region is higher than the cross-region background replication, your cluster 
> will never "catch up" on that deferred work. In other words, you're 
> intentionally expanding an eventual-consistency window and exposing it to 
> clients, who can detect it on reads. Some of this can be hidden by 
> also having read affinity that matches the write affinity setting, but it's 
> not possible to hide all of the behavior changes that the end-user will see.
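>
> For reference, both are proxy-server settings; here's a minimal sketch of
> what they look like (the region numbers and priorities are illustrative):
>
>   [app:proxy-server]
>   use = egg:swift#proxy
>   # sort replica candidates by affinity instead of shuffling them
>   sorting_method = affinity
>   # prefer region 1 on reads, then fall back to region 2
>   read_affinity = r1=100, r2=200
>   # land initial writes in region 1, using handoffs there if needed
>   write_affinity = r1
>   write_affinity_node_count = 2 * replicas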
>
> Write-affinity can be great if you've got a bursty workload (i.e. periods of 
> lower activity when the cluster catches up) or if the use case doesn't have 
> much in the way of reading the data immediately after writing it. So I'm not 
> totally against it. I just don't think in most cases the benefits are worth 
> the cost, and often it can lead to confusion.
>
> However, all that being said, you're hinting at something that would be a 
> pretty cool feature: policy-specific affinity. That's a great idea! Now it's 
> just a matter of prioritization...


Actually, it has been written. Just needs to be reviewed/landed.

https://review.openstack.org/#/c/382766/
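
If it lands in roughly its current form, the idea is per-policy override
sections in the proxy config - something along these lines (the syntax is
still up for review, so treat the section name and values as illustrative):

  [proxy-server:policy:0]
  sorting_method = affinity
  read_affinity = r2=100
  write_affinity = r2
  write_affinity_node_count = 2 * replicas

That would let one policy keep read-affinity only while another gets full
write-affinity.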


>
>
> --John
>
>
>
>
>
>>
>> regards
>>
>>
>> Mark
>>
>>
>> On 06/12/16 05:41, John Dickinson wrote:
>>> I'd suggest monitoring overall replication status with a combination of 
>>> log monitoring and swift-dispersion-report. If you find something that is 
>>> under-replicated, you can run the replicator process and give it a list of 
>>> partitions to prioritize.
>>> http://docs.openstack.org/developer/swift/admin_guide.html#dispersion-report
>>> I'd recommend running the dispersion report with 100% coverage.
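>>>
>>> As a concrete sketch (the auth values are illustrative), coverage is set
>>> in /etc/swift/dispersion.conf:
>>>
>>>   [dispersion]
>>>   auth_url = http://localhost:8080/auth/v1.0
>>>   auth_user = test:tester
>>>   auth_key = testing
>>>   dispersion_coverage = 100
>>>
>>> then run swift-dispersion-populate once and swift-dispersion-report as
>>> needed. For targeted replication, something along the lines of
>>>
>>>   swift-object-replicator /etc/swift/object-server.conf once --partitions=5
>>>
>>> (partition number illustrative) should work just those partitions.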
>>>
>>> However, your question implies that when an object is created it is not 
>>> durable in the system. That's not the case. Swift will not return 
>>> success unless the data has been durably persisted in the cluster (a quorum 
>>> of writes). Quorum is determined per storage policy; in a replicated 
>>> policy, quorum is half + 1 for odd numbers of replicas and half for even 
>>> numbers of replicas. This means that when a client gets a 201 response to 
>>> an object PUT request, that object has been stored at least 2 times in a 
>>> 3-replica storage policy (and it's extraordinarily likely it was stored a 
>>> full 3 times).
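>>>
>>> To make the arithmetic concrete: quorum works out to (replicas + 1) // 2,
>>> so 3 replicas -> quorum 2, 4 replicas -> quorum 2, 5 replicas -> quorum 3.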
>>>
>>> Let me know if you want more details about this.
>>>
>>> --John
>>>
>>>
>>>
>>>
>>> On 4 Dec 2016, at 14:56, Mark Kirkwood wrote:
>>>
>>>> Suppose you have a newly created object, and want to ensure it has 
>>>> replicated (i.e. this object is 'special' in some way). If the replication 
>>>> process is taking a while, is it sensible to 'encourage' Swift replication 
>>>> via direct HTTP calls using the REPLICATE method (or is this dangerous)?
>>>>
>>>>
>>>> E.g:
>>>>
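>>>> # on obj4, the suffix dir 55e hasn't been replicated in yet: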
>>>> markir@obj4:/srv/node/vdb/objects/5$ ls 55e
>>>> ls: cannot access '55e': No such file or directory
>>>>
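>>>> # nudge obj4's object server to recompute the hash for suffix 55e in partition 5: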
>>>> markir@obj1:~$ curl -v -X REPLICATE "http://obj4:6000/vdb/5/55e"
>>>>
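>>>> # a little later, the object's hash dir has arrived: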
>>>> markir@obj4:/srv/node/vdb/objects/5$ sleep 30;ls 55e
>>>> 5c38bfdd63f01a8e56260105fc68555e
>>>>
>>>>
>>>> regards
>>>>
>>>>
>>>> Mark
>>>>
>>>>
>>>>
