On 5 Dec 2016, at 12:56, John Dickinson wrote:
> On 5 Dec 2016, at 12:39, Mark Kirkwood wrote:
>
>> Thanks John - increasing the partition coverage is a great idea (I
>> hadn't considered doing that).
>>
>> Now, with respect to the lack-of-durability implication - I should
>> have said we are using a 2-region topology with (region) affinity set,
>> so although the initial object placement will generally be durable, in
>> the case where someone was (perhaps overly) concerned about it living
>> in both regions, we were looking at how to 'encourage' that to happen
>> (ahead of the steady progress that the replicator is making).
>
> In a multi-region setup, I really like read-affinity, and I'm not a
> huge fan of write-affinity. Read-affinity gives you a lot of what
> you're looking for: you get responses from the "closest" servers that
> have the data, lowering time-to-first-byte on reads. Write-affinity
> intentionally forces Swift to place data, durably, in handoff
> locations, thus creating replication work that happens later.
> Obviously, this means that if your inbound traffic to one region is
> higher than the cross-region background replication, your cluster will
> never "catch up" on that deferred work. In other words, you're
> intentionally expanding and exposing an eventual-consistency window
> that clients are able to detect on reads. Some of this can be hidden by
> also having a read-affinity setting that matches the write-affinity
> setting, but it's not possible to hide all of the behavior changes that
> the end user will see.
>
> Write-affinity can be great if you've got a bursty workload (i.e.
> periods of lower activity when the cluster catches up) or if the use
> case doesn't involve reading the data immediately after writing it. So
> I'm not totally against it. I just don't think the benefits are worth
> the cost in most cases, and oftentimes it can lead to confusion.
>
> However, all that being said, you're hinting at something that would be
> a pretty cool feature: policy-specific affinity. That's a great idea!
> Now it's just a matter of prioritization...

Actually, it has been written. Just needs to be reviewed/landed.

https://review.openstack.org/#/c/382766/

> --John
>
>> regards
>>
>> Mark
>>
>> On 06/12/16 05:41, John Dickinson wrote:
>>
>>> I'd suggest monitoring overall replication status with a combination
>>> of log monitoring and swift-dispersion-report. If you find something
>>> that is under-replicated, you can run the replicator process and give
>>> it a list of partitions to prioritize (examples of both are sketched
>>> below).
>>>
>>> http://docs.openstack.org/developer/swift/admin_guide.html#dispersion-report
>>>
>>> I'd recommend running the dispersion report with 100% coverage.
>>>
>>> However, your question implies that when an object is created it is
>>> not durable in the system. That's not the case. Swift will not return
>>> a success unless the data has been durably persisted in the cluster
>>> (a quorum of writes). Quorum is determined per storage policy; in a
>>> replicated policy, quorum is half + 1 for odd numbers of replicas and
>>> half for even numbers of replicas. This means that when a client gets
>>> a 201 response to an object PUT request, that object has been stored
>>> at least 2 times in a 3-replica storage policy (and it's
>>> extraordinarily likely it was stored a full 3 times).
>>>
>>> Let me know if you want more details about this.
>>>
>>> --John
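For anyone following along at home, the affinity behavior John describes
is configured on the proxy. A minimal sketch of the relevant
proxy-server.conf fragment - the region number, priority value, and node
count are illustrative, so check the deployment guide for your release:

    [app:proxy-server]
    use = egg:swift#proxy
    # sort nodes by affinity on reads, preferring region 1
    # (lower value = higher priority)
    sorting_method = affinity
    read_affinity = r1=100
    # fill region-1 nodes first on writes, using local handoffs if
    # needed - this creates the deferred replication work John describes
    write_affinity = r1
    write_affinity_node_count = 2 * replicas

Note that write_affinity_node_count accepts the '* replicas' form, so
the count scales with the policy's replica count.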
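If monitoring does flag something under-replicated, the replicator can
be pointed at specific devices and partitions in 'once' mode. A sketch -
the device and partition values here are made up, and I'm going from
memory on the flag names, so double-check with --help on your version:

    swift-object-replicator /etc/swift/object-server.conf once \
        --devices=vdb --partitions=5,1233,3211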
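On the 100% coverage suggestion: coverage is a percentage of partitions,
set in /etc/swift/dispersion.conf. A minimal sketch, with placeholder
auth values to be replaced for your cluster:

    [dispersion]
    auth_url = http://saio:8080/auth/v1.0
    auth_user = test:tester
    auth_key = testing
    # percent of partitions to cover with dispersion objects
    dispersion_coverage = 100

Run swift-dispersion-populate once after raising the coverage, then
swift-dispersion-report on whatever schedule suits you.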
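The quorum rule is easy to sanity-check. A small Python sketch of the
rule exactly as John states it (not a copy of Swift's internals):

    def replicated_quorum(replica_count):
        # half + 1 for odd replica counts, half for even
        if replica_count % 2:
            return replica_count // 2 + 1
        return replica_count // 2

    # 3 replicas -> quorum 2; 4 -> 2; 5 -> 3
    for n in (3, 4, 5):
        print(n, replicated_quorum(n))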
>>> On 4 Dec 2016, at 14:56, Mark Kirkwood wrote:
>>>
>>>> Suppose you have a newly created object and want to ensure it has
>>>> replicated (i.e. this object is 'special' in some way). If the
>>>> replication process is taking a while, is it sensible to 'encourage'
>>>> Swift replication via direct HTTP calls using the 'REPLICATE' method
>>>> (or is this dangerous)?
>>>>
>>>> E.g.:
>>>>
>>>> markir@obj4:/srv/node/vdb/objects/5$ ls 55e
>>>> ls: cannot access '55e': No such file or directory
>>>>
>>>> markir@obj1:~$ curl -v -X REPLICATE "http://obj4:6000/vdb/5/55e"
>>>>
>>>> markir@obj4:/srv/node/vdb/objects/5$ sleep 30; ls 55e
>>>> 5c38bfdd63f01a8e56260105fc68555e
>>>>
>>>> regards
>>>>
>>>> Mark
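For completeness, the curl call in Mark's example can be made from
Python too. A minimal sketch reusing his host/device/partition/suffix
values; as I understand it, REPLICATE only makes the object server
(re)hash and report the named suffix - it's the replicator daemons that
actually move the data, which is why the object only shows up after the
sleep:

    # send REPLICATE for partition 5, suffix 55e on device vdb
    import http.client

    conn = http.client.HTTPConnection('obj4', 6000)
    conn.request('REPLICATE', '/vdb/5/55e')
    resp = conn.getresponse()
    print(resp.status)
    print(resp.read())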