Hi Chuck,

Thanks for the detailed explanation! That pretty much answers all of my questions. I think this can (and should) be placed as-is somewhere in the Swift documentation and/or the Wiki.
Best,
Nikolaus

On 01/20/2012 04:58 PM, Chuck Thier wrote:
> Some general notes on consistency and Swift (all of the below assumes
> 3 replicas):
>
> Objects:
>
> When Swift PUTs an object, it attempts to write all 3 replicas and
> only returns success if 2 or more replicas were written successfully.
> When a new object is created, it has fairly strong read-after-create
> consistency. The only case where this would not be true is if none of
> the devices that hold the object are available. When an object is PUT
> on top of an existing object, there is more room for eventual
> consistency to come into play in failure scenarios. This is very
> similar to S3's consistency model. It is also important to note that
> if, because of a failure, a device is not available for a new replica
> to be written to, Swift will attempt to write that replica to a
> handoff node.
>
> When Swift GETs an object, by default it returns the first replica it
> finds among the available replicas. Using the X-Newest header requires
> Swift to compare the timestamps and serve only a replica that has the
> most recent timestamp. If only one replica is available and it holds
> an older version of the object, that version will still be returned,
> but in practice this is quite an edge case.
>
> Container Listings:
>
> When an object is PUT into Swift, each object server that a replica is
> written to is also assigned one of the container servers to update. On
> the object server, after the replica is successfully written, an
> attempt is made to update the listing on its assigned container
> server. If that update fails, it is queued locally (this is called an
> async pending) to be applied out of band by another process. The
> container updater process continually looks for these async pendings,
> attempts to make each update, and removes it from the queue when it
> succeeds.
> There are many reasons that a container update can fail (failed
> device, timeout, heavily used container, etc.), so container listings
> are eventually consistent in all cases (which is also very similar to
> S3).
>
> Consistency Window:
>
> For objects, the biggest factor that determines the consistency window
> is object replication time. In general this is pretty quick even for
> large clusters, and we are always working on making it better. If you
> want to limit consistency windows for objects, make sure you isolate
> failures as much as possible. By setting up your zones to be as
> isolated as possible (separate power, network, physical locality,
> etc.), you minimize the chance that there will be a consistency
> window.
>
> For containers, the biggest factor that determines the consistency
> window is disk IO for the SQLite databases. In recent testing, basic
> SATA hardware can handle somewhere from around 100 PUTs per second
> (for smaller containers) down to around 10 PUTs per second for very
> large containers (millions of objects) before async pendings start
> stacking up and you begin to see consistency issues. With better
> hardware (for example, RAID 10 of SSD drives), it is easy to get
> 400-500 PUTs per second with containers that have a billion objects in
> them. It is also a good idea to run your container/account servers on
> separate hardware from the object servers. Beyond that, the same
> advice given above for object servers also applies to the container
> servers.
>
> All that said, please don't just take my word for it; test it for
> yourself :)
>
> --
> Chuck

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
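A minimal Python sketch of the object write/read rules and the async-pending queue described in Chuck's notes (quorum of 2 out of 3 replicas, handoff on a failed device, first-found vs. newest-timestamp reads, queued container updates). It is a single-process toy model under stated assumptions, not Swift's actual proxy or object-server code: the names Device, ContainerServer, put_object, get_object and run_container_updater are hypothetical, the newest=True flag stands in for the X-Newest header, handoff selection is simplified, and all async pendings share one in-memory list instead of being stored on each object server.

```python
# Hypothetical toy model of the behaviour described above; not Swift's real implementation.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3: "success if 2 or more replicas were written"


@dataclass
class Device:
    """A disk holding object replicas as name -> (timestamp, data)."""
    name: str
    up: bool = True
    objects: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)


@dataclass
class ContainerServer:
    """Holds a container listing (just a set of object names here)."""
    up: bool = True
    listing: Set[str] = field(default_factory=set)


# Failed listing updates ("async pendings"); one shared list is a simplification.
AsyncQueue = List[Tuple[ContainerServer, str]]


def put_object(primaries: List[Device], handoffs: List[Device],
               containers: List[ContainerServer], pending: AsyncQueue,
               name: str, data: bytes, ts: float) -> bool:
    """Write one replica per primary, falling back to a handoff if a primary is down."""
    spare = iter(handoffs)
    written = 0
    for dev, container in zip(primaries, containers):
        target: Optional[Device] = dev if dev.up else next((h for h in spare if h.up), None)
        if target is None:
            continue  # no device available for this replica
        target.objects[name] = (ts, data)
        written += 1
        # After the replica is written, update the assigned container server;
        # if that fails, queue the update as an async pending.
        if container.up:
            container.listing.add(name)
        else:
            pending.append((container, name))
    return written >= QUORUM


def get_object(devices: List[Device], name: str, newest: bool = False) -> Optional[bytes]:
    """Default: first replica found. newest=True mimics the X-Newest header."""
    found = [d.objects[name] for d in devices if d.up and name in d.objects]
    if not found:
        return None
    if newest:
        return max(found, key=lambda rec: rec[0])[1]  # compare timestamps
    return found[0][1]


def run_container_updater(pending: AsyncQueue) -> None:
    """Retry queued listing updates; drop each one once it succeeds."""
    still_failing = []
    for container, name in pending:
        if container.up:
            container.listing.add(name)
        else:
            still_failing.append((container, name))
    pending[:] = still_failing


primaries = [Device("d1"), Device("d2"), Device("d3")]
handoffs = [Device("h1")]
containers = [ContainerServer(), ContainerServer(), ContainerServer()]
pending: AsyncQueue = []

assert put_object(primaries, handoffs, containers, pending, "obj", b"v1", 1.0)

# d1 fails during an overwrite; its replica goes to the handoff h1 instead.
primaries[0].up = False
assert put_object(primaries, handoffs, containers, pending, "obj", b"v2", 2.0)
primaries[0].up = True  # d1 comes back still holding the stale v1

everyone = primaries + handoffs
print(get_object(everyone, "obj"))               # b'v1' (first replica found)
print(get_object(everyone, "obj", newest=True))  # b'v2'
```

The default read returns the stale b'v1' from the recovered device, while the X-Newest-style read compares timestamps and returns b'v2'. Calling run_container_updater(pending) retries any listing updates that were queued while a container server was down, which is why listings are only eventually consistent.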