Yup, this is a totally possible failure scenario, and Swift will merge the data 
(using last-write-wins for overwrites) automatically when the partition heals. 
You'll still have full durability on writes, even with a partitioned global 
cluster.
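
If it helps to picture the merge, here's a toy illustration of last-write-wins 
on container rows. It is purely conceptual (made-up structures, not the 
replicator code), but it shows why the counts come out right once the rows are 
merged:

# Two sides of a partition saw different PUTs while they couldn't talk.
side_a = {"cat.jpg": {"bytes": 1024, "timestamp": 100.1}}
side_b = {"cat.jpg": {"bytes": 2048, "timestamp": 100.7},
          "dog.jpg": {"bytes": 512,  "timestamp": 100.3}}

def merge(rows_a, rows_b):
    # Last write wins: for each object name keep the row with the newest
    # timestamp; rows that only one side has are simply kept.
    merged = dict(rows_a)
    for name, row in rows_b.items():
        if name not in merged or row["timestamp"] > merged[name]["timestamp"]:
            merged[name] = row
    return merged

merged = merge(side_a, side_b)
print(merged)
print("bytes used:", sum(row["bytes"] for row in merged.values()))
# Byte/object counts are recomputed from the merged rows, so the container
# accounting comes out right after the partition heals.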

--John



On Aug 27, 2014, at 10:49 AM, Marcus White <roastedseawee...@gmail.com> wrote:

> Yup, thanks for the great explanation:)
> 
> Another question, though related: If there are three regions, and two
> get "split", there is now a partition. Both the split ones can talk to
> the third, but not to each other.
> 
> A PUT comes into one region, and it gets written to the local site.
> Container information presumably gets updated here, including byte
> count.
> 
> Same thing happens on another site, where a PUT comes in and the
> container information is updated with the byte count.
> When the sites get back together, do the container servers make sure
> it's all correct in the end?
> 
> If this is not a possible scenario, is there any case where the
> container metadata can differ between two zones or regions because of
> a partition, with independent PUTs happening, so that the data has to
> be merged? Is that all handled by the respective servers (container
> or account)?
> 
> I will be looking at the sources soon.
> 
> Thanks again
> MW.
> 
> 
> 
> On Wed, Aug 27, 2014 at 8:13 PM, Luse, Paul E <paul.e.l...@intel.com> wrote:
>> Marcus-
>> 
>> Not sure how much nitty-gritty detail you care to know; some of these 
>> answers get into code specifics that you're better off exploring on your 
>> own so my explanation doesn't end up dated.  At a high level, though, the 
>> proxy looks up the nodes responsible for storing an object and its 
>> container via the rings.  It passes that info to the storage nodes with 
>> the PUT request, so when a storage node goes to update the container it 
>> has already been told "and here are the nodes to send the container 
>> update to".  It will send the updates to all of them.  Similarly, once the 
>> container server has updated its database it goes and updates the 
>> appropriate account databases.
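>> 
>> To make that concrete, here's a toy sketch of the flow (made-up names and 
>> a fake ring, not Swift's actual code).  In the real code the container 
>> targets arrive as per-node headers on the object PUT (X-Container-Host / 
>> X-Container-Partition / X-Container-Device, last I looked - worth 
>> verifying against the source you're reading):
>> 
>> import hashlib
>> 
>> # Toy "rings": a name hashes to a partition, which maps to a node set.
>> # (Real rings also know about regions, zones, devices and weights.)
>> OBJECT_RING = {0: ["obj-1", "obj-2", "obj-3"], 1: ["obj-4", "obj-5", "obj-6"]}
>> CONTAINER_RING = {0: ["cnt-1", "cnt-2", "cnt-3"], 1: ["cnt-4", "cnt-5", "cnt-6"]}
>> 
>> def get_nodes(ring, name):
>>     part = int(hashlib.md5(name.encode()).hexdigest(), 16) % len(ring)
>>     return ring[part]
>> 
>> def proxy_object_put(account, container, obj):
>>     obj_nodes = get_nodes(OBJECT_RING, "/".join((account, container, obj)))
>>     cnt_nodes = get_nodes(CONTAINER_RING, "/".join((account, container)))
>>     for node in obj_nodes:
>>         # Hypothetical header name, just to show the idea: each object
>>         # node is told which container nodes to notify after its local
>>         # write commits.
>>         headers = {"X-Container-Update-Targets": ",".join(cnt_nodes)}
>>         print("PUT %s/%s/%s -> %s  headers=%s"
>>               % (account, container, obj, node, headers))
>> 
>> proxy_object_put("AUTH_test", "photos", "cat.jpg")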
>> 
>> Make sense?
>> 
>> Thx
>> Paul
>> 
>> -----Original Message-----
>> From: Marcus White [mailto:roastedseawee...@gmail.com]
>> Sent: Wednesday, August 27, 2014 7:04 AM
>> To: Luse, Paul E
>> Cc: openstack
>> Subject: Re: [Openstack] Swift questions
>> 
>> Thanks Paul:)
>> 
>> For the container part, you mentioned that the node (meaning the object
>> server?) contacts the container server. Since you can have multiple 
>> container servers, how does the object server know which container server to 
>> contact? How and where the container gets updated is a bit confusing. With 
>> container rings and account rings being separate and in the proxy part, I am 
>> not sure I understand how that path works.
>> 
>> MW
>> 
>> On Wed, Aug 27, 2014 at 6:15 PM, Luse, Paul E <paul.e.l...@intel.com> wrote:
>>> Hi Marcus,
>>> 
>>> See answers below.  Feel free to ask follow-ups, others may have more to 
>>> add as well.
>>> 
>>> Thx
>>> Paul
>>> 
>>> -----Original Message-----
>>> From: Marcus White [mailto:roastedseawee...@gmail.com]
>>> Sent: Wednesday, August 27, 2014 5:04 AM
>>> To: openstack
>>> Subject: [Openstack] Swift questions
>>> 
>>> Hello,
>>> Some questions on new and old features of Swift. Any help would be
>>> great:) Some are very basic, sorry!
>>> 
>>> 1. Does Swift write two copies and then return to the client in the 
>>> 3-replica case, with the third in the background?
>>> 
>>> PL>  Depends on the number of replicas.  The formula for what we call a 
>>> quorum is n/2 + 1, which is the number of successful responses we need 
>>> from the back-end storage nodes before telling the client that all is 
>>> good.  So, yes, with 3 replicas you need 2 good responses before 
>>> returning OK.
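>>> 
>>> Just to make the arithmetic concrete - this is only the formula above, 
>>> not Swift's internal helper:
>>> 
>>> def quorum(replicas):
>>>     # n/2 + 1 with integer division, exactly the formula above
>>>     return replicas // 2 + 1
>>> 
>>> for n in (1, 2, 3, 5):
>>>     print("%d replicas -> quorum of %d" % (n, quorum(n)))
>>> # 3 replicas -> quorum of 2: two good responses from the back-end
>>> # storage nodes are enough to answer the client with success.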
>>> 
>>> 2. This again is a stupid question, but eventually consistent for an object 
>>> is a bit confusing, unless it is updated. If it is created, it is either 
>>> there or not and you cannot update the data within the object. Maybe a POST 
>>> can change the metadata? Or the container listing shows it's there but the 
>>> actual object never got there? Those are the only cases I can think of.
>>> 
>>> PL> No, it's a good question because it's asked a lot.  The most common 
>>> scenario we talk about for eventual consistency is the consistency 
>>> between the existence of an object and its presence in the container 
>>> listing, so your thinking is pretty close.  When an object PUT is 
>>> complete on a storage node (fully committed to disk), that node will 
>>> then send a message to the appropriate container server to update the 
>>> listing.  It will attempt to do this synchronously, but if it can't, the 
>>> update may be delayed w/o any indication to the client.  This is by 
>>> design and means that it's possible to get a successful PUT and be able 
>>> to GET the object w/o any problem, yet it may not show up in the 
>>> container listing right away.  There are other scenarios that 
>>> demonstrate the eventually consistent nature of Swift; this is just a 
>>> common and easy-to-explain one.
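>>> 
>>> If it helps to see the shape of it, here's a toy "try now, defer on 
>>> failure" sketch (hypothetical names; the real mechanism queues an 
>>> async_pending entry on disk that the object-updater daemon replays 
>>> later):
>>> 
>>> import json, os, time, uuid
>>> 
>>> PENDING_DIR = "/tmp/async_pending_demo"  # stand-in for the real spool dir
>>> 
>>> def send_container_update(container_node, entry):
>>>     # Pretend network call; always fails here to simulate an
>>>     # unreachable container server.
>>>     raise ConnectionError("container server %s unreachable" % container_node)
>>> 
>>> def object_put_finished(container_node, account, container, obj):
>>>     entry = {"account": account, "container": container, "obj": obj,
>>>              "timestamp": time.time()}
>>>     try:
>>>         send_container_update(container_node, entry)
>>>     except ConnectionError:
>>>         # The object PUT still succeeds; only the listing update is
>>>         # deferred, with no indication to the client.
>>>         os.makedirs(PENDING_DIR, exist_ok=True)
>>>         path = os.path.join(PENDING_DIR, uuid.uuid4().hex)
>>>         with open(path, "w") as f:
>>>             json.dump({"node": container_node, "entry": entry}, f)
>>> 
>>> def updater_pass():
>>>     # A background daemon retries whatever is queued until it succeeds.
>>>     for name in os.listdir(PENDING_DIR):
>>>         with open(os.path.join(PENDING_DIR, name)) as f:
>>>             queued = json.load(f)
>>>         print("retrying container update for", queued["entry"]["obj"])
>>> 
>>> object_put_finished("cnt-1", "AUTH_test", "photos", "cat.jpg")
>>> updater_pass()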
>>> 
>>> 3. Once an object has been written, when and how are the container
>>> listing, byte counts, account listing (if a new container was created),
>>> etc. updated? Is something done in the path of the PUT to indicate that
>>> this object belongs to a particular container, with the byte counts
>>> etc. updated in the background? A little clarification would help:)
>>> 
>>> PL>  Covered as part of last question.
>>> 
>>> 4. For global clusters, does the object ring span regions, and is it 
>>> the same for the container and account rings as well?
>>> 
>>> PL>  Check out the SwiftStack blog if you haven't already at 
>>> https://swiftstack.com/blog/2013/07/02/swift-1-9-0-release/ - there's 
>>> also some other material (including a demo from the last summit) that 
>>> you can find by googling around a bit.  The 'Region Tier' element 
>>> described in the blog addresses the makeup of a ring, so it applies to 
>>> the container and account rings as well - I personally didn't work on 
>>> this feature, so I'll leave it to one of the other guys to comment more 
>>> in this area.
>>> 
>>> 5. For containers in global clusters, if a client queries the
>>> container metadata from another site, is there a chance of it getting
>>> the old metadata? With respect to the object itself, the eventually
>>> consistent part is a bit confusing for me:)
>>> 
>>> PL> There's always a chance of getting an old "something", whether it's 
>>> metadata or data; that's part of being eventually consistent.  In the 
>>> face of an outage or partition (the P in the CAP theorem) Swift will 
>>> always favor availability, which may mean older data or older metadata 
>>> (object or container listing) depending on the specific scenario.  If 
>>> deployed correctly, I don't believe the use of global clusters increases 
>>> the odds of this happening (again, I'll count on someone else to say 
>>> more), and it's worth emphasizing that getting "old stuff" happens in 
>>> the face of some sort of failure (or heavy network congestion), so you 
>>> shouldn't think of an eventually consistent system as one where you 
>>> "get whatever you get".  You'll get the latest, greatest information 
>>> that's available.
>>> 
>>> MW
>>> 
> 
