Yup, this is a totally possible failure scenario, and Swift will merge the data (using last-write-wins for overwrites) automatically once the partition heals. But you'll still have full durability on writes, even with a partitioned global cluster.
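To give a rough idea of what last-write-wins means in practice: every write in Swift carries a timestamp, and when the two sides of the partition compare notes, the copy with the newest timestamp wins. A toy illustration (not the actual replicator code; the names and values are made up):

    # Two regions each accepted a PUT for the same object while partitioned.
    # Reconciliation keeps whichever copy has the newer write timestamp.
    def last_write_wins(copy_a, copy_b):
        return copy_a if copy_a['timestamp'] >= copy_b['timestamp'] else copy_b

    region1 = {'name': 'photos/cat.jpg', 'timestamp': 1409135000.1, 'data': b'v1'}
    region2 = {'name': 'photos/cat.jpg', 'timestamp': 1409135002.7, 'data': b'v2'}
    print(last_write_wins(region1, region2)['data'])  # b'v2' -- the later write survives in both regions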
--John

On Aug 27, 2014, at 10:49 AM, Marcus White <roastedseawee...@gmail.com> wrote:

> Yup, thanks for the great explanation :)
>
> Another question, though related: if there are three regions and two
> get "split", there is now a partition. Both of the split regions can
> talk to the third, but not to each other.
>
> A PUT comes into one region, and it gets written to the local site.
> Container information presumably gets updated there, including the
> byte count.
>
> The same thing happens at another site, where a PUT comes in and the
> container information is updated with its byte count.
>
> When the sites get back together, do the container servers make sure
> it's all correct in the end?
>
> If this is not a possible scenario, is there any case where the
> container metadata can differ between two zones or regions because of
> a partition, independent PUTs can happen, and the data has to be
> merged? Is that all done by the respective servers (container or
> account)?
>
> I will be looking at the sources soon.
>
> Thanks again
> MW.
>
>
> On Wed, Aug 27, 2014 at 8:13 PM, Luse, Paul E <paul.e.l...@intel.com> wrote:
>> Marcus-
>>
>> Not sure how much nitty-gritty detail you care to know, as some of these
>> answers will get into code specifics that you're better off exploring on
>> your own so my explanation doesn't end up dated. At a high level, though,
>> the proxy looks up the nodes responsible for storing an object and its
>> container via the rings. It passes that info to the storage nodes with
>> the PUT request, so when a storage node goes to update the container it
>> has already been told "and here are the nodes to send the container
>> update to". It sends the update to all of them. Similarly, once the
>> container server has updated its database, it updates the appropriate
>> account databases.
>>
>> Make sense?
>>
>> Thx
>> Paul
>>
>> -----Original Message-----
>> From: Marcus White [mailto:roastedseawee...@gmail.com]
>> Sent: Wednesday, August 27, 2014 7:04 AM
>> To: Luse, Paul E
>> Cc: openstack
>> Subject: Re: [Openstack] Swift questions
>>
>> Thanks Paul :)
>>
>> For the container part, you mentioned that the node (meaning the object
>> server?) contacts the container server. Since you can have multiple
>> container servers, how does the object server know which container
>> server to contact? How and where the container gets updated is a bit
>> confusing. With container rings and account rings being separate and
>> living in the proxy part, I am not sure I understand how that path
>> works.
>>
>> MW
>>
>> On Wed, Aug 27, 2014 at 6:15 PM, Luse, Paul E <paul.e.l...@intel.com> wrote:
>>> Hi Marcus,
>>>
>>> See answers below. Feel free to ask follow-ups; others may have more
>>> to add as well.
>>>
>>> Thx
>>> Paul
>>>
>>> -----Original Message-----
>>> From: Marcus White [mailto:roastedseawee...@gmail.com]
>>> Sent: Wednesday, August 27, 2014 5:04 AM
>>> To: openstack
>>> Subject: [Openstack] Swift questions
>>>
>>> Hello,
>>> Some questions on new and old features of Swift. Any help would be
>>> great :) Some are very basic, sorry!
>>>
>>> 1. Does Swift write two copies and then return to the client in the
>>> three-replica case, with the third written in the background?
>>>
>>> PL> It depends on the number of replicas. The formula for what we call
>>> a quorum is n/2 + 1, which is the number of success responses we need
>>> from the back-end storage nodes before telling the client that all is
>>> good. So yes, with 3 replicas you need 2 good responses before
>>> returning OK.
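To put that n/2 + 1 rule in concrete terms, here's a quick back-of-the-envelope in Python (not the actual proxy code, just the arithmetic):

    # Quorum per the n/2 + 1 formula Paul describes: how many successful
    # responses the proxy waits for before returning success to the client.
    def quorum(replicas):
        return replicas // 2 + 1

    print(quorum(3))   # 2 -- two good copies and the client gets its 2xx; the third lands in the background
    print(quorum(5))   # 3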
>>>
>>> 2. This again is a stupid question, but "eventually consistent" for an
>>> object is a bit confusing unless it is updated. If it is created, it is
>>> either there or not, and you cannot update the data within the object.
>>> Maybe a POST can change the metadata? Or the container listing shows
>>> it's there but the actual object never got there? Those are the only
>>> cases I can think of.
>>>
>>> PL> No, it's a good question because it's asked a lot. The most common
>>> scenario we talk about for eventual consistency is the consistency
>>> between the existence of an object and its presence in the container
>>> listing, so your thinking is pretty close. When an object PUT is
>>> complete on a storage node (fully committed to disk), that node sends a
>>> message to the appropriate container server to update the listing. It
>>> attempts to do this synchronously, but if it can't, the update may be
>>> delayed without any indication to the client. This is by design, and it
>>> means it's possible to get a successful PUT and be able to GET the
>>> object without any problem, yet the object may not show up in the
>>> container listing right away. There are other scenarios that
>>> demonstrate the eventually consistent nature of Swift; this is just a
>>> common and easy-to-explain one.
>>>
>>> 3. Once an object has been written, when and how are the container
>>> listing, number of bytes, account listing (if a new container was
>>> created), etc. updated? Is there something done in the path of the PUT
>>> to indicate this object belongs to a particular container, while the
>>> byte counts etc. are handled in the background? A little clarification
>>> would help :)
>>>
>>> PL> Covered as part of the last question.
>>>
>>> 4. For global clusters, does the object ring span regions, and is it
>>> the same for the container and account rings?
>>>
>>> PL> Check out the SwiftStack blog if you haven't already at
>>> https://swiftstack.com/blog/2013/07/02/swift-1-9-0-release/ and there's
>>> also some other stuff (including a demo from the last summit) that you
>>> can find by googling around a bit. The 'Region Tier' element described
>>> in the blog addresses the makeup of a ring, so it applies to the
>>> container and account rings as well. I personally didn't work on this
>>> feature, so I'll leave it to one of the other guys to comment more in
>>> this area.
>>>
>>> 5. For containers in global clusters, if a client queries the container
>>> metadata from another site, is there a chance of it getting old
>>> metadata? With respect to the object itself, the eventually consistent
>>> part is a bit confusing for me :)
>>>
>>> PL> There's always a chance of getting an old "something", whether it's
>>> metadata or data; that's part of being eventually consistent. In the
>>> face of an outage (the P in the CAP theorem) Swift will always favor
>>> availability, which may mean older data or older metadata (object or
>>> container listing) depending on the specific scenario. If deployed
>>> correctly, I don't believe use of global clusters increases the odds of
>>> this happening, though (again, I'll count on someone else to say more).
>>> It's worth emphasizing that getting "old stuff" happens in the face of
>>> some sort of failure (or heavy network congestion), so you shouldn't
>>> think of eventual consistency as a system where you "get whatever you
>>> get". You'll get the latest, greatest available information.
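To make the container-update behaviour Paul describes in #2/#3 a bit more concrete, here's a heavily simplified sketch of the idea in Python. The function and file names are made up for illustration; the real mechanism is the object server writing an "async pending" entry that the object-updater daemon replays later:

    import pickle
    import socket

    def send_container_update(node, obj_name, obj_bytes):
        # Stand-in for the HTTP request the object server would make to a
        # container server; pretend the node is unreachable so we hit the
        # fallback path.
        raise socket.error('container node %s unreachable' % node)

    def update_containers(container_nodes, obj_name, obj_bytes):
        # Called after the object data is safely committed to disk and the
        # client has already been told the PUT succeeded.
        failed = []
        for node in container_nodes:
            try:
                send_container_update(node, obj_name, obj_bytes)
            except socket.error:
                failed.append(node)
        if failed:
            # Can't update the listing right now: park the update locally so
            # a background daemon can retry it. Until it succeeds, a GET on
            # the object works but the container listing is stale -- that's
            # the eventual-consistency window.
            with open('async_pending_example.pkl', 'wb') as f:
                pickle.dump({'obj': obj_name, 'bytes': obj_bytes,
                             'nodes': failed}, f)

    update_containers(['container-node-1'], 'AUTH_test/photos/cat.jpg', 48213)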
>>>
>>> MW
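PS on #4: the region really is just the top tier of the ring hierarchy (region > zone > node > device), and it's specified per device when you build the ring. From memory it looks something like the commands below; double-check the deployment guide for the exact syntax, and the container and account builders take the same form:

    swift-ring-builder object.builder create 18 3 1
    swift-ring-builder object.builder add r1z1-10.0.1.10:6000/sdb1 100
    swift-ring-builder object.builder add r2z1-10.0.2.10:6000/sdb1 100
    swift-ring-builder object.builder rebalance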