Not quite. Let's walk through an example:

I have a small ring:

$ swift-ring-builder ./object.builder
./object.builder, build version 4
64 partitions, 3.000000 replicas, 1 regions, 2 zones, 4 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices:    id  region  zone      ip address  port  replication ip  replication port      name weight partitions balance meta
             0       1     1       127.0.0.1  6010       127.0.0.1              6010        d1   1.00         48    0.00
             1       1     1       127.0.0.1  6020       127.0.0.1              6020        d2   1.00         48    0.00
             2       1     2       127.0.0.1  6030       127.0.0.1              6030        d3   1.00         48    0.00
             3       1     2       127.0.0.1  6040       127.0.0.1              6040        d4   1.00         48    0.00



4 devices, 3 replicas, part power of 6. The part power of 6 means that I have 
64 (2**6) possible partitions. The part power is simply the number of leading 
bits taken from the result of a call to md5(). Hash something, take the first 
6 bits, and that's the partition it's in. Because md5 output is close to 
uniformly distributed, you get a nice splaying across the 64 partitions.

Now, with 64 partitions and 3 replicas, I have 192 total partition replicas to 
place on the four devices. Since all devices are weighted evenly ("1.00" in the 
example), I end up with an even placement and 48 partitions assigned to each 
drive (2**6*3/4=48). Now you've got a balanced ring, and each of the 64 
partitions is placed on 3 drives. For more details on that, see the 
earlier referenced video.
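
To make that arithmetic concrete, here is a minimal sketch of how the share of partition replicas per device falls out of the weights. The function name `desired_partition_replicas` is made up for illustration, and this only covers the weight math; the real ring builder also has to account for zones and regions when it places replicas.

```python
def desired_partition_replicas(weights, part_power, replicas):
    """Split (2**part_power * replicas) partition replicas by device weight.

    Hypothetical helper for illustration; not part of Swift.
    """
    total = (2 ** part_power) * replicas
    weight_sum = sum(weights.values())
    return {dev: total * weight / weight_sum for dev, weight in weights.items()}


# The four equally weighted devices from the example ring.
even = desired_partition_replicas(
    {"d1": 1.0, "d2": 1.0, "d3": 1.0, "d4": 1.0}, part_power=6, replicas=3
)
print(even)  # {'d1': 48.0, 'd2': 48.0, 'd3': 48.0, 'd4': 48.0}
```

If one drive had three times the weight of another, it would simply get three times the share.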

Suppose I have a Swift account called APP_awesome. (Remember that Swift's 
accounts are storage areas, not necessarily 1:1 with user identities.) In that 
account, I want to put things, so I create a container called "things". Now I 
have a place to put all of my awesome things. The first awesome thing I want to 
put is backup.tgz. Where will it go in the cluster?

$ swift-get-nodes object.ring.gz APP_awesome/things/backup.tgz

Account         APP_awesome
Container       things
Object          backup.tgz


Partition       51
Hash            cc4e888bfad168f782897e32a892c4ef

Server:Port Device      127.0.0.1:6010 d1
Server:Port Device      127.0.0.1:6040 d4
Server:Port Device      127.0.0.1:6020 d2
Server:Port Device      127.0.0.1:6030 d3        [Handoff]



How did `swift-get-nodes` find the partition? First, it took the full object 
path ("/APP_awesome/things/backup.tgz"), then added in the secret prefix and 
suffix from swift.conf (basically just salts to prevent attackers from filling 
up one partition), and then hashed that with md5. The resulting hash, in hex, 
is "cc4e888bfad168f782897e32a892c4ef". The first 4 bytes of the raw digest (a 
16 byte string) are unpacked as a big-endian unsigned int and then 
right-shifted by 26 (ie 32-6) so we get the first 6 bits. The resulting number 
is the partition. In this case, "51".

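>>> import struct
>>> from hashlib import md5
>>> # prefix and suffix: the secret salts from swift.conf, as bytes
>>> key = md5(prefix + b'/APP_awesome/things/backup.tgz' + suffix).digest()
>>> struct.unpack_from('>I', key)[0] >> 26
51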
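
You can check the shift without knowing the salts by starting from the hex hash that `swift-get-nodes` printed. This is just a sketch of the same unpack-and-shift step, assuming Python 3; the variable names are mine.

```python
import struct

hex_hash = "cc4e888bfad168f782897e32a892c4ef"  # "Hash" line from swift-get-nodes
part_power = 6

raw = bytes.fromhex(hex_hash)                  # the 16 byte raw digest
first_word = struct.unpack_from(">I", raw)[0]  # first 4 bytes, big-endian: 0xcc4e888b
partition = first_word >> (32 - part_power)    # keep only the leading 6 bits

print(partition)                             # 51
print(format(raw[0], "08b")[:part_power])    # 110011, which is 51 in binary
```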

The resulting partition is the index in an array (serialized in 
object.ring.gz). The value at that index is the 3 nodes (ie drives) that are 
responsible for storing data at that partition. Find the IP, port, and mount 
point (name) for those drives, and you're ready to read or write the data.

As a clarification point, the "Handoff" node listed above is where the data 
will go if one of the primary drives fails. There is only one handoff because 
there are only 4 total drives in this ring.
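
As a rough illustration of that lookup, here is a toy version of the table. The names `part2dev_ids`, `get_nodes`, and `get_handoffs` are hypothetical and this is not Swift's actual ring format or API; only partition 51 is filled in, using the devices from the `swift-get-nodes` output above. The handoff logic here is simply "every device that is not a primary", which is an assumption that happens to be enough for a 4-drive ring.

```python
# Device list copied from the swift-ring-builder output above.
devs = [
    {"id": 0, "zone": 1, "ip": "127.0.0.1", "port": 6010, "device": "d1"},
    {"id": 1, "zone": 1, "ip": "127.0.0.1", "port": 6020, "device": "d2"},
    {"id": 2, "zone": 2, "ip": "127.0.0.1", "port": 6030, "device": "d3"},
    {"id": 3, "zone": 2, "ip": "127.0.0.1", "port": 6040, "device": "d4"},
]

# Hypothetical toy table: partition -> primary device ids.
# Only partition 51 is shown (d1, d4, d2); a real ring covers all 64.
part2dev_ids = {51: [0, 3, 1]}


def get_nodes(partition):
    """Return the primary devices responsible for a partition."""
    return [devs[dev_id] for dev_id in part2dev_ids[partition]]


def get_handoffs(partition):
    """Return the devices that are not primaries for a partition."""
    primaries = set(part2dev_ids[partition])
    return [dev for dev in devs if dev["id"] not in primaries]


for node in get_nodes(51):
    print(node["ip"], node["port"], node["device"])
print("handoff:", [dev["device"] for dev in get_handoffs(51)])  # handoff: ['d3']
```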

*whew*

A few things to note here. First, the size of the object has nothing to do with 
the resulting location. The more objects you store, the more evenly your drives 
will fill up (because md5 has good, even splaying). Second, the cost of doing 
all this computation is basically the cost of hashing the object name, and we 
know that (1) the name is bounded in length and (2) md5 is fast (enough). 
Therefore, ring lookups are cheap, and we don't have to read all the object 
data into memory before finding where it lives in the cluster.
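
If you want to see the splaying for yourself, a quick sketch like the following hashes a batch of made-up object names and counts how many land in each of the 64 partitions. It leaves out the swift.conf salts, so the partition numbers will not match a real cluster, but the evenness is the point.

```python
import hashlib
import struct
from collections import Counter

PART_POWER = 6  # 64 partitions, as in the example ring


def partition_for(name, part_power=PART_POWER):
    """Leading part_power bits of md5(name). Unsalted, for illustration only."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


# 64000 hypothetical object names; roughly 1000 should land in each partition.
counts = Counter(
    partition_for(f"/APP_awesome/things/object-{i}") for i in range(64000)
)
print(len(counts), min(counts.values()), max(counts.values()))
```

Note that the object data is never touched; only the name is hashed.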


Now, let's move back to your original numbers. You have 50 drives. The weight 
doesn't matter at this point, but the best-practice guideline is to set the 
weight to the number of GB on the drive (eg 3TB == 3000).

You want about 100 partitions per drive, so we need to find a part power that 
gives us that.

Find the smallest x such that (2**x * 3) / 50 > 100.

2**x > 1666.67 (ie 100 * 50 / 3)
math.log(1667, 2) = 10.703

Therefore use a part power of 11, which gives 2048 partitions and about 123 
partition replicas per drive. And once you have the 50 devices added to the 
ring, Swift will go through the math above to find the proper placement of each 
object.
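
The search for a part power can be sketched as a small loop. The function name `min_part_power` is my own, and it uses integer-friendly arithmetic rather than `math.log` so there are no rounding surprises at exact powers of two.

```python
def min_part_power(drives, replicas, target_per_drive):
    """Smallest x such that (2**x * replicas) / drives > target_per_drive."""
    power = 0
    while (2 ** power) * replicas / drives <= target_per_drive:
        power += 1
    return power


power = min_part_power(drives=50, replicas=3, target_per_drive=100)
print(power)                  # 11
print(2 ** power * 3 / 50)    # 122.88 partition replicas per drive
```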

Let me know if you have further questions.

--John





On Aug 19, 2014, at 7:15 PM, Brent Troge <brenttroge2...@gmail.com> wrote:

> Yeah, I have watched that multiple times over the weekend, and it has helped 
> very much.
> 
> So with respect to my example numbers, I am guessing that each partition will 
> land on every '41538374868278621028243970633760768' of the md5 space.
> 
> 2^(128 - 13)
> 
> or 
> 
> 2^(128)/8192
> 
> Thanks! 
> 
> 
> 
> On Tue, Aug 19, 2014 at 8:00 PM, John Dickinson <m...@not.mn> wrote:
> https://swiftstack.com/blog/2012/11/21/how-the-ring-works-in-openstack-swift/ 
> is something that should be able to give you a pretty complete overview of 
> how the ring works in Swift and how data placement works.
> 
> Let me know if you have more questions after you watch that video.
> 
> --John
> 
> 
> 
> 
> 
> 
> On Aug 19, 2014, at 5:34 PM, Brent Troge <brenttroge2...@gmail.com> wrote:
> 
> >
> > Excuse this question and for lack of basic understanding. I dropped from 
> > school at 8th grade, so everything is basically self taught. Here goes.
> >
> > I am trying to figure out where each offset/partition is placed on the ring.
> >
> >
> > So If I have 50 drives with a weight of 100 each I come up with the below 
> > part power
> >
> > part power = log2(50 * 100) = 13
> >
> > Using that I then come up with the amount of partitions.
> >
> > partitions = 2^13 =  8192
> >
> > Now here is where my ignorance comes into play. How do I use these 
> > datapoints to determine where each offset is on the ring?
> >
> > I then guess that for each offset they will have a fixed range of values 
> > that map to that partition.
> >
> > So for example, for offset 1, all object URL md5 hashes that have a decimal 
> > value of 0 through 100 will go here(i just made up the range 0 through 100, 
> > i have no idea what the range would be with respect to my given part-power, 
> > drive, etc).
> > _______________________________________________
> > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> > Post to     : openstack@lists.openstack.org
> > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> 
> 

