On 01.05.15 20:33, Samuel Merritt wrote:
> On 5/1/15 7:55 AM, Uwe Sauter wrote:
>>
>> On 01.05.2015 at 02:21, Samuel Merritt wrote:
>>>
>>> It seems like 1430268763.41931.data would be in the same allocation
>>> group as
>>> objects/757/a94/bd77129a1cae9e32381776e322efca94, and
>>> bd77129a1cae9e32381776e322efca94 would be in the same allocation
>>> group as objects/757/a94, and so on. Thus, everything would be in the
>>> same allocation group as the root directory.
>>>
>>> This can't be the case, or else there'd be no point to allocation
>>> groups. What am I missing here?
>>
>> Hi,
>>
>> I think what you're missing is, that inodes stay in the allocation
>> group where they first were created. So moving a file
>> around in the filesystem changes the path but not the allocation
>> group. So first creating a temporary file and then
>> moving it into the hash folder leaves the file associated with the
>> temp folder's allocation group, thus the allocation
>> group grows bigger and bigger and searching the allocation group takes
>> more and more time.
>
> That doesn't really answer the question, though. We have this message
> <http://www.spinics.net/lists/xfs/msg32868.html> which says that "...the
> locality of a new inode is determined by the
> parent inode, and so if all new inodes are created in the same
> directory, then they are all created in the same AG."
>
> Let's say we start out with a freshly-formatted disk, so there's only
> one inode, and it's for the root directory.
>
> Then, Swift goes and starts making its directory structure on disk, and
> calls mkdir('objects'). Since a new inode is created in the same AG as
> its parent, the inode for '/objects' is in the same AG as the inode for
> '/'.
>
> Swift makes another dir: mkdir('objects/757')
>
> The inode for '/objects/757' is in the same AG as its parent '/objects',
> which is the same as the AG for '/'.
>
> Keep going a while, and you get
>
> /
> /objects
> /objects/757
> /objects/757/a94
> /objects/757/a94/bd77129a1cae9e32381776e322efca94
> /objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data
> /tmp
>
> and they're all in the same AG.
>
> Now, the XFS developers are not stupid, so what I typed up there can't
> possibly be true, or else every inode on a filesystem would be in the
> same AG.
>
> So, my question is this: what, of the things I typed above, is false?
> Equivalently, how is an inode created in a *different* AG than its parent?
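
One way to check that chain of reasoning on a real disk rather than on paper is to look at the inode numbers themselves. As far as I understand the on-disk format, an XFS inode number carries its AG number in the high bits, above agblklog + inopblog low bits (both are superblock fields; something like xfs_db -r -c "sb 0" -c "p agblklog" -c "p inopblog" on the device should print them). The following is only a rough sketch under that assumption; the function names are made up for illustration, and the two shift values have to be supplied by hand:

```python
import os


def inode_to_ag(ino, agblklog, inopblog):
    """Return the AG number encoded in an XFS inode number.

    Assumption: the AG number is whatever is left after shifting away
    agblklog + inopblog low bits (values taken from the superblock).
    """
    return ino >> (agblklog + inopblog)


def ag_chain(path, agblklog, inopblog):
    """List (path, AG number) for path and each parent directory.

    The walk stops at the first mount point so that inode numbers from
    a different filesystem are not mixed into the result.
    """
    path = os.path.abspath(path)
    chain = []
    while True:
        ino = os.lstat(path).st_ino
        chain.append((path, inode_to_ag(ino, agblklog, inopblog)))
        if os.path.ismount(path):
            break
        path = os.path.dirname(path)
    chain.reverse()  # mount point first, leaf last
    return chain
```

Running ag_chain() on one of the .data files and on the tmp directory would show directly whether every level really sits in the same AG as the filesystem root.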

Hmm, reading the docs at
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/AG_Free_Space_Management.html
I would assume that a different AG is selected when another AG has more
free space.

But, and this might be one of the problems here: the default seems to be
only 4 allocation groups; at least that is what xfs_info reports on the
various disks I checked. In fact, after looking into the sources of
mkfs.xfs I found this default for disks with sizes up to 4TB:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=blob;f=mkfs/xfs_mkfs.c;h=5084d755;hb=HEAD#l688

Might be a good idea to do some benchmarking with different numbers of
AGs?

--
Christian
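
P.S. For anyone who wants to compare AG counts across many disks before benchmarking, here is a small sketch that pulls agcount and agsize out of xfs_info output. It assumes the usual "agcount=N, agsize=M blks" wording on the meta-data line; the function names are hypothetical and the regular expression may need adjusting for other xfsprogs versions:

```python
import re
import subprocess

# Assumed format of the xfs_info meta-data line: "agcount=4, agsize=... blks"
_AG_RE = re.compile(r"agcount=(\d+),\s*agsize=(\d+)\s+blks")


def parse_ag_geometry(xfs_info_text):
    """Return (agcount, agsize_in_blocks) parsed from xfs_info output."""
    match = _AG_RE.search(xfs_info_text)
    if match is None:
        raise ValueError("no agcount/agsize found in xfs_info output")
    return int(match.group(1)), int(match.group(2))


def ag_geometry(mountpoint):
    """Run xfs_info on a mounted XFS filesystem and parse its AG geometry."""
    result = subprocess.run(
        ["xfs_info", mountpoint],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ag_geometry(result.stdout)
```

Calling ag_geometry() on each Swift data mount gives the AG count to record next to the benchmark numbers.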