> On Apr 24, 2015, at 12:42 AM, Shrinand Javadekar <shrin...@maginatics.com> 
> wrote:
> 
> Hi,
> 
> I observe that while placing data, the object server creates a
> directory structure:
> 
> /srv/node/r0/objects/<partition>/<3 byte hash suffix>/<hash>/<timestamp>.data.
> 
> Is there a reason for the <hash> directory to be created? Couldn't
> this just have been
> /srv/node/r0/objects/<partition>/<3 byte hash suffix>/<hash>.data?

Let's explore that idea. The general concept is sound, but the implications are 
worth walking through.

Suppose we did away with the hash dir and just had <hash>.data. Then each hash 
suffix directory would end up with an enormous number of entries. That by 
itself can cause issues in file systems. In fact, this is exactly why we have 
the hash suffix directory in the first place: to keep the cardinality of the 
partition directory from growing so large. So simply doing away with the hash 
directory could cause problems for the system as more and more objects get 
added (doing a listdir on a directory with a lot of files in it is _extremely_ 
slow).
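
To make the splaying concrete, here's a rough Python sketch of how an object's 
on-disk path gets derived. It is simplified from Swift's real code: the actual 
implementation also mixes per-cluster hash path prefix/suffix values into the 
MD5 and takes the partition from the ring, and the part power below is just a 
made-up value.

    import hashlib
    import os

    PART_POWER = 10  # hypothetical; the real value comes from the ring

    def object_path(device_dir, account, container, obj):
        # Hash of the object's full name. Swift also mixes per-cluster
        # hash_path_prefix/suffix values into this; omitted here.
        name_hash = hashlib.md5(
            ('/%s/%s/%s' % (account, container, obj)).encode()).hexdigest()
        # Partition: the top PART_POWER bits of the 128-bit hash.
        partition = int(name_hash, 16) >> (128 - PART_POWER)
        # Suffix: the last three hex characters of the hash. This is the
        # layer that keeps the partition directory's cardinality bounded.
        suffix = name_hash[-3:]
        return os.path.join(device_dir, 'objects', str(partition),
                            suffix, name_hash)

    # e.g. /srv/node/r0/objects/<partition>/<3 byte hash suffix>/<hash>
    print(object_path('/srv/node/r0', 'AUTH_test', 'photos', 'cat.jpg'))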

But there's more to it than just splaying. .data files are not the only thing 
that can be stored for an object. There are also .meta files (written by 
"fast-post", a feature that is not enabled by default), which store metadata, 
as you would assume. There are also .ts files (for "tombstone") that mark when 
an object has been deleted. And the new erasure code storage policy adds a 
.durable file that marks when the system has an object that is durable in the 
cluster.
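
As an illustration of that lifecycle, here's a hedged sketch (made-up 
timestamps, file names following the pattern described above) of the files one 
object's hash directory might accumulate:

    import os
    import tempfile

    # Simulate the files one object might accumulate over its life.
    hash_dir = tempfile.mkdtemp()

    for ts, ext in [('1429858052.12345', '.data'),  # PUT: object contents
                    ('1429861200.54321', '.meta'),  # fast-POST: new metadata
                    ('1429864800.67890', '.ts')]:   # DELETE: tombstone
        open(os.path.join(hash_dir, ts + ext), 'w').close()
    # (an EC policy would also add a .durable marker)

    print(sorted(os.listdir(hash_dir)))
    # Everything the system knows about this one object lives together
    # in one small directory.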

Also, each of these files is named according to a timestamp. So if we did away 
with the hash directory and instead put everything in the hash suffix 
directory, we'd have to name files like <hash>.<timestamp>.data or something 
similar so that we could keep the ordering. Aside from the listdir issues 
mentioned above, sorting and grouping all of those files so that concurrent 
operations can be resolved would also be expensive. Therefore we've put all 
the things (i.e. files) associated with an object into its own directory.
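
To see why the per-object directory keeps that cheap, here's a minimal sketch, 
simplified from Swift's real diskfile logic, of resolving an object's current 
state. Because each filename is just a zero-padded timestamp plus an 
extension, a plain lexical sort of a handful of files is enough:

    import os

    def object_state(hash_dir):
        # Newest first: zero-padded timestamps sort lexically.
        for name in sorted(os.listdir(hash_dir), reverse=True):
            ts, ext = os.path.splitext(name)
            if ext == '.ts':
                return ('deleted', ts)
            if ext == '.data':
                return ('present', ts)
            # .meta files only carry metadata, so they don't
            # affect whether the object exists.
        return ('absent', None)

In a flattened <hash>.<timestamp>.ext layout, the same walk would first have 
to list the entire suffix directory and group every entry by hash before it 
could apply this per-object logic.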

Of course, the cost of the deeper directory structure is that there are more 
inodes and dentries in the filesystem.

All that being said, I think that the combined features of DiskFiles and 
Storage Policies should allow for some interesting experimentation with the 
on-disk layout. I'm sure there are optimizations to be had, especially if you 
have any foreknowledge of the kind of data being stored. For example, small 
files could take advantage of some sort of Haystack-style slab storage. These 
ideas are simply that right now: ideas, not implementations. But I'd love to 
see some R&D in these areas.


Hope this helps explain why the on-disk layout is the way it is.


--John





> 
> I am seeing a situation where after writing a few hundred Gigs worth
> of data, where each object is 256K, XFS metadata performance is
> deteriorating. Having fewer directories might help in that
> case.
> 
> -Shri
> 