On Thu, Jul 18, 2013 at 3:13 PM, ker can <kerca...@gmail.com> wrote:
>
> the hbase+hdfs throughput results were 38x better.
> Any thoughts on what might be going on?
>

Looks like this might be a data locality issue. After loading the table, when I look at the data block map of a region's store files, the blocks are spread out across disks on all nodes. For my test 'usertable' HBase table, OSDs 0-6 are on one node and OSDs 7-13 are on another node. This is the map of region "da3b3bf6c0c5a9b387d23944122f208b", store file "0c43d345e3ea42abb5ce5a98b162218a":
hadoop@dmse-141:/mnt/mycephfs/hbase/usertable/da3b3bf6c0c5a9b387d23944122f208b/family$ cephfs 0c43d345e3ea42abb5ce5a98b162218a map
FILE OFFSET    OBJECT                  OFFSET    LENGTH      OSD
0              10000001abd.00000000    0         67108864    2
67108864       10000001abd.00000001    0         67108864    4
134217728      10000001abd.00000002    0         67108864    8
201326592      10000001abd.00000003    0         67108864    6
268435456      10000001abd.00000004    0         67108864    3
335544320      10000001abd.00000005    0         67108864    6
402653184      10000001abd.00000006    0         67108864    9
469762048      10000001abd.00000007    0         67108864    9
536870912      10000001abd.00000008    0         67108864    0
603979776      10000001abd.00000009    0         67108864    2
671088640      10000001abd.0000000a    0         67108864    8
738197504      10000001abd.0000000b    0         67108864    13
805306368      10000001abd.0000000c    0         67108864    1
872415232      10000001abd.0000000d    0         67108864    1
939524096      10000001abd.0000000e    0         67108864    3
1006632960     10000001abd.0000000f    0         67108864    7
1073741824     10000001abd.00000010    0         67108864    3
1140850688     10000001abd.00000011    0         67108864    13
1207959552     10000001abd.00000012    0         67108864    13

For HBase+HDFS, all blocks within a single region were on the same region server/data node. So in the region server stats with HDFS you see a 100% data locality index and much better cache hit ratios.

HBase + HDFS region server stats:
  blockCacheSizeMB=201.31, blockCacheFreeMB=45.57, blockCacheCount=3013,
  blockCacheHitCount=9464863, blockCacheMissCount=10633061, blockCacheEvictedCount=9305729,
  blockCacheHitRatio=47%, blockCacheHitCachingRatio=50%, hdfsBlocksLocalityIndex=100

HBase + Ceph region server stats:
  blockCacheSizeMB=205.59, blockCacheFreeMB=41.29, blockCacheCount=2989,
  blockCacheHitCount=1038372, blockCacheMissCount=1042117, blockCacheEvictedCount=397801,
  blockCacheHitRatio=49%, blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=47

With Ceph, is there any way to influence the data block placement for a single file?
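A rough way to quantify the spread in the map above: the Python sketch below maps a byte offset to its CephFS object name (the file's inode in hex, a dot, then the object index as 8 hex digits, matching the names in the map) and counts what share of the file's 19 objects sit on each node. It assumes a layout with stripe count 1 and the 64 MB object size seen in the LENGTH column, takes the OSD column exactly as printed, and uses hypothetical host labels "node-a" (OSDs 0-6) and "node-b" (OSDs 7-13).

```python
from collections import Counter

# 64 MB: the LENGTH of every object in the `cephfs ... map` output above
OBJECT_SIZE = 67108864

# OSD column from the map, in file-offset order (objects 0x00 .. 0x12)
OSD_BY_OBJECT = [2, 4, 8, 6, 3, 6, 9, 9, 0, 2, 8, 13, 1, 1, 3, 7, 3, 13, 13]


def node_of(osd: int) -> str:
    """Hypothetical host label for this two-node test cluster:
    OSDs 0-6 on one node, OSDs 7-13 on the other."""
    return "node-a" if osd <= 6 else "node-b"


def object_for_offset(inode_hex: str, offset: int) -> tuple[str, int]:
    """Return (object name, offset inside that object) for a byte offset.

    Assumes stripe_count=1 with stripe_unit equal to the object size, so
    consecutive 64 MB ranges of the file land in consecutive objects."""
    index = offset // OBJECT_SIZE
    return f"{inode_hex}.{index:08x}", offset % OBJECT_SIZE


def locality_by_node(osds: list[int]) -> dict[str, float]:
    """Fraction of the file's objects whose listed OSD is on each node."""
    counts = Counter(node_of(osd) for osd in osds)
    return {node: n / len(osds) for node, n in counts.items()}


if __name__ == "__main__":
    # The object holding file offset 738197504 (row 0x0b in the map)
    print(object_for_offset("10000001abd", 738197504))
    for node, share in sorted(locality_by_node(OSD_BY_OBJECT).items()):
        print(f"{node}: {share:.0%}")
```

Run as a script, it prints ('10000001abd.0000000b', 0), which matches the map, followed by node-a: 58% and node-b: 42%. Under these assumptions, a region server on either host would have to fetch a large share of this store file over the network.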
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com