On Thu, Jul 18, 2013 at 3:13 PM, ker can <kerca...@gmail.com> wrote:

>
> the hbase+hdfs throughput results were 38x better.
> Any thoughts on what might be going on?
>
>
Looks like this might be a data locality issue. After loading the table,
when I look at the data block map of a region's store files, the blocks
are spread across disks on all the nodes. In my test setup, OSDs 0-6 are
on one node and OSDs 7-13 are on another. This is the map of the
'usertable' hbase table, region "da3b3bf6c0c5a9b387d23944122f208b",
store file "0c43d345e3ea42abb5ce5a98b162218a":

hadoop@dmse-141:/mnt/mycephfs/hbase/usertable/da3b3bf6c0c5a9b387d23944122f208b/family$
cephfs 0c43d345e3ea42abb5ce5a98b162218a map
    FILE OFFSET                    OBJECT        OFFSET        LENGTH  OSD
              0      10000001abd.00000000             0      67108864  2
       67108864      10000001abd.00000001             0      67108864  4
      134217728      10000001abd.00000002             0      67108864  8
      201326592      10000001abd.00000003             0      67108864  6
      268435456      10000001abd.00000004             0      67108864  3
      335544320      10000001abd.00000005             0      67108864  6
      402653184      10000001abd.00000006             0      67108864  9
      469762048      10000001abd.00000007             0      67108864  9
      536870912      10000001abd.00000008             0      67108864  0
      603979776      10000001abd.00000009             0      67108864  2
      671088640      10000001abd.0000000a             0      67108864  8
      738197504      10000001abd.0000000b             0      67108864  13
      805306368      10000001abd.0000000c             0      67108864  1
      872415232      10000001abd.0000000d             0      67108864  1
      939524096      10000001abd.0000000e             0      67108864  3
     1006632960      10000001abd.0000000f             0      67108864  7
     1073741824      10000001abd.00000010             0      67108864  3
     1140850688      10000001abd.00000011             0      67108864  13
     1207959552      10000001abd.00000012             0      67108864  13
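
(For reference: the osd-to-host mapping comes from the crush hierarchy,
which "ceph osd tree" prints. Roughly, for my two nodes -- the output
shape below is a from-memory sketch, and the second hostname is made up:

ceph osd tree

# -2    host dmse-141
#  0        osd.0    up
#  ...
#  6        osd.6    up
# -3    host dmse-142
#  7        osd.7    up
#  ...
# 13        osd.13   up

So every store file ends up striped across both nodes, not just the one
hosting the region.)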


For hbase+hdfs, all of the blocks within a single region were on that
region's own region server/datanode. So in the region server stats with
hdfs you see a 100% data locality index, and far more requests served
from the block cache.

hbase + hdfs region server stats:
blockCacheSizeMB=201.31, blockCacheFreeMB=45.57, blockCacheCount=3013,
blockCacheHitCount=9464863, blockCacheMissCount=10633061,
blockCacheEvictedCount=9305729, blockCacheHitRatio=47%,
blockCacheHitCachingRatio=50%,
hdfsBlocksLocalityIndex=100,

hbase + ceph region server stats:
blockCacheSizeMB=205.59, blockCacheFreeMB=41.29, blockCacheCount=2989,
blockCacheHitCount=1038372, blockCacheMissCount=1042117,
blockCacheEvictedCount=397801, blockCacheHitRatio=49%,
blockCacheHitCachingRatio=72%,
hdfsBlocksLocalityIndex=47
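
(Sanity check, assuming blockCacheHitRatio is simply
hits / (hits + misses):

    hdfs: 9464863 / (9464863 + 10633061) = 0.471 (47%)
    ceph: 1038372 / (1038372 + 1042117) = 0.499 (49%)

The reported ratios check out; the ceph run just handles ~10x fewer
cache requests overall.)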


With ceph, is there any way to influence the data block placement for a
single file?
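
The only knobs I'm aware of -- and this is an untested sketch, with
option names taken from the cephfs(8)/crushtool docs rather than
anything I've run -- are the per-file/per-directory striping layout and
the pool's crush rule. Neither gives hdfs-style per-block placement,
but a crush rule can at least pin a whole pool's data under one host
bucket:

# 1. pull and decompile the current crush map
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# 2. add a rule to crush.txt that keeps all replicas under a single
#    host (kills cross-host redundancy -- locality experiment only)
rule hbase_local {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take dmse-141              # host bucket from my cluster
        step chooseleaf firstn 0 type osd
        step emit
}

# 3. compile + inject, then create a pool that uses the new rule
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
ceph osd pool create hbase_local 128 128
ceph osd pool set hbase_local crush_ruleset 3

# 4. point the hbase directory's default layout at that pool (the old
#    cephfs tool wants the numeric pool id, the layout only applies to
#    files created afterwards, and the pool may also need to be added
#    with "ceph mds add_data_pool" first)
cephfs /mnt/mycephfs/hbase set_layout -p <pool-id>

Even then this is per-pool/per-directory, not per-file or per-block,
which is why I'm asking.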
