For this particular test I turned off replication for both HDFS and Ceph, so there is just one copy of the data lying around.
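(For reference, the Ceph side of this is just the pool's replica count; assuming the default data pool name, something like the following should reduce it to a single copy:

    ceph osd pool set data size 1

The HDFS side is dfs.replication=1, shown below.)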
hadoop@vega7250:~$ ceph osd dump | grep rep
pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0

From hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

On Tue, Jul 9, 2013 at 2:44 PM, Noah Watkins <noah.watk...@inktank.com> wrote:

> On Tue, Jul 9, 2013 at 12:35 PM, ker can <kerca...@gmail.com> wrote:
> > hi Noah,
> >
> > While we're still on the hadoop topic ... I was also trying out the
> > TestDFSIO tests, Ceph v/s Hadoop. The read tests on Ceph take about
> > 1.5x the HDFS time. The write tests are worse, about 2.5x the time on
> > HDFS, but I guess we have additional journaling overheads for the
> > writes on Ceph. There should be no such overhead for the reads, though?
>
> Out of the box Hadoop will keep 3 copies and Ceph 2, so it could be
> that reads are slower because there is less opportunity for scheduling
> local reads. You can create a new pool with replication=3 and test
> this out (documentation on how to do this is at
> http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).
>
> As for writes, Hadoop writes 2 remote blocks and 1 local block, whereas
> Ceph writes all copies remotely, so there is some overhead for the
> extra remote object write compared to Hadoop, but I wouldn't have
> expected 2.5x. It might be useful to run dd or something similar
> directly against Ceph to see if the raw numbers make sense and to rule
> out Hadoop as the bottleneck.
>
> -Noah
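Following up on the dd suggestion, a raw sequential I/O sanity check might look something like the sketch below; the /mnt/ceph mount point and the 4 GB test size are placeholders for whatever matches the actual setup:

    # sequential write straight to the CephFS mount, bypassing the page cache
    dd if=/dev/zero of=/mnt/ceph/ddtest bs=1M count=4096 oflag=direct

    # drop the page cache so the read actually hits the cluster
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    # sequential read back
    dd if=/mnt/ceph/ddtest of=/dev/null bs=1M iflag=direct

Running the same dd against the local disks backing HDFS should show whether the gap comes from Ceph itself or from the Hadoop layer above it.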