by the way ... here's the log of the write.

13/07/09 05:52:56 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write (HDFS)
13/07/09 05:52:56 INFO fs.TestDFSIO:            Date & time: Tue Jul 09 05:52:56 PDT 2013
13/07/09 05:52:56 INFO fs.TestDFSIO:        Number of files: 300
13/07/09 05:52:56 INFO fs.TestDFSIO: Total MBytes processed: 460800
13/07/09 05:52:56 INFO fs.TestDFSIO:      Throughput mb/sec: 50.43823691216413
13/07/09 05:52:56 INFO fs.TestDFSIO: Average IO rate mb/sec: 52.558677673339844
13/07/09 05:52:56 INFO fs.TestDFSIO:  IO rate std deviation: 12.838500708755591
13/07/09 05:52:56 INFO fs.TestDFSIO:     Test exec time sec: 227.571
13/07/09 05:52:56 INFO fs.TestDFSIO:
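(For anyone wanting to reproduce this: an invocation along these lines matches the parameters in both logs, i.e. 300 files x 1536 MB = 460800 MB total. The exact jar name varies by Hadoop release, so treat that part as a guess.)

    # write phase: 300 files of 1536 MB each
    hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 300 -fileSize 1536

    # read phase over the same files
    hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read -nrFiles 300 -fileSize 1536

And the same write test against Ceph: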
13/07/09 13:22:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write (Ceph)
13/07/09 13:22:09 INFO fs.TestDFSIO:            Date & time: Tue Jul 09 13:22:09 PDT 2013
13/07/09 13:22:09 INFO fs.TestDFSIO:        Number of files: 300
13/07/09 13:22:09 INFO fs.TestDFSIO: Total MBytes processed: 460800
13/07/09 13:22:09 INFO fs.TestDFSIO:      Throughput mb/sec: 23.40132226611945
13/07/09 13:22:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 24.76653480529785
13/07/09 13:22:09 INFO fs.TestDFSIO:  IO rate std deviation: 6.141010947451576
13/07/09 13:22:09 INFO fs.TestDFSIO:     Test exec time sec: 510.087
13/07/09 13:22:09 INFO fs.TestDFSIO:

In one of the older archive posts from last year [http://www.spinics.net/lists/ceph-devel/msg05387.html] I saw a similar discussion of TestDFSIO performance, Ceph versus HDFS. It mentioned that "one reason you might be seeing throughput issues is with the standard read/write interface that copies bytes across the JNI interface. On the short list of stuff for the next Java wrapper set is to use the ByteBuffer interface (NIO) to avoid this copying." Is the JNI interface still an issue, or have we moved past that?

thanks!

On Tue, Jul 9, 2013 at 3:01 PM, ker can <kerca...@gmail.com> wrote:

> For this particular test I turned off replication for both HDFS and Ceph,
> so there is just one copy of the data lying around.
>
> hadoop@vega7250:~$ ceph osd dump | grep rep
> pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
>
> From hdfs-site.xml:
>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
>
> On Tue, Jul 9, 2013 at 2:44 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
>
>> On Tue, Jul 9, 2013 at 12:35 PM, ker can <kerca...@gmail.com> wrote:
>> > hi Noah,
>> >
>> > while we're still on the hadoop topic ... I was also trying out the
>> > TestDFSIO tests, Ceph versus HDFS. The read tests on Ceph take about
>> > 1.5x the HDFS time. The write tests are worse, about 2.5x the time on
>> > HDFS, but I guess we have additional journaling overheads for the
>> > writes on Ceph. But there should be no such overhead for the reads?
>>
>> Out of the box Hadoop will keep 3 copies and Ceph 2, so it could be
>> the case that reads are slower because there is less opportunity for
>> scheduling local reads. You can create a new pool with replication=3
>> and test this out (documentation on how to do this is at
>> http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).
>>
>> As for writes, Hadoop will write 2 remote blocks and 1 local block,
>> whereas Ceph will write all copies remotely, so there is some overhead
>> for the extra remote object write (compared to Hadoop), but I wouldn't
>> have expected 2.5x. It might be useful to run dd or something like that
>> on Ceph to see if the numbers make sense and to rule out Hadoop as the
>> bottleneck.
>>
>> -Noah
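P.S. For the two follow-ups Noah suggests in the quoted mail (matching replication, and taking Hadoop out of the picture with dd), something along these lines should work. The pool name 'data' comes from the osd dump above; the /mnt/ceph mount point is an assumption, adjust for your setup.

    # match HDFS's default 3-way replication on the pool backing CephFS
    ceph osd pool set data size 3

    # raw sequential write through the CephFS mount, flushed to disk at the end
    dd if=/dev/zero of=/mnt/ceph/dd.test bs=1M count=4096 conv=fdatasync

    # drop the page cache (as root), then read it back for an honest read number
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/ceph/dd.test of=/dev/null bs=1M

If dd lands near the TestDFSIO numbers, the bottleneck is below Hadoop; if dd is much faster, the overhead is somewhere in the Hadoop/JNI path.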