Makes sense. I can try playing around with these settings... when you say
client, would this be libcephfs.so?
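
If those options map straight onto ceph.conf the way I'm assuming, I guess
I'd be putting something like this in the [client] section (values below are
just placeholders for me to experiment with, not recommendations):

  [client]
      client readahead min = 131072          # current default, 128*1024
      client readahead max bytes = 8388608   # try 8 MB instead of the default 0
      client readahead max periods = 4       # multiples of the file layout period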

On Tue, Jul 9, 2013 at 5:35 PM, Noah Watkins <noah.watk...@inktank.com> wrote:

> Greg pointed out the read-ahead client options. I would suggest
> fiddling with these settings. If things improve, we can put automatic
> configuration of these settings into the Hadoop client itself. At the
> very least, we should be able to see if it is the read-ahead that is
> causing performance problems.
>
> OPTION(client_readahead_min, OPT_LONGLONG, 128*1024)  // readahead at _least_ this much.
> OPTION(client_readahead_max_bytes, OPT_LONGLONG, 0)  // 8 * 1024*1024
> OPTION(client_readahead_max_periods, OPT_LONGLONG, 4)  // as multiple of file layout period (object size * num stripes)
>
> -Noah
>
>
> On Tue, Jul 9, 2013 at 3:27 PM, Noah Watkins <noah.watk...@inktank.com>
> wrote:
> >> Is the JNI interface still an issue or have we moved past that ?
> >
> > We haven't done much performance tuning with Hadoop, but I suspect
> > that the JNI interface is not a bottleneck.
> >
> > My very first thought about what might be causing slow read
> > performance is the read-ahead settings we use vs Hadoop. Hadoop should
> > be performing big, efficient, block-size reads and caching these in
> > each map task. However, I think we are probably doing lots of small
> > reads on demand. That would certainly hurt performance.
> >
> > In fact, in CephInputStream.java I see we are doing buffer-sized
> > reads, which, at least in my tree, turn out to be 4096 bytes :)
> >
> > So, there are two issues now. First, the C-Java barrier is being crossed
> > a lot (16K times for a 64MB block). That's probably not a huge
> > overhead, but it might be something. The second is read-ahead. I'm not
> > sure how much read-ahead the libcephfs client is performing, but the
> > more round trips it's doing, the more overhead we would incur.
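> >
> > To put rough numbers on that, here's a tiny stand-in for the read loop
> > (not the actual CephInputStream code; the native read is simulated, but
> > the arithmetic is the point):
> >
> >   // ReadTripCount.java: how many buffer-sized reads does one block take?
> >   // In the real client, each iteration would be one Java -> JNI -> libcephfs trip.
> >   public class ReadTripCount {
> >       public static void main(String[] args) {
> >           final int bufSize = 4096;                  // buffer-sized read, as in my tree
> >           final long blockSize = 64L * 1024 * 1024;  // one 64MB block
> >           long remaining = blockSize;
> >           long trips = 0;
> >           while (remaining > 0) {
> >               long n = Math.min(bufSize, remaining); // stand-in for the native read call
> >               remaining -= n;
> >               trips++;
> >           }
> >           System.out.println(trips + " crossings per block"); // prints 16384
> >       }
> >   }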
> >
> >
> >>
> >> thanks !
> >>
> >>
> >>
> >>
> >> On Tue, Jul 9, 2013 at 3:01 PM, ker can <kerca...@gmail.com> wrote:
> >>>
> >>> For this particular test I turned off replication for both hdfs and ceph.
> >>> So there is just one copy of the data lying around.
> >>>
> >>> hadoop@vega7250:~$ ceph osd dump | grep rep
> >>> pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0 crash_replay_interval 45
> >>> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
> >>> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
> >>>
> >>> From hdfs-site.xml:
> >>>
> >>>   <property>
> >>>     <name>dfs.replication</name>
> >>>     <value>1</value>
> >>>   </property>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 9, 2013 at 2:44 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
> >>>>
> >>>> On Tue, Jul 9, 2013 at 12:35 PM, ker can <kerca...@gmail.com> wrote:
> >>>> > hi Noah,
> >>>> >
> >>>> > while we're still on the hadoop topic ... I was also trying out the
> >>>> > TestDFSIO tests, ceph vs hadoop. The read tests on ceph take about
> >>>> > 1.5x the hdfs time. The write tests are worse, about 2.5x the time
> >>>> > on hdfs, but I guess we have additional journaling overheads for the
> >>>> > writes on ceph. But there should be no such overheads for the reads?
> >>>>
> >>>> Out of the box Hadoop will keep 3 copies, and Ceph 2, so it could be
> >>>> the case that reads are slower because there is less opportunity for
> >>>> scheduling local reads. You can create a new pool with replication=3
> >>>> and test this out (documentation on how to do this is on
> >>>> http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).
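> >>>>
> >>>> Roughly, from memory (pool name and pg count are just examples, and
> >>>> depending on version add_data_pool may want the pool id rather than
> >>>> the name):
> >>>>
> >>>>   ceph osd pool create hadoop3x 128 128   # new data pool
> >>>>   ceph osd pool set hadoop3x size 3       # keep 3 copies
> >>>>   ceph mds add_data_pool hadoop3x         # let CephFS place files in it
> >>>>
> >>>> and then point the Hadoop bindings at the new pool per the doc above.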
> >>>>
> >>>> As for writes, Hadoop will write 2 remote and 1 local copy of each
> >>>> block; however, Ceph will write all copies remotely, so there is some
> >>>> overhead for the extra remote object write (compared to Hadoop), but I
> >>>> wouldn't have expected 2.5x. It might be useful to run dd or something
> >>>> like that on Ceph to see if the numbers make sense, to rule out Hadoop
> >>>> as the bottleneck.
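> >>>>
> >>>> Something along these lines against a CephFS mount would do as a first
> >>>> pass (path and sizes are just examples; drop the page cache before the
> >>>> read so it isn't served locally):
> >>>>
> >>>>   dd if=/dev/zero of=/mnt/ceph/ddtest bs=64M count=16 conv=fdatasync  # ~1GB write
> >>>>   echo 3 > /proc/sys/vm/drop_caches                                   # as root
> >>>>   dd if=/mnt/ceph/ddtest of=/dev/null bs=64M                          # read it back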
> >>>>
> >>>> -Noah
> >>>
> >>>
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
