Makes sense. I can try playing around with these settings ... when you say client, would this be libcephfs.so?
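Assuming these get picked up from the [client] section of whatever ceph.conf the Hadoop bindings load, something like the following is the sort of thing I'd try. The values are purely illustrative, not recommendations:

  [client]
      ; read ahead at least 4 MB per request
      client_readahead_min = 4194304
      ; cap total read-ahead at 64 MB (0 = no byte cap, only the period cap applies)
      client_readahead_max_bytes = 67108864
      ; or express the cap as N file layout periods (object size * num stripes)
      client_readahead_max_periods = 4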
On Tue, Jul 9, 2013 at 5:35 PM, Noah Watkins <noah.watk...@inktank.com> wrote:

> Greg pointed out the read-ahead client options. I would suggest fiddling
> with these settings. If things improve, we can put automatic configuration
> of these settings into the Hadoop client itself. At the very least, we
> should be able to see if it is the read-ahead that is causing performance
> problems.
>
> OPTION(client_readahead_min, OPT_LONGLONG, 128*1024)  // readahead at
>   _least_ this much.
> OPTION(client_readahead_max_bytes, OPT_LONGLONG, 0)   // 8 * 1024*1024
> OPTION(client_readahead_max_periods, OPT_LONGLONG, 4) // as multiple of
>   file layout period (object size * num stripes)
>
> -Noah
>
> On Tue, Jul 9, 2013 at 3:27 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
> >> Is the JNI interface still an issue or have we moved past that?
> >
> > We haven't done much performance tuning with Hadoop, but I suspect that
> > the JNI interface is not a bottleneck.
> >
> > My very first thought about what might be causing slow read performance
> > is the read-ahead settings we use vs Hadoop. Hadoop should be performing
> > big, efficient, block-size reads and caching these in each map task.
> > However, I think we are probably doing lots of small reads on demand.
> > That would certainly hurt performance.
> >
> > In fact, in CephInputStream.java I see we are doing buffer-sized reads,
> > which, at least in my tree, turn out to be 4096 bytes :)
> >
> > So, there are two issues now. First, the C-Java barrier is being crossed
> > a lot (16K times for a 64MB block). That's probably not a huge overhead,
> > but it might be something. The second is read-ahead. I'm not sure how
> > much read-ahead the libcephfs client is performing, but the more round
> > trips it's doing, the more overhead we would incur.
> >
> >> thanks !
> >>
> >> On Tue, Jul 9, 2013 at 3:01 PM, ker can <kerca...@gmail.com> wrote:
> >>>
> >>> For this particular test I turned off replication for both hdfs and
> >>> ceph. So there is just one copy of the data lying around.
> >>>
> >>> hadoop@vega7250:~$ ceph osd dump | grep rep
> >>> pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash
> >>>   rjenkins pg_num 960 pgp_num 960 last_change 26 owner 0
> >>>   crash_replay_interval 45
> >>> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
> >>>   rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
> >>> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash
> >>>   rjenkins pg_num 960 pgp_num 960 last_change 1 owner 0
> >>>
> >>> From hdfs-site.xml:
> >>>
> >>> <property>
> >>>   <name>dfs.replication</name>
> >>>   <value>1</value>
> >>> </property>
> >>>
> >>> On Tue, Jul 9, 2013 at 2:44 PM, Noah Watkins
> >>> <noah.watk...@inktank.com> wrote:
> >>>>
> >>>> On Tue, Jul 9, 2013 at 12:35 PM, ker can <kerca...@gmail.com> wrote:
> >>>> > hi Noah,
> >>>> >
> >>>> > while we're still on the hadoop topic ... I was also trying out the
> >>>> > TestDFSIO tests, ceph v/s hadoop. The read tests on ceph take about
> >>>> > 1.5x the hdfs time. The write tests are worse, about 2.5x the time
> >>>> > on hdfs, but I guess we have additional journaling overheads for
> >>>> > the writes on ceph. But there should be no such overheads for the
> >>>> > reads?
> >>>>
> >>>> Out of the box Hadoop will keep 3 copies, and Ceph 2, so it could be
> >>>> the case that reads are slower because there is less opportunity for
> >>>> scheduling local reads.
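Going back to the 4096-byte reads and the "16K crossings per 64MB block" point above, here is a quick back-of-the-envelope check of that arithmetic (plain Java, nothing Ceph-specific):

  public class JniCrossings {
      public static void main(String[] args) {
          long blockSize = 64L * 1024 * 1024;            // one 64 MB Hadoop block
          long[] readSizes = { 4096, 8L * 1024 * 1024 }; // 4 KB reads vs 8 MB reads
          for (long readSize : readSizes) {
              // number of read() calls, i.e. trips across the C-Java boundary
              long calls = (blockSize + readSize - 1) / readSize;
              System.out.println(readSize + "-byte reads -> " + calls
                      + " calls across the C-Java boundary per 64 MB block");
          }
      }
  }

That prints 16384 calls for 4 KB reads and 8 calls for 8 MB reads, so bumping the read size toward the file layout period would cut the number of JNI round trips by three orders of magnitude, independent of whatever read-ahead the client does.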
> >>>> You can create a new pool with replication=3 and test this out
> >>>> (documentation on how to do this is at
> >>>> http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/).
> >>>>
> >>>> As for writes, Hadoop will write 2 remote and 1 local blocks, however
> >>>> Ceph will write all copies remotely, so there is some overhead for the
> >>>> extra remote object write (compared to Hadoop), but I wouldn't have
> >>>> expected 2.5x. It might be useful to run dd or something like that on
> >>>> Ceph to see if the numbers make sense, to rule out Hadoop as the
> >>>> bottleneck.
> >>>>
> >>>> -Noah
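For the record, here is a rough sketch of the pool-plus-dd baseline being suggested. The pool name, pg count and the /mnt/ceph mount point are placeholders, and the actual steps for pointing Hadoop at a new pool are in the doc linked above:

  # create a pool and set it to 3 replicas (name and pg count are placeholders)
  ceph osd pool create hadoop-data 960
  ceph osd pool set hadoop-data size 3
  ceph osd dump | grep hadoop-data        # confirm "rep size 3"

  # raw sequential write/read baseline on the CephFS mount, outside Hadoop
  dd if=/dev/zero of=/mnt/ceph/dd-test bs=64M count=16 conv=fsync
  echo 3 | sudo tee /proc/sys/vm/drop_caches   # avoid reading back from page cache
  dd if=/mnt/ceph/dd-test of=/dev/null bs=64M

If dd shows the same gap versus hdfs, the problem is below Hadoop; if the raw numbers look fine, the Hadoop/JNI read path is the more likely suspect.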