Yep, I'm running Cuttlefish ... I'll try building out of that branch and let you know how that goes.
-KC

On Mon, Jul 8, 2013 at 9:06 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
> FYI, here is the patch as it currently stands:
>
> https://github.com/ceph/hadoop-common/compare/cephfs;branch-1.0...cephfs;branch-1.0-topo
>
> I have not tested it recently, but it looks like it should be close to
> correct. Feel free to test it out; I won't be able to get to it until
> tomorrow or Wednesday.
>
> Are you running Cuttlefish? I believe it has all the dependencies.
>
> On Mon, Jul 8, 2013 at 7:00 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
>> KC,
>>
>> Thanks a lot for checking that out. I just went to investigate, and the
>> work we have done on the locality/topology-aware features is sitting in
>> a branch and has not been merged into the tree that is used to produce
>> the JAR file you are using. I will get that cleaned up and merged soon,
>> and I think that should solve your problems :)
>>
>> -Noah
>>
>> On Mon, Jul 8, 2013 at 6:22 PM, ker can <kerca...@gmail.com> wrote:
>>> Hi Noah,
>>>
>>> Okay, I think the current version may have a problem; I haven't figured
>>> out where yet. I'm looking at the log messages and at how the data
>>> blocks are distributed among the OSDs.
>>>
>>> The job tracker log had, for example, this output for the map task for
>>> the first split/block 0, which is executing on host vega7250:
>>>
>>> ...
>>> 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
>>> ...
>>>
>>> If I look at how the blocks are divided up among the OSDs, block 0 for
>>> example is managed by osd.2, which is running on host vega7249.
>>> However, our map task for block 0 is running on another host.
>>> Definitely not co-located.
>>>
>>> FILE OFFSET   OBJECT                 OFFSET   LENGTH     OSD
>>> 0             10000000dbe.00000000   0        67108864   2
>>> 67108864      10000000dbe.00000001   0        67108864   13
>>> 134217728     10000000dbe.00000002   0        67108864   5
>>> 201326592     10000000dbe.00000003   0        67108864   4
>>> ...
>>>
>>> Ceph osd tree:
>>>
>>> # id   weight   type name           up/down   reweight
>>> -1     14       root default
>>> -3     14         rack unknownrack
>>> -2     7            host vega7249
>>> 0      1              osd.0         up        1
>>> 1      1              osd.1         up        1
>>> 2      1              osd.2         up        1
>>> 3      1              osd.3         up        1
>>> 4      1              osd.4         up        1
>>> 5      1              osd.5         up        1
>>> 6      1              osd.6         up        1
>>> -4     7            host vega7250
>>> 10     1              osd.10        up        1
>>> 11     1              osd.11        up        1
>>> 12     1              osd.12        up        1
>>> 13     1              osd.13        up        1
>>> 7      1              osd.7         up        1
>>> 8      1              osd.8         up        1
>>> 9      1              osd.9         up        1
>>>
>>> Thanks
>>> KC
>>>
>>> On Mon, Jul 8, 2013 at 3:36 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
>>>> Yes, all of the code needed to get the locality information should be
>>>> present in the version of the jar file you referenced. We have tested
>>>> to make sure the right data is available, but have not extensively
>>>> tested that it is being used correctly by core Hadoop (e.g. that it
>>>> is being correctly propagated out of CephFileSystem). IIRC, fixing
>>>> this /should/ be pretty easy; it's just a matter of fiddling with
>>>> getFileBlockLocations.
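>>>>
>>>> I don't know of a stock command line tool for this offhand, but a
>>>> tiny driver along these lines would show what CephFileSystem is
>>>> actually reporting per block. This is an untested sketch against the
>>>> plain Hadoop 1.x FileSystem API (nothing CephFS-specific; it picks up
>>>> CephFileSystem through whatever core-site.xml configures):
>>>>
>>>>   import org.apache.hadoop.conf.Configuration;
>>>>   import org.apache.hadoop.fs.BlockLocation;
>>>>   import org.apache.hadoop.fs.FileStatus;
>>>>   import org.apache.hadoop.fs.FileSystem;
>>>>   import org.apache.hadoop.fs.Path;
>>>>
>>>>   public class DumpBlockLocations {
>>>>     public static void main(String[] args) throws Exception {
>>>>       // Uses the default filesystem from the configuration,
>>>>       // e.g. a ceph:// URI in fs.default.name.
>>>>       FileSystem fs = FileSystem.get(new Configuration());
>>>>       FileStatus stat = fs.getFileStatus(new Path(args[0]));
>>>>       // One BlockLocation per block: offset, length, and the hosts
>>>>       // Hadoop believes hold that block.
>>>>       for (BlockLocation b : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
>>>>         System.out.println(b.getOffset() + "\t" + b.getLength() + "\t"
>>>>             + java.util.Arrays.toString(b.getHosts()));
>>>>       }
>>>>     }
>>>>   }
>>>>
>>>> If the host lists come back empty or wrong, the problem is on our
>>>> side of getFileBlockLocations; if they look right, core Hadoop is not
>>>> consuming them properly.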
>>>>
>>>> On Mon, Jul 8, 2013 at 1:25 PM, ker can <kerca...@gmail.com> wrote:
>>>>> Hi Noah,
>>>>>
>>>>> I'm using the CephFS jar from
>>>>> http://ceph.com/download/hadoop-cephfs.jar
>>>>> I believe this is built from hadoop-common/cephfs/branch-1.0?
>>>>>
>>>>> If that's the case, I should already be using an implementation that
>>>>> has getFileBlockLocations(), which is here:
>>>>>
>>>>> https://github.com/ceph/hadoop-common/blob/cephfs/branch-1.0/src/core/org/apache/hadoop/fs/ceph/CephFileSystem.java
>>>>>
>>>>> Is there a command line tool that I can use to verify the results
>>>>> from getFileBlockLocations()?
>>>>>
>>>>> thanks
>>>>> KC
>>>>>
>>>>> On Mon, Jul 8, 2013 at 3:09 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
>>>>>> Hi KC,
>>>>>>
>>>>>> The locality information is now collected and available to Hadoop
>>>>>> through the CephFS API, so fixing this is certainly possible.
>>>>>> However, there has not been extensive testing. I think the tasks
>>>>>> that need to be completed are (1) make sure that `CephFileSystem`
>>>>>> is encoding the correct block location in `getFileBlockLocations`
>>>>>> (which I think is currently done, but does need to be verified),
>>>>>> and (2) make sure rack information is available in the jobtracker,
>>>>>> or optionally use a flat hierarchy (i.e. default-rack).
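>>>>>>
>>>>>> For (1), the shape I'd expect getFileBlockLocations to produce is
>>>>>> roughly the sketch below. To be clear, this is not the actual
>>>>>> CephFileSystem code; hostsForOffset() is a hypothetical stand-in
>>>>>> for whatever the CephFS bindings expose for the extent-to-OSD-host
>>>>>> lookup, and the /default-rack prefix is the flat hierarchy from (2):
>>>>>>
>>>>>>   import java.io.IOException;
>>>>>>   import org.apache.hadoop.fs.BlockLocation;
>>>>>>   import org.apache.hadoop.fs.FileStatus;
>>>>>>   import org.apache.hadoop.fs.FileSystem;
>>>>>>   import org.apache.hadoop.fs.Path;
>>>>>>
>>>>>>   public abstract class LocalitySketch extends FileSystem {
>>>>>>     // Hypothetical: map a file offset to the hostnames of the
>>>>>>     // OSDs holding the object that backs it.
>>>>>>     protected abstract String[] hostsForOffset(Path path, long offset)
>>>>>>         throws IOException;
>>>>>>
>>>>>>     @Override
>>>>>>     public BlockLocation[] getFileBlockLocations(FileStatus file,
>>>>>>         long start, long len) throws IOException {
>>>>>>       long blockSize = file.getBlockSize();
>>>>>>       int nblocks = (int) ((len + blockSize - 1) / blockSize);
>>>>>>       BlockLocation[] locations = new BlockLocation[nblocks];
>>>>>>       for (int i = 0; i < nblocks; i++) {
>>>>>>         long offset = start + i * blockSize;
>>>>>>         long length = Math.min(blockSize, start + len - offset);
>>>>>>         String[] hosts = hostsForOffset(file.getPath(), offset);
>>>>>>         String[] racks = new String[hosts.length];
>>>>>>         for (int r = 0; r < hosts.length; r++) {
>>>>>>           racks[r] = "/default-rack/" + hosts[r]; // flat topology
>>>>>>         }
>>>>>>         locations[i] = new BlockLocation(hosts, hosts, racks, offset, length);
>>>>>>       }
>>>>>>       return locations;
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>> The important part is that the hosts array holds the bare hostnames
>>>>>> (e.g. vega7249) of the OSDs backing each block, since that is what
>>>>>> the jobtracker matches against tracker names.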
>>>>>>
>>>>>> On Mon, Jul 8, 2013 at 12:47 PM, ker can <kerca...@gmail.com> wrote:
>>>>>>> Hi There,
>>>>>>>
>>>>>>> I'm test driving Hadoop with CephFS as the storage layer. I was
>>>>>>> running the Terasort benchmark and I noticed a lot of network IO
>>>>>>> activity compared to an HDFS storage layer setup. (It's a
>>>>>>> half-terabyte sort workload across two data nodes.)
>>>>>>>
>>>>>>> Digging into the job tracker logs a little, I noticed that all the
>>>>>>> map tasks were being assigned to process a split (block) on
>>>>>>> non-local nodes, which explains all the network activity during
>>>>>>> the map phase.
>>>>>>>
>>>>>>> With Ceph:
>>>>>>>
>>>>>>> 2013-07-08 11:19:53,535 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307081115_0001 = 500000000000. Number of splits = 7452
>>>>>>> 2013-07-08 11:19:53,538 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201307081115_0001 initialized successfully with 7452 map tasks and 32 reduce tasks.
>>>>>>>
>>>>>>> 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000000
>>>>>>> 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
>>>>>>>
>>>>>>> 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000001
>>>>>>> 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000001_0' to tip task_201307081115_0001_m_000001, for tracker 'tracker_vega7249:localhost/127.0.0.1:36725'
>>>>>>>
>>>>>>> ... and so on.
>>>>>>>
>>>>>>> In comparison, with HDFS the job tracker logs looked something like
>>>>>>> this; the map tasks were being assigned to process data blocks on
>>>>>>> the local nodes:
>>>>>>>
>>>>>>> 2013-07-08 03:55:32,656 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307080351_0001 = 500000000000. Number of splits = 7452
>>>>>>> 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000000 has split on node:/default-rack/vega7247
>>>>>>> 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000001 has split on node:/default-rack/vega7247
>>>>>>> 2013-07-08 03:55:34,474 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000000_0' to tip task_201307080351_0001_m_000000, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>>>>>>> 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000000
>>>>>>> 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000001_0' to tip task_201307080351_0001_m_000001, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
>>>>>>> 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000001
>>>>>>>
>>>>>>> Version info:
>>>>>>> ceph version 0.61.4
>>>>>>> hadoop 1.1.2
>>>>>>>
>>>>>>> Has anyone else run into this?
>>>>>>>
>>>>>>> Thanks
>>>>>>> KC
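>>>>>>>
>>>>>>> PS: for anyone digging into this, my rough understanding (a
>>>>>>> simplified paraphrase from reading the Hadoop 1.x sources, not the
>>>>>>> verbatim code) is that the hosts the jobtracker matches against
>>>>>>> come straight out of the filesystem's getFileBlockLocations(), via
>>>>>>> the input splits:
>>>>>>>
>>>>>>>   import java.io.IOException;
>>>>>>>   import java.util.ArrayList;
>>>>>>>   import java.util.List;
>>>>>>>   import org.apache.hadoop.fs.BlockLocation;
>>>>>>>   import org.apache.hadoop.fs.FileStatus;
>>>>>>>   import org.apache.hadoop.fs.FileSystem;
>>>>>>>   import org.apache.hadoop.mapred.FileSplit;
>>>>>>>
>>>>>>>   public class SplitHostsSketch {
>>>>>>>     // One split per block, carrying the hosts the filesystem
>>>>>>>     // reported. If none of those hosts matches a tasktracker's
>>>>>>>     // hostname, the jobtracker logs "Choosing a non-local task".
>>>>>>>     static List<FileSplit> splitsFor(FileSystem fs, FileStatus stat)
>>>>>>>         throws IOException {
>>>>>>>       BlockLocation[] blocks =
>>>>>>>           fs.getFileBlockLocations(stat, 0, stat.getLen());
>>>>>>>       List<FileSplit> splits = new ArrayList<FileSplit>();
>>>>>>>       for (BlockLocation b : blocks) {
>>>>>>>         splits.add(new FileSplit(stat.getPath(), b.getOffset(),
>>>>>>>             b.getLength(), b.getHosts()));
>>>>>>>       }
>>>>>>>       return splits;
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> So if CephFileSystem returns empty or wrong host lists, every map
>>>>>>> task would end up non-local, which would explain the traffic above.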
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com