Yep, I'm running Cuttlefish ... I'll try building out of that branch and
let you know how that goes.

-KC


On Mon, Jul 8, 2013 at 9:06 PM, Noah Watkins <noah.watk...@inktank.com> wrote:

> FYI, here is the patch as it currently stands:
>
>
> https://github.com/ceph/hadoop-common/compare/cephfs;branch-1.0...cephfs;branch-1.0-topo
>
> I have not tested it recently, but it looks like it should be close to
> correct. Feel free to test it out--I won't be able to get to it until
> tomorrow or Wednesday.
>
> Are you running Cuttlefish? I believe it has all the dependencies.
>
> On Mon, Jul 8, 2013 at 7:00 PM, Noah Watkins <noah.watk...@inktank.com>
> wrote:
> > KC,
> >
> > Thanks a lot for checking that out. I just went to investigate, and
> > the work we have done on the locality/topology-aware features is
> > sitting in a branch and has not been merged into the tree that is
> > used to produce the JAR file you are using. I will get that cleaned up
> > and merged soon, and I think that should solve your problems :)
> >
> > -Noah
> >
> > On Mon, Jul 8, 2013 at 6:22 PM, ker can <kerca...@gmail.com> wrote:
> >> Hi Noah, okay, I think the current version may have a problem; I haven't
> >> figured out where yet, judging by the log messages and how the data
> >> blocks are distributed among the OSDs.
> >>
> >> So, for example, the job tracker log had this output for the map task for
> >> the first split/block 0, which is executing on host vega7250.
> >>
> >> ....
> >>
> >> ....
> >>
> >> 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding
> >> task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip
> >> task_201307081115_0001_m_000000, for tracker
> >> 'tracker_vega7250:localhost/127.0.0.1:35422'
> >>
> >> ...
> >>
> >> ...
> >>
> >>
> >> If I look at how the blocks are divided up among the OSDs, block 0 for
> >> example is managed by OSD#2, which is running on host vega7249. However
> >> our map task for block 0 is running on another host. Definitely not
> >> co-located.
> >>
> >>   FILE OFFSET    OBJECT                  OFFSET    LENGTH      OSD
> >>             0    10000000dbe.00000000         0    67108864      2
> >>      67108864    10000000dbe.00000001         0    67108864     13
> >>     134217728    10000000dbe.00000002         0    67108864      5
> >>     201326592    10000000dbe.00000003         0    67108864      4
> >>   ....
> >>
> >>
> >>
> >> Ceph osd tree:
> >>
> >>  # id    weight  type name       up/down reweight
> >>
> >> -1      14      root default
> >>
> >> -3      14              rack unknownrack
> >>
> >> -2      7                       host vega7249
> >>
> >> 0       1                               osd.0   up      1
> >>
> >> 1       1                               osd.1   up      1
> >>
> >> 2       1                               osd.2   up      1
> >>
> >> 3       1                               osd.3   up      1
> >>
> >> 4       1                               osd.4   up      1
> >>
> >> 5       1                               osd.5   up      1
> >>
> >> 6       1                               osd.6   up      1
> >>
> >> -4      7                       host vega7250
> >>
> >> 10      1                               osd.10  up      1
> >>
> >> 11      1                               osd.11  up      1
> >>
> >> 12      1                               osd.12  up      1
> >>
> >> 13      1                               osd.13  up      1
> >>
> >> 7       1                               osd.7   up      1
> >>
> >> 8       1                               osd.8   up      1
> >>
> >> 9       1                               osd.9   up      1
> >>
> >>
> >> Thanks
> >> KC
> >>
> >>
> >> On Mon, Jul 8, 2013 at 3:36 PM, Noah Watkins <noah.watk...@inktank.com>
> >> wrote:
> >>>
> >>> Yes, all of the code needed to get the locality information should be
> >>> present in the version of the jar file you referenced. We have tested it
> >>> to make sure the right data is available, but have not extensively
> >>> tested that it is being used correctly by core Hadoop (e.g. that it is
> >>> being correctly propagated out of CephFileSystem). IIRC fixing this
> >>> /should/ be pretty easy; mostly fiddling with getFileBlockLocations().
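> >>>
> >>> For reference, here is a minimal sketch (not the actual CephFileSystem
> >>> code) of the shape Hadoop 1.x expects back from getFileBlockLocations():
> >>> the JobTracker matches the returned host names against tasktracker host
> >>> names to decide data-local scheduling. The osdHostsForExtent() helper
> >>> below is purely hypothetical; the real implementation has to map each
> >>> file extent to the hostname(s) of the OSD(s) holding it.
> >>>
> >>>   import java.io.IOException;
> >>>   import java.util.ArrayList;
> >>>   import java.util.List;
> >>>   import org.apache.hadoop.fs.BlockLocation;
> >>>   import org.apache.hadoop.fs.FileStatus;
> >>>   import org.apache.hadoop.fs.Path;
> >>>
> >>>   public class LocalitySketch {
> >>>     public BlockLocation[] getFileBlockLocations(FileStatus file,
> >>>                                                  long start, long len)
> >>>         throws IOException {
> >>>       long blockSize = file.getBlockSize();  // object size, e.g. 64 MB
> >>>       long fileLen = file.getLen();
> >>>       long end = Math.min(start + len, fileLen);
> >>>       List<BlockLocation> locs = new ArrayList<BlockLocation>();
> >>>       // Walk the requested range one object-sized block at a time.
> >>>       for (long off = start - (start % blockSize); off < end;
> >>>            off += blockSize) {
> >>>         long length = Math.min(blockSize, fileLen - off);
> >>>         String[] hosts = osdHostsForExtent(file.getPath(), off);
> >>>         // BlockLocation(names, hosts, offset, length): hosts must be
> >>>         // bare hostnames (e.g. "vega7249") for locality matching.
> >>>         locs.add(new BlockLocation(hosts, hosts, off, length));
> >>>       }
> >>>       return locs.toArray(new BlockLocation[locs.size()]);
> >>>     }
> >>>
> >>>     // Hypothetical placeholder for the extent -> OSD -> hostname lookup.
> >>>     private String[] osdHostsForExtent(Path path, long offset) {
> >>>       return new String[] { "localhost" };
> >>>     }
> >>>   }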
> >>>
> >>> On Mon, Jul 8, 2013 at 1:25 PM, ker can <kerca...@gmail.com> wrote:
> >>> > Hi Noah,
> >>> >
> >>> > I'm using the CephFS jar from ...
> >>> > http://ceph.com/download/hadoop-cephfs.jar
> >>> > I believe this is built from hadoop-common/cephfs/branch-1.0?
> >>> >
> >>> > If that's the case, I should already be using an implementation that's
> >>> > got getFileBlockLocations() ... which is here:
> >>> >
> >>> > https://github.com/ceph/hadoop-common/blob/cephfs/branch-1.0/src/core/org/apache/hadoop/fs/ceph/CephFileSystem.java
> >>> >
> >>> > Is there a command line tool that I can use to verify the results from
> >>> > getFileBlockLocations()?
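> >>> >
> >>> > (If no such tool exists, I suppose a tiny driver against the plain
> >>> > FileSystem API, along these lines, would show what it returns; the
> >>> > path argument is just whatever CephFS file I point it at:)
> >>> >
> >>> >   import java.util.Arrays;
> >>> >   import org.apache.hadoop.conf.Configuration;
> >>> >   import org.apache.hadoop.fs.BlockLocation;
> >>> >   import org.apache.hadoop.fs.FileStatus;
> >>> >   import org.apache.hadoop.fs.FileSystem;
> >>> >   import org.apache.hadoop.fs.Path;
> >>> >
> >>> >   public class PrintBlockLocations {
> >>> >     public static void main(String[] args) throws Exception {
> >>> >       Path p = new Path(args[0]);               // file stored in CephFS
> >>> >       Configuration conf = new Configuration(); // picks up core-site.xml
> >>> >       FileSystem fs = p.getFileSystem(conf);
> >>> >       FileStatus st = fs.getFileStatus(p);
> >>> >       // Print the hosts reported for every block of the file.
> >>> >       for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
> >>> >         System.out.println("offset=" + b.getOffset()
> >>> >             + " length=" + b.getLength()
> >>> >             + " hosts=" + Arrays.toString(b.getHosts()));
> >>> >       }
> >>> >       fs.close();
> >>> >     }
> >>> >   }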
> >>> >
> >>> > thanks
> >>> > KC
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Jul 8, 2013 at 3:09 PM, Noah Watkins <noah.watk...@inktank.com> wrote:
> >>> >>
> >>> >> Hi KC,
> >>> >>
> >>> >> The locality information is now collected and available to Hadoop
> >>> >> through the CephFS API, so fixing this is certainly possible. However,
> >>> >> there has not been extensive testing. I think the tasks that need to
> >>> >> be completed are (1) make sure that `CephFileSystem` is encoding the
> >>> >> correct block locations in `getFileBlockLocations` (which I think is
> >>> >> currently done, but does need to be verified), and (2) make sure
> >>> >> rack information is available in the jobtracker, or optionally use a
> >>> >> flat hierarchy (i.e. default-rack).
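> >>> >>
> >>> >> For (2), if rack awareness is wanted rather than the flat
> >>> >> /default-rack hierarchy, a minimal sketch of the usual Hadoop 1.x
> >>> >> hookup is to point core-site.xml at a topology script (the path below
> >>> >> is just a placeholder); the script receives host names/IPs as
> >>> >> arguments and must print one rack path (e.g. /default-rack) per
> >>> >> argument:
> >>> >>
> >>> >>   <property>
> >>> >>     <name>topology.script.file.name</name>
> >>> >>     <value>/etc/hadoop/topology.sh</value>
> >>> >>   </property>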
> >>> >>
> >>> >> On Mon, Jul 8, 2013 at 12:47 PM, ker can <kerca...@gmail.com> wrote:
> >>> >> > Hi There,
> >>> >> >
> >>> >> > I'm test driving Hadoop with CephFS as the storage layer. I was
> >>> >> > running the Terasort benchmark and I noticed a lot of network IO
> >>> >> > activity compared to an HDFS storage layer setup. (It's a
> >>> >> > half-a-terabyte sort workload over two data nodes.)
> >>> >> >
> >>> >> > Digging into the job tracker logs a little, I noticed that all the map
> >>> >> > tasks were being assigned to process a split (block) on non-local nodes
> >>> >> > (which explains all the network activity during the map phase).
> >>> >> >
> >>> >> > With Ceph:
> >>> >> >
> >>> >> >
> >>> >> > 2013-07-08 11:19:53,535 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307081115_0001 = 500000000000. Number of splits = 7452
> >>> >> > 2013-07-08 11:19:53,538 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201307081115_0001 initialized successfully with 7452 map tasks and 32 reduce tasks.
> >>> >> >
> >>> >> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000000
> >>> >> > 2013-07-08 11:19:54,836 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000000_0' to tip task_201307081115_0001_m_000000, for tracker 'tracker_vega7250:localhost/127.0.0.1:35422'
> >>> >> >
> >>> >> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201307081115_0001_m_000001
> >>> >> > 2013-07-08 11:19:54,990 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307081115_0001_m_000001_0' to tip task_201307081115_0001_m_000001, for tracker 'tracker_vega7249:localhost/127.0.0.1:36725'
> >>> >> >
> >>> >> > ... and so on.
> >>> >> >
> >>> >> > In comparison with HDFS, the job tracker logs looked something like
> >>> >> > this. The map tasks were being assigned to process data blocks on the
> >>> >> > local nodes.
> >>> >> >
> >>> >> > 2013-07-08 03:55:32,656 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201307080351_0001 = 500000000000. Number of splits = 7452
> >>> >> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000000 has split on node:/default-rack/vega7247
> >>> >> > 2013-07-08 03:55:32,657 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307080351_0001_m_000001 has split on node:/default-rack/vega7247
> >>> >> > 2013-07-08 03:55:34,474 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000000_0' to tip task_201307080351_0001_m_000000, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
> >>> >> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000000
> >>> >> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201307080351_0001_m_000001_0' to tip task_201307080351_0001_m_000001, for tracker 'tracker_vega7247:localhost/127.0.0.1:43320'
> >>> >> > 2013-07-08 03:55:34,475 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201307080351_0001_m_000001
> >>> >> >
> >>> >> > Version Info:
> >>> >> > ceph version 0.61.4
> >>> >> > hadoop 1.1.2
> >>> >> >
> >>> >> > Has anyone else run into this?
> >>> >> >
> >>> >> > Thanks
> >>> >> > KC
> >>> >> >
> >>> >
> >>> >
> >>
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
