For benchmarking CPU, I start a pseudo-distributed HDFS cluster, put a
smallish file on the local datanode (small enough to fit in the buffer
cache), and then run the following script with various parameters to look
at the CPU time used to cat the file. For example:

$ REPS_PER_RUN=50 NUM_TRIALS=10 ./read-benchmark.sh \
    hdfs://localhost/128M-file /tmp/benchmark-results.txt

Script:

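#!/bin/sh -x
# Times repeated "hadoop fs -cat" runs of INPUT and appends one
# tab-separated row of GNU time statistics per trial to OUTPUT.
set -e

INPUT=$1
OUTPUT=$2
NUM_TRIALS=${NUM_TRIALS:-10}
HADOOP=${HADOOP:-./bin/hadoop}
HADOOP_FLAGS=${HADOOP_FLAGS:--Dio.file.buffer.size=$((64*1024))}
REPS_PER_RUN=${REPS_PER_RUN:-1}

HEADER="major\tminor\tfs_in\tfs_out\twall\tuser\tsys\tctx_invol\tctx_vol\n"
TIME_FORMAT="%F\t%R\t%I\t%O\t%e\t%U\t%S\t%c\t%w"

# Write the header only when starting a new results file.
test -f "$OUTPUT" || printf "$HEADER" > "$OUTPUT"
for x in $(seq 1 "$NUM_TRIALS") ; do
    # INPUT is passed REPS_PER_RUN times, so one hadoop invocation cats
    # the file that many times. HADOOP_FLAGS is left unquoted so that
    # multiple flags split into separate arguments.
    /usr/bin/time --append -o "$OUTPUT" -f "$TIME_FORMAT" \
        "$HADOOP" fs $HADOOP_FLAGS -cat \
        $(for rep in $(seq 1 "$REPS_PER_RUN") ; do echo "$INPUT" ; done) \
        > /dev/null
done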
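
To compare runs, the results file can be averaged per column. The following
is only a rough sketch, not part of the script above: summarize is a
hypothetical helper name, and it assumes the column order given by HEADER
(wall, user and sys in columns 5, 6 and 7). Rows whose wall column is not
numeric, such as the header, are skipped.

```shell
# Hypothetical helper: average wall/user/sys seconds over all trials
# in a results file written by read-benchmark.sh.
summarize() {
    awk -F'\t' '
        # Keep only data rows: at least 7 fields and a numeric wall time.
        NF >= 7 && $5 ~ /^[0-9.]+$/ {
            wall += $5; user += $6; sys += $7; n++
        }
        END {
            if (n == 0) { print "no trials found"; exit 1 }
            printf "trials=%d mean_wall=%.2f mean_user=%.2f mean_sys=%.2f\n",
                   n, wall / n, user / n, sys / n
        }
    ' "$1"
}
```

Usage, assuming the output path from the example above:

$ summarize /tmp/benchmark-results.txt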


On Wed, Jul 6, 2011 at 1:16 AM, Keren Ouaknine <ker...@gmail.com> wrote:

> Hello,
>
> I am working on the optimization of task scheduling for Hadoop and would
> like to benchmark with *Apache Hadoop's standard benchmarks*. So far, I
> used my own scripts to measure and monitor. Where can I find the
> benchmarking you are referring to please?
>
> Thanks,
> Keren
>
> On Wed, Jul 6, 2011 at 7:32 AM, Todd Lipcon (JIRA) <j...@apache.org>
> wrote:
>
> > Simplify BlockReader to not inherit from FSInputChecker
> > -------------------------------------------------------
> >
> >                 Key: HDFS-2129
> >                 URL: https://issues.apache.org/jira/browse/HDFS-2129
> >             Project: Hadoop HDFS
> >          Issue Type: Sub-task
> >          Components: hdfs client
> >            Reporter: Todd Lipcon
> >            Assignee: Todd Lipcon
> >
> >
> > BlockReader is currently quite complicated since it has to conform to the
> > FSInputChecker inheritance structure. It would be much simpler to implement
> > it standalone. Benchmarking indicates it's slightly faster, as well.
> >
> > --
> > This message is automatically generated by JIRA.
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
> >
> >
>
>
> --
> Keren Ouaknine
> Cell: +972 54 2565404
> Web: www.kereno.com
>



-- 
Todd Lipcon
Software Engineer, Cloudera
