Global scheduling in the Fair Scheduler

2009-12-22 Thread abhishek sharma
Hi all, I want to use the Hadoop Fair Scheduler with the global scheduling feature. According to the following link, http://issues.apache.org/jira/browse/MAPREDUCE-548, Matei Zaharia has committed a patch for this as part of version 0.21.0. I believe version 0.21.0 is not released yet. However, is it poss

Re: Global scheduling in the Fair Scheduler

2009-12-22 Thread abhishek sharma
's gotten easier. > Note that you won't be able to use the 0.21 scheduler code in Hadoop 0.20 without manually trying to port it though. > Matei > On Dec 22, 2009, at 7:23 PM, abhishek sharma wrote: >> Hi all, >> I want to use the Hadoop

Re: Source Code

2010-01-13 Thread abhishek sharma
Do you want the latest source code or the source code for a particular release? If it is a particular release, the source code comes with the distribution. Abhishek On Wed, Jan 13, 2010 at 12:21 AM, wrote: > Hi all, > I want to know how to get the source code for hadoop and from >

mapred JobClient: Error reading task output

2010-01-22 Thread abhishek sharma
Hi all, I run into task failures when I run several jobs on my 10-node cluster. I start seeing warnings of the following type before the job fails. WARN mapred.JobClient: Error reading task output http://:50060/tasklog?plaintext=true&taskid=attempt_201001221644_0001_r_01_2&filter=stdout INFO m

resolution to Hadoop error: mapred.JobClient: Error reading task output

2010-01-23 Thread abhishek sharma
Hi all, I sent a query yesterday asking about the following error: WARN mapred.JobClient: Error reading task output http://:50060/tasklog?plaintext=true&taskid=attempt_201001221644_0001_r_01_2&filter=stdout INFO mapred.JobClient: Task Id : attempt_201001221644_0001_r_01_2, Status : FAIL

Re: Map Reduce in heterogeneous environ..

2010-03-10 Thread abhishek sharma
What do you mean by heterogeneous environment? Do you mean servers with different numbers of cores? One set of parameters in Hadoop determines the number of map and reduce slots for each TaskTracker. In a heterogeneous environment, presumably you would want to have more slots on servers with mo

Re: Map Reduce in heterogeneous environ..

2010-03-11 Thread abhishek sharma
> No. of slots per task tracker cannot be varied so even if some nodes have additional cores, extra slots cannot be added. True. This is what I have been wishing for ;-) I routinely use clusters where some machines have 8 cores while others have 4. Abhishek > Regards, > Jayant > -Ori
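To make the slot/core mismatch above concrete, here is a small back-of-the-envelope sketch. It counts map slots only (reduce slots and memory are ignored), and the one-slot-per-core policy it compares against is a hypothetical choice for illustration, not a Hadoop setting.

```python
def slot_summary(cores_by_node, uniform_map_slots):
    """Compare one cluster-wide map-slot count with a hypothetical one-slot-per-core policy.

    Returns (total slots with the uniform setting,
             cores left without a map slot under that setting,
             total slots if every core got exactly one slot).
    """
    uniform_total = uniform_map_slots * len(cores_by_node)
    # Cores on each node beyond the uniform slot count sit idle for map work.
    idle_cores = sum(max(0, cores - uniform_map_slots) for cores in cores_by_node.values())
    per_core_total = sum(cores_by_node.values())
    return uniform_total, idle_cores, per_core_total


if __name__ == "__main__":
    # Hypothetical mixed cluster: two 8-core and two 4-core machines, 4 map slots each.
    print(slot_summary({"node1": 8, "node2": 8, "node3": 4, "node4": 4}, 4))  # (16, 8, 24)
```

In this made-up cluster, a uniform setting of 4 map slots leaves 8 of the 24 cores without a map slot.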

how are the splits for map tasks computed?

2010-03-24 Thread abhishek sharma
Hi all, I have a job ("loadgen") with only 1 input (say) part-0 of size 1368654 bytes. So when I submit this job, I get the following output: INFO mapred.FileInputFormat: Total input paths to process : 1 However, in the JobTracker log, I see the following entry: Split info for job:job_201

posted again: how are the splits for map tasks computed?

2010-03-24 Thread abhishek sharma
I realized that I made a mistake in my earlier post. So here is the correct one. I have a job ("loadgen") with only 1 input (say) part-0 of size 1368654 bytes. So when I submit this job, I get the following output: INFO mapred.FileInputFormat: Total input paths to process : 1 However, in th
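One way to reason about the split count is to replay the split arithmetic by hand. The sketch below approximates the logic of the old-API org.apache.hadoop.mapred.FileInputFormat.getSplits as we understand it: goal size = total size / requested number of maps, split size = max(minimum split size, min(goal size, block size)), and a 10% slop allowance for the last split. Treat it as an approximation, not Hadoop's code; the 64 MB block size and the map-count hint of 2 in the usage lines are assumptions.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_splits(file_len, block_size, num_splits_hint=2, min_split_size=1):
    """Return (offset, length) pairs for a single input file.

    Approximates the old-API FileInputFormat split arithmetic; not Hadoop's code.
    """
    goal_size = file_len // max(num_splits_hint, 1)
    # Never let the split size drop below 1 byte or the configured minimum.
    split_size = max(min_split_size, 1, min(goal_size, block_size))
    splits = []
    remaining = file_len
    # Cut full-size splits while more than SPLIT_SLOP * split_size bytes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_len - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most 1.1 * split_size bytes) becomes the final split.
    if remaining:
        splits.append((file_len - remaining, remaining))
    return splits


if __name__ == "__main__":
    block = 64 * 1024 * 1024  # assumed 64 MB HDFS block size
    print(compute_splits(1368654, block, num_splits_hint=2))
```

Under these assumptions the single 1368654-byte file yields two splits of 684327 bytes each, which is one plausible reason a job with one small input file can still get more than one map task.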

measuring the split reading time in Hadoop

2010-04-03 Thread abhishek sharma
Hi all, I want to measure the time it takes to read the input split for a map task. For my cluster, I am interested in measuring the overhead of fetching the input to a map task over the network as opposed to reading it from the local disk. Is there an easy way to instrument some function to log this
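A generic way to instrument this is to wrap whatever produces the records and time only the fetch calls, so the map function's own work is excluded. The sketch below shows the idea with a hypothetical TimedReader wrapper around any iterator. In Hadoop itself the equivalent would be Java code wrapping the RecordReader used by the map task; that hook point is an assumption, not an existing Hadoop feature.

```python
import time


class TimedReader:
    """Hypothetical wrapper that accumulates the time spent fetching records."""

    def __init__(self, records):
        self._it = iter(records)
        self.read_seconds = 0.0  # total time spent inside the underlying reader
        self.records = 0         # number of records successfully read

    def __iter__(self):
        return self

    def __next__(self):
        start = time.perf_counter()
        try:
            record = next(self._it)  # raises StopIteration at end of input
        finally:
            # Count the time even for the final, exhausted call.
            self.read_seconds += time.perf_counter() - start
        self.records += 1
        return record


if __name__ == "__main__":
    reader = TimedReader(line for line in ["a", "b", "c"])
    for rec in reader:
        pass  # the map logic would go here; its cost is not included
    print(reader.records, reader.read_seconds >= 0)
```

Comparing read_seconds per split for data-local and non-local map tasks would give a rough estimate of the network fetch overhead.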

cluster under-utilization with Hadoop Fair Scheduler

2010-04-11 Thread abhishek sharma
Hi all, I have been using the Hadoop Fair Scheduler for some experiments on a 100-node cluster with 2 map slots per node (hence, a total of 200 map slots). In one of my experiments, all the map tasks finish within a heartbeat interval of 3 seconds. I noticed that the maximum number of concurrentl
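A simple model helps reason about under-utilization when tasks are shorter than the heartbeat interval. The sketch below assumes, hypothetically, that the scheduler gives each TaskTracker at most per_heartbeat new map tasks per heartbeat and that a finished task's slot is only refilled at the next heartbeat. It returns an upper bound on concurrently running map tasks under those assumptions; it does not describe the Fair Scheduler's actual code.

```python
import math


def max_concurrent_maps(nodes, slots_per_node, heartbeat_s, task_s, per_heartbeat):
    """Upper bound on running map tasks under a simple heartbeat model (assumption)."""
    # A task launched at a heartbeat is still running at the next
    # ceil(task_s / heartbeat_s) - 1 heartbeats, so that many launch
    # "waves" (including the current one) can overlap on one node.
    overlapping_waves = math.ceil(task_s / heartbeat_s)
    per_node = min(slots_per_node, per_heartbeat * overlapping_waves)
    return nodes * per_node


if __name__ == "__main__":
    # 100 nodes x 2 map slots, 3 s heartbeats, tasks that finish in 2 s.
    print(max_concurrent_maps(100, 2, 3.0, 2.0, per_heartbeat=1))  # 100
    print(max_concurrent_maps(100, 2, 3.0, 2.0, per_heartbeat=2))  # 200
```

Under these assumptions, tasks that finish within one heartbeat combined with a one-task-per-heartbeat limit would keep at most 100 of the 200 slots busy.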

Re: Hadoop's default FIFO scheduler

2010-10-14 Thread abhishek sharma
What is the inter-arrival time between these jobs? There is a "set up" phase for jobs before they are launched. It is possible that the order of jobs changes due to slightly different set-up times. Apart from the number of blocks, it may matter "where" these blocks lie. Abhishek On Thu, Oct 1
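The effect of the set-up phase on FIFO order can be illustrated with a toy model. The sketch below assumes, for illustration only, that a job joins the FIFO queue only once its set-up finishes. Under that assumption, a job submitted slightly later but with a shorter set-up can overtake an earlier one. The job names and timings are made up.

```python
def fifo_order(jobs):
    """Order in which jobs become runnable in a toy FIFO model.

    jobs: list of (name, submit_time_s, setup_time_s) tuples (hypothetical values).
    A job is assumed to be queued only after submit_time_s + setup_time_s.
    """
    # Ties on ready time fall back to submission order.
    ready = sorted(jobs, key=lambda job: (job[1] + job[2], job[1]))
    return [name for name, _, _ in ready]


if __name__ == "__main__":
    # job2 arrives 0.5 s after job1 but finishes set-up first.
    print(fifo_order([("job1", 0.0, 1.5), ("job2", 0.5, 0.6)]))  # ['job2', 'job1']
```

When the inter-arrival time is larger than the variation in set-up times, the model keeps submission order, which is why the inter-arrival time matters here.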

Hadoop File system performance counters

2010-12-15 Thread abhishek sharma
Hi, What do the following two File System counters associated with a job (and printed at the end of a job's execution) represent: FILE_BYTES_READ and FILE_BYTES_WRITTEN? How are they different from HDFS_BYTES_READ and HDFS_BYTES_WRITTEN? Thanks, Abhishek

Namenode not starting

2011-09-01 Thread abhishek sharma
Hi all, I am trying to install Hadoop (release 0.20.203) on a machine with CentOS. When I try to start HDFS, I get the following error. : Unrecognized option: -jvm : Could not create the Java virtual machine. Any idea what might be the problem? Thanks, Abhishek

Re: Namenode not starting

2011-09-01 Thread abhishek sharma
Engineering, Beihang University > * Phone: (86-010)82315908 > * Email: hailong.yang1...@gmail.com > * Address: G413, New Main Building in Beihang University, > * No.37 XueYuan Road, HaiDian District, > * Beijing, P.R. China, 100191 > *

Re: Namenode not starting

2011-09-01 Thread abhishek sharma
ht on, > Ravi > On Thu, Sep 1, 2011 at 4:35 PM, abhishek sharma wrote: >> Hi Hailong, >> I have installed the JDK and set JAVA_HOME correctly (as far as I know). >> Output of java -version is: >> java version "1.6.0_04" >> Java(TM

Re: hadoop fair scheduler

2011-10-17 Thread abhishek sharma
Hi Shivam, The following paper by Zaharia et al. has design insights as well as lots of evaluation. http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf Abhishek On Mon, Oct 17, 2011 at 1:20 PM, Harsh J wrote: > Shivam, > > Here lies its inception with good reading stuff
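The core idea in that paper, delay scheduling, can be summarized in a few lines. When a job's turn comes but it has no task whose input is on the heartbeating node, the job is skipped; only after it has been skipped a set number of times may it launch a non-local task. The sketch below is a simplified rendering of the paper's simple delay-scheduling pseudocode as we understand it. The Job class, the max_skips parameter, and the one-free-slot assumption are illustrative, not the Fair Scheduler's API.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # task id -> set of node names holding that task's input (hypothetical layout)
    pending: dict = field(default_factory=dict)
    running: int = 0
    skip_count: int = 0


def assign_on_heartbeat(jobs, node, max_skips):
    """Pick one task for a node reporting a free slot, or return None."""
    # Fair sharing: consider the job with the fewest running tasks first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, hosts in job.pending.items() if node in hosts]
        if local:
            task = local[0]
            job.skip_count = 0  # launched locally, reset the wait
        elif job.skip_count >= max_skips:
            task = next(iter(job.pending))  # waited long enough, go non-local
        else:
            job.skip_count += 1  # skip this job and let others try
            continue
        del job.pending[task]
        job.running += 1
        return job.name, task
    return None


if __name__ == "__main__":
    job = Job("A", {"t1": {"node2"}})
    for _ in range(3):
        print(assign_on_heartbeat([job], "node1", max_skips=2))
```

Here job A is skipped on the first two heartbeats from node1 and launches its non-local task on the third. The paper discusses further refinements that this sketch leaves out.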